High Performance Computing in Bioinformatics -- or -- Should I drive the Porsche or the Station Wagon?

Presenter: Robert Lyon, IBEST UI

Abstract:

Bioinformatics, the study and application of computer science to biological questions, has expanded rapidly over the last few years. Next-generation sequencing is now capable of producing petabytes of data, so powerful tools and enormous amounts of computational power are needed to process it. Unfortunately, bioinformatics tools and applications lag years behind the hardware, and many tools were never designed to take advantage of a parallel computing environment. This has left scientists with a conundrum: when is it appropriate to use HPC clusters (the Porsche), and when is it appropriate to use single multi-core servers with hundreds of gigabytes of system RAM (the Station Wagon)? This question applies not only to bioinformatics, but also to the many fields that seek to represent the complex relationships found in large amounts of data.

The limitations of parallel HPC and single multi-core systems will be explored in the context of the tools and algorithms designed to process biological data. We will also discuss areas within bioinformatics where computer scientists with a deep understanding of parallel computing, data processing, and performance optimization at the hardware and software levels can make significant contributions to the field. Other topics include a very brief history of HPC; requirements and limitations for parallel processing; message passing and multi-core processing libraries; and component bottlenecks when dealing with parallel systems and large amounts of data.

Robert Lyon is the director of the IBEST Computational Resources Core.