King's College London is Using High Performance Computing to Understand Links Between Genetics, Disease

As of April 2010, medical researchers at the National Institute for Health Research (NIHR) comprehensive Biomedical Research Centre at Guy’s and St Thomas’ NHS Foundation Trust and King’s College London are using a High Performance Computing (HPC) system for the first time. The system will enable researchers to analyse, store and archive more quickly the vast quantities of data generated during their search to understand the role of genetics in a range of common health issues, such as the development of cancer.

“The sequence of the human genome has been known for ten years now, so we are using new sequencing technologies to sequence specific regions of the genome in large numbers of people in order to help understand the contributory factors to a variety of common complex disorders and developmental defects,” says Dr. Rebecca Oakey, Reader in Epigenetics, Department of Medical & Molecular Genetics, School of Medicine, King’s College London. “These include skin diseases such as psoriasis, inflammatory bowel disease and the step-by-step development of vascular disorders, psychiatric disorders, diabetes, infection and immune disease as well as genetic components in cancer development.”

Dr. Oakey adds: “To do so we need innovative sequencing technology to generate the data and the processing power to analyse, store and archive the data.”

The two sequencing machines in use in the King’s College London and NIHR Biomedical Research Centre’s genomics facility collectively generate up to 50 billion base pairs of usable DNA sequence data every 10 days. The HPC system can cut the time needed to analyse this data 20-fold or more, from days to hours.

The HPC system’s bespoke design, implementation, configuration, ongoing support and user training are handled by OCF plc.

“Data generated and stored by organisations around the world is growing at an alarming rate,” says Julian Fielden, managing director, OCF plc. “According to IBM, worldwide data volumes are doubling every two years, and IDC puts the total worldwide data figure at 281 billion gigabytes.

“Sequencing machines specifically, similar to those in use in the NIHR Biomedical Research Centre’s funded genomics facility at King’s College London, are generating vast quantities of data on a regular basis. The comprehensive Biomedical Research Centre is a great example of an organisation that acknowledges that data on its own delivers little or no value; organisations must analyse and take value from their data so that the findings can be translated to improved patient care at the earliest opportunity. In many cases this analysis is best performed using an HPC system.”

 

The HPC system design includes:

- IBM’s iDataPlex server hardware to meet the Biomedical Research Centre’s high performance, low power consumption and low weight requirements

- Ultra-low latency 10Gb Ethernet switching modules from BLADE Network Technologies, included with the iDataPlex. These G8124 switches provide high-speed, low-cost and energy-efficient networking for the HPC environment

 

- Panasas ‘plug and play’ ActiveStor Series 8 clustered storage using its built-in Panasas ActiveScale distributed parallel file system which enables:

-- Biomedical Research Centre staff to have faster access to results because the HPC system can read and write data quickly from a high-performance storage system, rather than waiting for data to be extracted from a traditional, slower storage system

-- Current storage of up to 180 TB of raw data

-- A storage system that can scale to meet researchers’ current and future storage requirements

-- Ease of deployment, administration and upgrades to storage hardware

- An IBM TS3310 Tape Library Unit with Tivoli Storage Manager to enable long-term, secure, off-site data backup

Fielden continues: “By using well-configured and innovative technologies from companies like IBM and Panasas, it is now possible for companies of any size in any industry to own a practical, economical, environmentally friendly HPC.”