CTC Consultants Help Create New Center & System for The Feinstein Biorepository

The Cornell Theory Center (CTC), an interdisciplinary research center at Cornell University that provides cyberinfrastructure resources for research and education, has announced that CTC systems and database consultants helped design and build a new high performance computing center and informatics system for the Biorepository at The Feinstein Institute for Medical Research.
"As a high performance data warehouse and large-scale data-mining environment, the new system allows The Feinstein Biorepository to manage vast amounts of information derived from the collection, processing and analysis of large numbers of biological specimens. A data-management system of this magnitude is unprecedented within academic research facilities," explained Anthony Ingraffea, CTC's acting director.
The Biorepository at The Feinstein Institute was built in 1998. It has grown to store hundreds of thousands of human samples of many types, including serum, plasma, DNA, cells, tissues and tumors, together with extensive associated data that support many large scientific studies. Both control and disease-affected samples are collected and managed along with clinical, laboratory and bioinformatics data.
One area of sample analysis that has grown dramatically in the past six months is the identification of single nucleotide polymorphisms, or SNPs. SNPs are DNA sequence variations that occur when a single nucleotide in a genome sequence is altered. They make up about 90% of all human genetic variation, and scientists believe SNPs may predispose people to a disease or influence their response to a drug. Researchers at The Feinstein Institute are currently generating approximately 8 to 10 million SNP genotypes each day, and they anticipate accumulating three billion or more over the next year.
"The difficulties in managing and manipulating these very large datasets required the creation of a new data center capable of high performance data management," said Robert Lundsten, Biorepository Director. "Management of research-subject annotation is also quickly becoming a high performance computing issue," he added.
The Feinstein Biorepository informatics system is built around a Unisys ES7000 symmetric multiprocessing (SMP) computer running four 64-bit Intel Itanium 2 processors (expandable to 64), with memory expandable to 256 GB. The system is unique in that it runs Microsoft Windows Server 2003 Enterprise Edition R2, and the platform was designed to run SQL Server 2005 64-bit Enterprise Edition. Data is stored on an EMC CLARiiON CX300 RAID disk array, connected directly through four host bus adapters. The computing center also runs an assortment of in-house 32-bit applications on Dell PowerEdge servers backed by Dell PowerVault disk arrays.
Creating an efficient data-management environment is the first step in developing an effective data-mining environment. "CTC's experience in data-management design was very helpful," Lundsten emphasized. "They know how to design systems and databases that optimize performance in a Microsoft Windows and SQL 2005 environment."