Sun & Chinese Institute Collaborate on Genomics Research
By Steve Fisher, Editor In Chief -- Last Wednesday, the Genomics and Bioinformatics Center of the Chinese Academy of Sciences/Beijing Genomics Institute (BGI) announced the completion of a working draft of the rice genome, a major crop genome sequenced after the human genome. The Institute employed a new algorithm, Sun Microsystems’ Enterprise 10000 server, and Chinese-made Dawning 3000 and 2000 supercomputers. To learn more about the new algorithm, the role of Sun technology, and the overall significance of the announcement, Supercomputing Online interviewed Dr. Matthew Huang, deputy director of BGI, and Dr. Stefan Unger, business development manager for computational biology in Sun’s Global Education and Research group.

SCO: Please tell us about the relationship between the Beijing Genomics Institute and Sun Microsystems.

UNGER: BGI is a Sun Center of Excellence in Genomics. Sun supports BGI’s bioinformatics research into alternative splicing algorithms and proteomics and has installed hardware at both the Beijing and Hangzhou sites of BGI. Other Sun COEs in Computational Biology include the University of Wisconsin-Madison, Virginia Bioinformatics Institute, Delaware Biotechnology Institute, and University of Calgary, with more to come.

SCO: Please explain the overall significance of sequencing this rice genome.

HUANG:
-- This is the first plant genome shotgun working draft to be published, with over 90 percent coverage of the 430 million bp in 12 chromosomes.
-- It was performed ahead of the international rice genome (japonica) project.
-- It is the first major genome sequencing project independently accomplished by a developing country, China.
-- This strain of rice, indica, or “super hybrid rice,” developed by well-known agriculturist Yuan Longping, is very productive.

SCO: Please tell the readers about the new algorithm that enabled the assembly of the rice genome scaffold.
HUANG: We introduce a whole-genome sequence assembler, RePS, that explicitly identifies exact 20-mer repeats from the shotgun dataset and masks them out prior to contig assembly to minimize the number of assembly errors. It then constructs scaffolds, using clone-end pairing information to order and orient any non-overlapping contigs. Details of the initial contig assembly are handled by Phrap, an established sequence assembler that evaluates single-base error probabilities before making each join. We demonstrate on real sequence data, from both human and rice, that accurate assemblies are possible even at rough-draft coverage of 4x to 6x.

SCO: In detail, please describe the role Sun’s E10K played in this project.

HUANG: The E10K, a robust, reliable workhorse, was used extensively to assemble the rice genome scaffold and to annotate the genome, including BLAST searches. BGI also has a Sun 4500 server at its Hangzhou center.

SCO: Please provide our readers with as much information as you can on the Chinese-made Dawning 3000 & 2000 supercomputers including their specific role(s) in the project.

HUANG: They are IBM server clones, made by the Institute of Computing Technology, Chinese Academy of Sciences. The 3000 model has 40 nodes, 80 CPUs, 100BHz, and 3Tb storage; the 2000 model has 10BHz and 1Tb storage. BGI has two Dawning 3000s and two Dawning 2000s. They carried out BLAST homology searches and other annotation computation work, and they also serve as database servers.

SCO: Does this announcement/discovery put China “on the map” as far as genomics research? If there were a top ten list of nations doing this sort of work, where would China place?

HUANG: China has been on the map of major genomics players since its participation in the international Human Genome Project consortium. Last year’s annual Human Genome Project strategy meeting was held in Hangzhou, China. (By the way, BGI has an affiliated center at Hangzhou, as well.)
China is now the 6th country in the “genome club,” with BGI being the world’s 6th most productive genome center. We are now collaborating with the Danes on the pig genome, and will continue work on the rice genome, developing alternative splicing algorithms, and other projects, as well.

SCO: Is there anything you’d like to add?

UNGER: This is just the beginning of rice genome analysis, and the Sun/BGI collaboration. Sun provides a broad range of solutions for computational biology, from “desktop to teraflops,” with many different configurations possible to handle the various computational and databasing tasks of both genomics and post-genomics analysis, as well as molecular modeling, clinical informatics, medical imaging and more.
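
For readers curious about the repeat-masking step Dr. Huang describes for RePS, the general idea can be sketched in a few lines of Python. This is only an illustration of the technique, not BGI's code: the function names, the `max_count` threshold, and the use of "X" as the masking character are all assumptions made for this example, and the sketch ignores reverse-complement strands and the depth statistics a real assembler would need.

```python
from collections import Counter


def count_kmers(reads, k=20):
    """Count every exact k-mer across all shotgun reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def mask_repeats(reads, k=20, max_count=2, mask_char="X"):
    """Return copies of the reads with over-represented k-mers masked out.

    A k-mer seen more than `max_count` times is treated as a repeat
    (hypothetical threshold), and every base it covers is replaced
    with `mask_char` before the reads go on to contig assembly.
    """
    counts = count_kmers(reads, k)
    masked = []
    for read in reads:
        chars = list(read)
        for i in range(len(read) - k + 1):
            # Look up the k-mer in the original read, not the masked copy.
            if counts[read[i:i + k]] > max_count:
                chars[i:i + k] = mask_char * k
        masked.append("".join(chars))
    return masked
```

Calling `mask_repeats(reads)` on a list of read strings returns the masked reads, which would then be handed to a contig assembler; the threshold would in practice depend on sequencing depth, since at 4x to 6x coverage even unique sequence is seen several times.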