SCIENCE
BGI announces first release of updated bioinformatics software
Latest software enables more efficient and reliable results across a wide range of bioinformatics analyses; the download is free, with early access now available
BGI (previously known as the Beijing Genomics Institute), the largest genomics organization in the world, released its latest bioinformatics software, including its Short Oligonucleotide Analysis Package (SOAP series, etc.), Population Genetics Analysis Package, and Parallelization and Optimization of Traditional Tools. These provide the latest and most advanced solutions for biologists, and enable more efficient and reliable results of a wide range of bioinformatics analyses.
“These new capabilities enhance and complement BGI’s existing state-of-the-art bioinformatics software applications for individual- and population-based research in animals, plants, microbes and human disease areas, meeting the requirements of leading research centers in the analysis and exploration of a wide range of biological data,” stated Yingrui Li, Director of the Science and Technology Department at BGI. “This is the first time BGI has provided detailed insights into our latest bioinformatics applications, pipelines and tools,” he added.
The Short Oligonucleotide Analysis Package (SOAP series, etc.) has evolved from a single alignment tool into a suite of applications providing a complete solution for next-generation sequencing data analysis. Currently, it consists of a new alignment tool (SOAPaligner/soap2), a re-sequencing consensus sequence builder (SOAPsnp), an indel finder (SOAPindel), a structural variation scanner (SOAPsv), a de novo short-read assembler (SOAPdenovo), and a GPU-accelerated alignment tool (SOAP3-GPU). Today’s release updates the SOAP series, including SOAP3 GPU/CPU, SOAPdenovo 2, SOAPindel (graph-based), and SOAPsv (assembly-based). The new software was released at a press conference where several international specialists in R&D bioinformatics platforms shared their views of the updated bioinformatics tools.
“SOAP3 is GPU-based software for aligning short reads with a reference sequence,” stated Chuang Yu from the Science and Technology Department of BGI. “When compared with its previous version, SOAP2, SOAP3 can be up to tens of times faster.”
“SOAPdenovo2, the latest update of the extremely popular SOAPdenovo package, can assemble more accurate, continuous and complete genomes,” said Zhenyu Li, a specialist of the bioinformatics R&D platform at BGI. “Furthermore, there are other improvements which make SOAPdenovo2 better suited to different situations and analyses.”
Referring to the new SOAPindel (graph-based) tool, software developer Jianliang Lu said: “Indel detection based on assembly perhaps offers the best chance of finding long indels and structural variations. Here we present SOAPindel, based on de Bruijn graphs, for calling longer indels more rapidly.”
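To make the de Bruijn graph idea concrete, the following is a minimal sketch and not SOAPindel's actual implementation. It assumes two made-up sequences (a hypothetical reference fragment and a sample read carrying a two-base insertion) and hypothetical helper names `build_de_bruijn` and `branch_nodes`. It shows how an insertion appears as a branch in the graph, where the sample's path leaves the reference path and later rejoins it.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph from reads.

    Nodes are (k-1)-mers; each k-mer in a read adds an edge from its
    first k-1 bases to its last k-1 bases.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branch_nodes(graph):
    """Return nodes with more than one successor (candidate variant sites)."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)


# Hypothetical sequences, chosen only for illustration.
reference = "ATGGCGTGCAAT"
sample = "ATGGCGTTTGCAAT"  # same as reference with "TT" inserted after "ATGGCGT"

graph = build_de_bruijn([reference, sample], k=4)

# The two paths diverge at the node just before the insertion.
print(branch_nodes(graph))
print(sorted(graph["CGT"]))
```

In this toy case the only branch is at node `CGT`, which leads to `GTG` along the reference and to `GTT` along the sample; both paths meet again at `TGC`. Real tools must additionally handle sequencing errors, repeats and coverage, which this sketch ignores.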
The analysis of genetic diversity within a species is vital for understanding evolutionary processes at the population and genomic level. Population genetics analysis tools, built for the large quantities of data produced by population genomics research, make the detection and analysis of variation in populations more efficient and accurate. “Genome-wide association studies have identified hundreds of genetic variants associated with complex human diseases and traits,” stated Haojing Shao, Technology Leader of the Science and Technology Department of BGI. “Most variants identified so far confer relatively small increments in risk. To take a comprehensive look at variants, we provide new software for analyzing indels or even SVs in large-sample, low-coverage next-generation sequencing data.”
Evan Xiang, R&D Director at the Flexible Computing Center of BGI, demonstrated how BGI’s new cloud-based distributed solutions, Hecate and Gaea, are used to solve many research problems in a “flexible” manner. He noted, “Hecate is a distributed solution and is very ‘flexible’ on a cloud computing platform, which can reduce the cost of de novo assembly by more than 50%. In contrast, Gaea, a cloud-based solution, is able to balance the workload over the entire cluster, which can improve the efficiency of cluster usage by more than 30%.”
With the current rapid improvement of genetic and genomic technologies, the demand for storage and computing power has been increasing 10-fold every 12-18 months, far outpacing Moore’s Law. “To tackle these difficulties, BGI and its collaborators are working on GPU-accelerated bioinformatics tools, including tools for alignment and variation detection,” stated Dr. Bingqiang Wang, Director of the High Performance Bioinformatics Center of BGI. “The improvements in speed are impressive: the prototype alignment tool is 10-fold faster than its CPU counterpart, while the SNP detection code is about two orders of magnitude faster.”
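The gap between the demand growth quoted above and Moore's Law can be checked with simple arithmetic. The sketch below converts each rate to a per-year growth factor. The 24-month doubling period used for Moore's Law is an assumption (figures between 18 and 24 months are commonly cited), and `annual_factor` is a hypothetical helper name.

```python
def annual_factor(fold, months):
    """Per-year growth factor for a quantity that grows `fold`-times every `months` months."""
    return fold ** (12 / months)


# Demand growth as quoted: 10-fold every 12 to 18 months.
demand_fast = annual_factor(10, 12)
demand_slow = annual_factor(10, 18)

# Assumption: Moore's Law taken as a doubling every 24 months.
moore = annual_factor(2, 24)

print(f"demand: {demand_slow:.2f}x to {demand_fast:.2f}x per year")
print(f"Moore's Law (assumed 24-month doubling): {moore:.2f}x per year")
```

Under these assumptions, demand grows by roughly 4.6 to 10 times per year, against about 1.4 times per year for hardware following Moore's Law.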
Finally, BGI announced it is providing free access to download and use the updated software. These latest packages complement and build upon many current tools and approaches for data analysis, and can meet the requirements of leading research institutions exploring a wide range of biological data. Moreover, the software can help to perform genome data analysis, including assembly, gene prediction, annotation, repetitive sequence analysis, SNP analysis and evolutionary analysis. Each year, BGI will host a series of workshops on cutting-edge bioinformatics to help researchers learn how to use the latest software advances and explore their biological data more deeply.