Three groups at The Scripps Research Institute (TSRI) have been awarded grants from the National Institutes of Health (NIH) to develop methods for computational modeling and to apply them to cutting-edge systems in biology and health.

"The three projects are highly symbiotic, each addressing a different state-of-the-art challenge in computational biology, but built using a common computational framework that will allow facile collaboration between the groups," said Professor Arthur Olson, founder of the Molecular Graphics Laboratory, which is currently part of the TSRI Department of Integrative Structural and Computational Biology.

Stefano Forli, assistant professor of integrative structural and computational biology, was awarded $2.7 million to continue development of AutoDock, the most widely used method for computational docking of drugs and inhibitors to targets of medicinal interest. The proposed work will include improvements to the scoring function, allowing faster and more accurate prediction of how drugs act, new tools for advanced structure-based design of drugs and simplified methods for use by a wide community of non-expert users.

David S. Goodsell, associate professor of molecular biology, was awarded $2.3 million to develop new methods for modeling the molecular structure of entire cells. The size and complexity of these structural models are unprecedented, and the project is currently leveraging methods developed by the gaming community. In collaboration with experimental scientists, the methods will be used to study bacterial nucleoid structure, cell division and other cellular processes.

Michel F. Sanner, associate professor of molecular biology, initiated a research project addressing a significant challenge: the incorporation of the dynamic nature of proteins into docking simulations. Years six through nine of this project have been funded with an award of $1.6 million. His group will continue development of AutoDockFR, a fast computational docking method with advanced features, to represent local and global receptor motions during binding of flexible drug-like molecules to biomolecular targets. 

CAPTION The new grant will support the development of computational methods for studying cellular and atomic structures. This example shows modeling protein flexibility with AutoDockFR.

Researchers apply machine learning tools to infer ancestry mix of individuals

The same algorithms that personalize movie recommendations and extract topics from oceans of text could bring doctors closer to diagnosing, treating and preventing disease on the basis of an individual's unique genetic profile.

In a study to be published Monday, Nov. 7 in Nature Genetics, researchers at Columbia and Princeton universities describe a new machine-learning algorithm for scanning massive genetic data sets to infer an individual's ancestral makeup, which is key to identifying disease-carrying genetic mutations.

On simulated data sets of 10,000 individuals, TeraStructure could estimate population structure more accurately and twice as fast as current state-of-the-art algorithms, the study said. TeraStructure alone was capable of analyzing 1 million individuals, orders of magnitude beyond modern software capabilities, researchers said. The algorithm could potentially characterize the structure of world-scale human populations.

"We're excited to scale some of our recent machine learning tools to real-world problems in genetics," said David Blei, a professor of computer science and statistics at Columbia University and member of the Data Science Institute.

The cost of genetic sequencing has fallen sharply since the first complete mapping of the human genome in 2003. More than a million people now have sequenced genomes, and by 2025 that number could rise to 2 billion.

The technology to put this data into context, however, has lagged and remains one of the barriers to tailoring healthcare to an individual's DNA. To identify disease-causing variants in a person's genome, one of the goals of personalized medicine, researchers need to know something about that person's ancestry to control for normal genetic variation within a subpopulation.

"We can run software on a few thousand people, but if we increase our sample size to a few hundred thousand, it can take months to infer population structure," said Kai Wang, director of clinical informatics at Columbia's Institute for Genomic Medicine, who was not involved in the study. "This new tool addresses these limitations, and will be very useful for analyzing the genomes of large populations."

The researchers' algorithm, called TeraStructure, builds on the widely used and adapted STRUCTURE algorithm first described in the journal Genetics in 2000. The STRUCTURE algorithm cycles through an entire data set, genome by genome, one million variants at a time, before updating its model to both characterize ancestral populations and estimate their proportion in each individual. The model gets refined after repeated passes through the data set.

TeraStructure, by contrast, updates the model as it goes. It samples one genetic variant at one location and compares it to all variants at the same location across the data set, producing a working estimate of population structure. "You don't have to painstakingly go through all the points each time to update your model," said Blei.

STRUCTURE is mathematically similar to a topic-modeling algorithm Blei developed independently in 2003 that made it possible to scan large numbers of documents for overarching themes. Blei's algorithm and its underlying LDA model have been used, among other things, to analyze published research in the journal Science to understand the evolution of scientific ideas and review regulatory meeting transcripts for insight into how the U.S. Federal Reserve sets interest rates.

More recently, Blei has experimented with statistical techniques to extend probabilistic models to massive data sets. One technique, stochastic optimization, developed in 1951 by statistician Herbert Robbins just before arriving at Columbia, uses a small, random subset of observations to compute a rough update for the model's parameters.

Continuously refining the model with each new observation, stochastic optimization algorithms have been enormously successful in scaling up machine learning approaches used in deep learning, recommendation systems and social network analysis.
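The idea of updating a model from small random subsets can be illustrated with a minimal sketch. It is not the TeraStructure code or the method from the papers; it assumes a toy problem (estimating a single mean by minimizing squared error), and the function name `stochastic_mean` and all parameters are hypothetical. The step sizes shrink as 1/t, one common choice that meets the classic Robbins-Monro conditions.

```python
import random

def stochastic_mean(data, batch_size=10, steps=2000, seed=0):
    """Estimate the mean of `data` using noisy minibatch updates.

    Toy illustration only: each step looks at a small random subset
    instead of the full data set.
    """
    rng = random.Random(seed)
    theta = 0.0  # current estimate of the parameter
    for t in range(1, steps + 1):
        batch = rng.sample(data, batch_size)
        # Noisy gradient of 0.5 * (theta - x)^2, averaged over the minibatch.
        grad = sum(theta - x for x in batch) / batch_size
        step = 1.0 / t  # decreasing step size (assumed schedule)
        theta -= step * grad
    return theta

# Usage: synthetic data, compared against the exact full-pass mean.
rng = random.Random(1)
data = [rng.gauss(5.0, 2.0) for _ in range(10000)]
estimate = stochastic_mean(data)
exact = sum(data) / len(data)
```

In this sketch each update touches only 10 observations, yet the estimate settles near the value a full pass over the data would give.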

In a 2010 paper, Online Learning for LDA, Blei and his colleagues applied stochastic optimization to Blei's earlier LDA model. In a later paper, Stochastic Variational Inference, they showed that stochastic optimization could be applied to a range of models. As Matthew Hoffman, a coauthor of both papers and now a senior research scientist at Adobe Research, explains, "Stochastic optimization algorithms often find a good solution before they've even analyzed the whole dataset."

In the Nature Genetics study, they apply these ideas to the STRUCTURE method. In their analysis of two real-world data sets--940 individual genomes from Stanford's Human Genome Diversity Project and 1,718 genomes from the 1000 Genomes Project--they found that TeraStructure performed comparably to the more recent ADMIXTURE and fastSTRUCTURE algorithms.

But when they ran TeraStructure on a simulated data set of 10,000 genomes, it was more accurate and two to three times faster at estimating population structure, the study said. The researchers also showed that TeraStructure alone could analyze data sets as large as 100,000 genomes and 1 million genomes.

Matthew Stephens, a genetics researcher at the University of Chicago who helped develop the STRUCTURE algorithm, called TeraStructure's performance impressive. "I think these results will motivate future applications of this kind of algorithm in challenging inference problems," he said.

The study also received praise from other researchers working with big genetic data sets. "We now have the technology to create the data," said Itsik Pe'er, a computational geneticist at Columbia Engineering who was not involved in the study. "But this paper really allows us to use it."


CAPTION On simulated data sets of 10,000 individuals, TeraStructure could estimate population structure more accurately and twice as fast as current state-of-the-art algorithms, the study found. TeraStructure alone was capable of analyzing 1 million individuals. Each vertical slice represents a person; the colors, their mix of ancestral populations. CREDIT Wei Hao/Princeton

$2.8 billion transaction combines CenturyLink data centers and colocation business with Medina Capital’s cybersecurity and analytics portfolio

Leading international private equity firm BC Partners and Medina Capital, a private equity firm that focuses on investing in companies in the cybersecurity, data analytics and IT infrastructure markets, announced today the formation of a joint venture that combines a portfolio of data centers and the associated colocation business, to be acquired from CenturyLink, with Medina Capital’s security and data analytics portfolio, which will also be acquired.

The new venture will deliver a global secure data infrastructure platform by combining 57 premium data centers along with a suite of highly-differentiated security and data analytics services from the Medina Capital portfolio, including:

  • Cryptzone
    Leader in the software-defined perimeter space, providing secure, dynamic user access to business critical applications across physical devices, private cloud and public cloud infrastructure.
  • Catbird
    Provider of software-defined segmentation, infrastructure visualization and security policy enforcement across cloud infrastructure.
  • Easy Solutions
    Security provider focused on the comprehensive detection and prevention of electronic fraud across all devices, channels and infrastructure, with a robust authentication platform and transaction anomaly detection.
  • Brainspace
    Leader in machine-learning software that powers advanced data discovery and analytics, insider threat detection and defense intelligence capabilities.

The new company will be an immediate leader in the global colocation market, with more than 3,500 customers and 2.6 million square feet of raised floor capacity, and is well-positioned within some of the fastest-growing segments of the information security market, which is estimated to grow to $113.4 billion by 2020.

 “There is a growing need by companies around the world who are seeking greater flexibility over their IT infrastructure and a greater focus on the demand for an expansive suite of applications to be closely and securely interconnected with customers, suppliers, software vendors, financial service providers and cloud providers. Our new venture is an answer to that need,” said Justin Bateman, a managing partner at BC Partners. “We are creating a completely secure, global infrastructure to exceed today’s information security, scale and availability challenges. We are pleased to partner with Manny and the team at Medina Capital who have deep experience and proven success incorporating disruptive technologies within a portfolio of top-tier data centers.”

Manuel D. Medina, founder and managing partner of Medina Capital, will lead the new company as chief executive officer. He will be joined by his executive team from Medina Capital, which includes the former senior leadership team of Terremark, a leading provider of data center, cybersecurity and infrastructure services that was acquired by Verizon in 2011 in a $2 billion transaction representing a 19x EBITDA multiple and 5x return on equity.

“We’re combining a worldwide footprint of best-in-class data centers with cutting-edge security and analytics services, integrating these capabilities into a global, highly secure platform that meets today’s critical enterprise, public sector and service provider demands for cybersecurity, colocation and connectivity,” said Manuel D. Medina, founder and managing partner for Medina Capital. “Our customers will be able to leverage a suite of on-net security and advanced analytics services deeply integrated into the data center.”

The transaction is expected to close in first quarter 2017, pending regulatory approvals and customary closing conditions. Branding for the new company will be announced at a later date.

LionTree Advisors acted as financial advisor to BC Partners and its consortium investors. Latham & Watkins LLP is serving as legal advisor and PricewaterhouseCoopers is serving as accounting advisor. Citigroup, JP Morgan, Barclays, Credit Suisse, Jefferies, HSBC, Macquarie and Citizens have underwritten the debt package to finance the acquisition.

Greenberg Traurig served as legal advisor to Medina Capital.

Joel Kastner studies young stars and planets

Rochester Institute of Technology professor Joel Kastner is broadening and deepening his research program on the origins of our solar system and planetary systems orbiting other stars while on four consecutive fellowships and visiting positions during his sabbatical this academic year.

Kastner, professor in RIT's Chester F. Carlson Center for Imaging Science and the School of Physics and Astronomy, is the Study Abroad International Faculty Fellow for the month of November at the Arcetri Observatory in Florence, Italy. He is collaborating with former RIT postdoctoral fellow Germano Sacco and other Arcetri scientists to identify and study young stars within a few hundred light years of the sun using newly available data from the European Space Agency's Gaia space telescope.

He was also awarded two additional fellowships for 2017--the prestigious Merle A. Tuve Fellowship from the Carnegie Institution for Science Department of Terrestrial Magnetism in Washington, D.C., for his six-week residency there, starting in January 2017; and a Smithsonian Institution Short Term Visitor fellowship for his residency at the Smithsonian Astrophysical Observatory in Cambridge, Mass., in March and April 2017. 

Prior to his residency in Florence, Kastner spent two months as a visiting astronomer at the Institut de Planetologie et Astronomie de Grenoble, or IPAG, in France, studying the compositions of planet-forming disks around young stars in a collaboration with scientists there who work in the areas of interstellar and solar system chemistry.

"The astrophysicists at IPAG, Arcetri, Carnegie and the Smithsonian Astrophysical Observatory are combining observations with the world's most powerful astronomical facilities with sophisticated computer modeling to attack the complex problem of how planetary systems, including our own solar system, have come into being," Kastner said. "I feel very fortunate to be able to work so closely with so many 'black belt' astrophysicists during one sabbatical year."

An international research team led by Carnegie Mellon University has found that when the brain "reads" or decodes a sentence in English or Portuguese, its neural activation patterns are the same.

Published in NeuroImage, the study is the first to show that different languages have similar neural signatures for describing events and scenes. By using a machine-learning algorithm, the research team was able to understand the relationship between sentence meaning and brain activation patterns in English and then recognize sentence meaning based on activation patterns in Portuguese. The findings can be used to improve machine translation, brain decoding across languages and, potentially, second language instruction.

"This tells us that, for the most part, the language we happen to learn to speak does not change the organization of the brain," said Marcel Just, the D.O. Hebb University Professor of Psychology and pioneer in using brain imaging and machine-learning techniques to identify how the brain deciphers thoughts and concepts.

"Semantic information is represented in the same place in the brain and the same pattern of intensities for everyone. Knowing this means that brain to brain or brain to computer interfaces can probably be the same for speakers of all languages," Just said.

For the study, 15 native Portuguese speakers -- eight were bilingual in Portuguese and English -- read 60 sentences in Portuguese while in a functional magnetic resonance imaging (fMRI) scanner. A CMU-developed computational model was able to predict which sentences the participants were reading in Portuguese, based only on activation patterns.

The computational model uses a set of 42 concept-level semantic features and six markers of the concepts' roles in the sentence, such as agent or action, to identify brain activation patterns in English.

With 67 percent accuracy, the model predicted which sentences were read in Portuguese. The resulting brain images showed that the activation patterns for the 60 sentences were in the same brain locations and at similar intensity levels for both English and Portuguese sentences.
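The identification step can be pictured with a minimal sketch. It is not the CMU model. It assumes that a predicted activation pattern already exists for each candidate sentence and that matching is done by cosine similarity, which the source does not specify. The function names, the toy four-value patterns and the second candidate sentence are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length activation vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def identify_sentence(observed, predicted_by_sentence):
    """Return the candidate whose predicted pattern best matches `observed`."""
    return max(
        predicted_by_sentence,
        key=lambda s: cosine(observed, predicted_by_sentence[s]),
    )

# Hypothetical toy patterns (4 values each); real fMRI patterns are far larger.
predicted = {
    "The voter went to the protest": [0.9, 0.1, 0.4, 0.0],
    "The child played in the park": [0.1, 0.8, 0.0, 0.5],  # invented example
}
# Made-up observed pattern standing in for a sentence read in Portuguese.
observed = [0.8, 0.2, 0.5, 0.1]
best = identify_sentence(observed, predicted)
```

Here `best` is the first candidate, because its predicted pattern points in nearly the same direction as the observed one.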

Additionally, the results revealed that the activation patterns could be grouped into four semantic categories, depending on the sentence's focus: people, places, actions and feelings. The groupings were very similar across languages, reinforcing that the organization of information in the brain is the same regardless of the language in which it is expressed.

"The cross-language prediction model captured the conceptual gist of the described event or state in the sentences, rather than depending on particular language idiosyncrasies. It demonstrated a meta-language prediction capability from neural signals across people, languages and bilingual status," said Ying Yang, a postdoctoral associate in psychology at CMU and first author of the study.

Discovering that the brain decodes sentences the same in different languages is one of the many brain research breakthroughs to happen at Carnegie Mellon. CMU has created some of the first cognitive tutors, helped to develop the Jeopardy-winning Watson, founded a groundbreaking doctoral program in neural computation, and is the birthplace of artificial intelligence and cognitive psychology. Building on its strengths in biology, computer science, psychology, statistics and engineering, CMU launched BrainHub, an initiative that focuses on how the structure and activity of the brain give rise to complex behaviors. 

CAPTION This image compares the neural activation patterns between images from the participants' brains when reading "O eleitor foi ao protesto" (observed image) and the computational model's prediction for "The voter went to the protest" (predicted image).
