Allison Campbell and Lou Terminello

Allison Campbell, Lou Terminello chosen after national searches

Allison Campbell and Louis Terminello have been selected as the inaugural associate laboratory directors of two recently created science directorates at the Department of Energy's Pacific Northwest National Laboratory.

Campbell, a chemist, will head the Earth and Biological Sciences Directorate, and Terminello, also a chemist, will head the Physical and Computational Sciences Directorate. Both have been serving in acting roles leading their respective organizations since Oct. 1, when the laboratory created a new structure to increase the impact of its science mission.

The selections are the result of national searches during which PNNL sought the best candidates to lead the two major thrusts of the laboratory's science mission. Together, Campbell and Terminello are responsible for programs of the Department of Energy's Office of Science — the largest single supporter of research staff at PNNL — working to deliver scientific innovation and impact to address some of the most challenging problems in science today.

With Campbell's leadership, EBSD scientists work toward discoveries in bioenergy, the restoration of our environment, climate science, microbiology, and biomedical science. With Terminello's leadership, PCSD scientists explore areas including materials science, computational science, chemistry, particle physics and fusion energy.

"The new directorates enable us to better align collaboration and resources to pursue our science vision of understanding, predicting and controlling the behavior of complex adaptive systems, such as those found in biology and chemistry," said Steven Ashby, PNNL director. "Both Allison and Lou stood out in their abilities to help achieve that vision."

The pair has extensive experience working closely with organizations that fund PNNL and its researchers — recognizing scientific problems of national importance early on, forming collaborations within PNNL and beyond to tackle those challenges, and recruiting the best scientific talent to make discoveries in the laboratory.

Campbell began her career at PNNL in 1990 as a postdoctoral fellow in the materials science department. She became a full-time staff scientist in 1992 and later the technical group leader for materials synthesis and modification. In 2005 she became director of EMSL, the Environmental Molecular Sciences Laboratory, a DOE Office of Science user facility at PNNL; under her leadership, scientists from around the world sought out EMSL to pursue new science for environmental and energy applications. She has strengthened the collaboration between EMSL and other user facilities, notably the Joint Genome Institute, creating ways for researchers to more easily integrate the capabilities of multiple user facilities into their research.

As a scientist, she invented a process for coating biomedical bone implants with biocompatible and anti-infection coatings in the hopes of improving recovery time for patients and extending the life of joints. That work led to six U.S. patents, an R&D 100 award, and a Federal Laboratory Consortium award for excellence in technology transfer.

Recently, Campbell became president-elect of the American Chemical Society, the world's largest scientific society. She also is a fellow of the American Association for the Advancement of Science, a member of the Washington State Academy of Sciences, and a member of the National Academies Chemical Sciences Roundtable. She earned a bachelor of arts degree in chemistry from Gettysburg College in Pennsylvania and a doctorate in chemistry from State University of New York at Buffalo.

Terminello has served in a series of leadership roles in several research areas since joining PNNL in 2009. He served as chief scientist for fundamental and computational sciences, where he was responsible for setting and implementing a strategic vision to build PNNL's fundamental science capabilities. There he led initiatives in chemical imaging and dynamic systems that have greatly strengthened the lab's leadership in chemical imaging. He has also been chief science and technology officer for PNNL's National Security Directorate, as well as chair of the laboratory's Science and Technology Committee, which helps develop and guide the lab's integrated science and technology strategy and priorities.

Before joining PNNL, Terminello held several leadership positions at Lawrence Livermore National Laboratory, including division leader for materials science and technology and deputy associate director for programs in the Physical and Life Sciences Directorate.

Terminello earned his bachelor of science degree in chemistry at the Massachusetts Institute of Technology and his Ph.D. in physical chemistry from the University of California, Berkeley. Much of his scientific career has focused on the nexus of chemistry, physics, and materials science and technology. He has produced more than 160 publications on synchrotron radiation studies of nanostructured and interfacial materials, has earned several patents, and has served on numerous scientific advisory and review committees. He is a fellow of the American Physical Society and was recently named a fellow of the American Association for the Advancement of Science.

A "dream team" of experts in sensors, electronics, data analysis and neuroscience has been awarded a $5 million grant to help unravel the mysteries of the brain and cross-train an international group of neuroscientists and engineers.

The University of Michigan leads the project, funded by the National Science Foundation, to stimulate and sense brain activity at the single-neuron level and reconstruct neural circuits with supercomputer simulations.

"The goal of all of this is better health care," said Kensall Wise, the William Gould Dow Distinguished University Professor Emeritus and a co-leader of the project. "By bringing together people with different approaches and expertise, the result could be a quantum leap forward in neuroscience and our understanding of the brain."

The advances in knowledge and neural implant technology could lead to better prosthetics and treatments for conditions like Parkinson's disease, deafness, blindness, paralysis and depression, Wise said.

To investigate which neurons produce which behaviors, the team intends to use a technique called optogenetics, which relies on genetically modified neurons that can be switched on and off by different colors of light. By activating and deactivating neurons, researchers can map the neurons and patterns of activity that are responsible for different behaviors and abilities. These tests are typically done with rodents.

At present, most neuroscientists use probes that measure only tens of neurons at once and then use software to reconstruct where the firing neurons were located, said Euisik Yoon, who heads the project and is a professor of electrical engineering and computer science. He added that recording in the brain is a bit like trying to use a microphone to pick out individual voices cheering inside a massive arena.

"If you are sitting outside a stadium, you may hear the crowd roar," Yoon said. "That's what we have been doing. What we want to do is get into the stadium with a lot of microphones, monitoring and eavesdropping on how these neurons are sending signals to understand what is happening inside the stadium."

The microphones are optoelectrodes: devices that combine electrodes, which measure activity from nearby neurons, with miniature LEDs that can turn the neurons on and off. The optoelectrode arrays resemble tiny combs, each tine equipped to address hundreds of individual neurons. The tines are so small that the brain tissue does not react to them as much as with larger probes.

Collaborators in Singapore, South Korea and Germany will help to optimize the arrays for the experiments. By the end of the five-year project, neuroscientists should regularly be able to stimulate and measure a thousand neurons at a time using the new technology. The huge improvement in detail will mean new data processing challenges. Kenneth Harris of University College London in the U.K. will develop algorithms for identifying and analyzing signals from individual neurons.

Meanwhile, collaborators at New York University, the University of Puerto Rico and the University Medical Center Hamburg-Eppendorf in Germany will use optoelectrodes and data processing capabilities to answer questions about brain function. These include how memories are stored and retrieved, how fear is learned and evolves over time, how signals from nerves are processed for sensing, and how activity patterns in early life affect adult brain activity.

John Seymour, assistant research scientist in electrical engineering and project co-leader, emphasized that the task of understanding the brain is a long-term challenge.

"People will be studying the brain for a very long time, well beyond our lifetimes," he said. "One of our main objectives is to develop the next generation of neurotechnologists and neuroscientists who have been cross-trained in both fields and can really push technology both into neuroscience and eventually into health care."

Fifty-five students and researchers will participate in the international exchange program. Over five years, 40 undergraduate students from U.S. universities will attend a neuroscience and neurotechnology "boot camp" at U-M and then go to a collaborating institution for a summer research project. The grant also allows for 15 graduate students or postdoctoral researchers from the U.S. labs to spend an extended period of time working in a collaborating lab.

The grant was awarded under the Partnerships for International Research and Education program. Other co-leaders include György Buzsáki, a neuroscientist at New York University; Gregory Quirk, a psychologist at the University of Puerto Rico; Karel Svoboda, a neuroscientist at the Howard Hughes Medical Institute, Janelia Research Campus; and Edward Stuenkel, a neuroscientist at U-M and the director of education for this program.

Other key researchers from outside the U.S. include Alex Gu at the Institute for Microelectronics in Singapore; Dong Jin Kim of the Korea Institute for Science and Technology in Seoul, South Korea; Oliver Paul of the University of Freiburg in Germany; and Ileana Hanganu-Opatz at the University Medical Center Hamburg-Eppendorf.

The 3rd Heidelberg Laureate Forum (HLF), running August 23-28, will devote Tuesday, August 25 to a multi-faceted discussion of Big Data and the challenges raised by computational science. The Hot Topic at this year’s Forum, ‘Brave New Data World’, is broken down into presentations from leading authorities, moderated workshops and an open debate among the participants. The Heidelberg Laureate Forum Foundation (HLFF) strives to create the conditions for progressive discourse to flourish, which is best achieved by bringing together diverse mindsets.

The Hot Topic at the 3rd HLF dives into enigmatic questions that are woven throughout computational science. How secure is our data? How is intellectual property evolving? Should we blindly accept massive data mining? How is computational science most effectively used for good? How should we regulate this ‘brave new data world’? Set to address these issues are: Alessandro Acquisti of Carnegie Mellon University, Kristin Tolle of Microsoft Research and Jeremy Gillula of the Electronic Frontier Foundation. Four workshops for the participants are moderated by: Ciro Cattuto of the ISI Foundation, Megan Price of Human Rights Data Analysis Group, Peter Ryan of the University of Luxembourg and Frank Rieger of Chaos Computer Club.

Divergent backgrounds are fundamental to achieving a well-balanced and progressive dissection of any issue. This is precisely why the panelists were drawn from varied professions, from elite academia to powerhouse companies to progressive research centers. Three keynote speakers are scheduled to precede the four workshops tackling consequential current issues. Following the workshops, which are led by experts assisted by selected young researchers, the session culminates in an open debate.
Speakers and subjects:

Alessandro Acquisti (Carnegie Mellon University) – “Privacy in the Age of Augmented Reality”

As a Professor of Information Technology and Public Policy at the Heinz College, Carnegie Mellon University (CMU), Alessandro Acquisti is, to say the least, well versed in privacy. He is the director of the Peex (Privacy Economics Experiments) lab and co-director of the CBDR (Center for Behavioral and Decision Research), both at CMU, which have produced fascinating studies in privacy protection. Acquisti will present his research and experiments on the inadequacy of online "notice and consent" mechanisms for privacy protection.

Kristin Tolle (Microsoft Research) – “Using Big Data, Cloud Computing and Interoperability to Save Lives”
In addition to her illustrious career at Microsoft Research Outreach as the Director of the Data Science Initiative, Kristin Tolle is one of the editors and authors of one of the earliest books on data science, The Fourth Paradigm: Data-Intensive Scientific Discovery. She is currently concentrating on developing a program using data to improve user experiences across the board. Tolle will discuss the challenges to privacy of combining multiple datasets, as well as the crucial utility of this procedure in reacting to natural disasters.

Jeremy Gillula (Electronic Frontier Foundation) – “Big Data and the Surveillance-Industrial Complex”
Jeremy Gillula’s work as a Staff Scientist for the civil society organization Electronic Frontier Foundation (EFF) has enabled him to cover a broad range of issues. Though he acknowledges the benefits of autonomous technologies, he is also aware of the threats they pose to our civil liberties. Gillula will discuss the misuse of online tracking for advertising and spying, and what computer scientists can do to get better results without sacrificing privacy.

Ciro Cattuto (ISI Foundation) – “From the Black Box in Your Car, to the Black Box Society”
Ciro Cattuto is the Scientific Director and leader of the Data Science Laboratory at the ISI Foundation. His research focuses include behavioral social networks, digital epidemiology, online social networks and web science. These interests led him to found SocioPatterns, which measures and maps human spatial behavior. Cattuto will start with the example of the black boxes insurers put in cars to tackle the broader issue of scoring and the potential of a black box society.

Megan Price (Human Rights Data Analysis Group) – “Big Data Promises and Pitfalls: Examples from Syria”
The Human Rights Data Analysis Group (HRDAG) uses statistical and scientific methods to establish the most accurate account of events, which paves the way to accountability. Megan Price is the director of research at HRDAG and has served as lead statistician on its projects in both Guatemala and Syria. She will explain her work using data to estimate the number of war victims in Syria.

Peter Ryan (University of Luxembourg) – “Back Doors, Trap Doors, and Crypto Wars”
Peter Ryan has been a Professor of Applied Security at the University of Luxembourg for over six years. A pioneer in applying algebras to the modeling and analysis of secure systems, he has more than two decades of experience in cryptography and information assurance. Ryan will discuss the attempts to introduce government-controlled backdoors into encryption algorithms and the pitfalls of this strategy.

Frank Rieger (Chaos Computer Club) – pending
The Chaos Computer Club (CCC), based in Germany, has been the largest hacker association in Europe for over thirty years. The CCC is a network of decentralized clubs whose focuses range from technical research to anonymity services. Frank Rieger has been one of the honorary speakers of the CCC for several years. His subject is pending.

The Hot Topic has been coordinated by Michele Catanzaro, author of “Networks: A Very Short Introduction” and a highly accomplished freelance science journalist. Catanzaro, who will moderate the debate, sees the 3rd HLF as an ideal environment and “fertile ground for making scientists provocative and constructive allies to the public”.

The Hot Topic will be held in the New Auditorium of Heidelberg University, Grabengasse, 69117 Heidelberg.

Cites statistics as one of three foundational communities in data science

In a policy statement issued today, the American Statistical Association (ASA) stated that statistics is "foundational to data science"--along with database management and distributed and parallel systems--and that its use in this emerging field empowers researchers to extract knowledge and obtain better results from Big Data and other analytics projects.

The statement also encourages "maximum and multifaceted collaboration" between statisticians and data scientists to realize the full potential of Big Data and data science.

"Through this statement, the ASA and its membership acknowledge that data science encompasses more than statistics, but at the same time also recognize that statistical science plays a critical role in the fast-growing field," said ASA President David R. Morganstein, who is director of the statistical staff for Westat, Inc. "It is our hope the statement will reinforce the relationship of statistics to data science and further foster mutually collaborative relationships among all key contributors in data science."

The ASA statement acknowledges the lack of consensus on what constitutes data science, but notes the following essential role of each of the three computer science and statistics professional communities that are foundational to the field:

  • Database Management, which enables transformation, conglomeration, and organization of data resources
  • Statistics and Machine Learning, which convert data into knowledge
  • Distributed and Parallel Systems, which provide the computational infrastructure to carry out data analysis

"At its most fundamental level, we view data science as a mutually beneficial collaboration among these three professional communities, complemented with significant interactions with numerous related disciplines," says the ASA statement.

It continues by elaborating on the key role of statistics in the data science field: "Framing questions statistically allows researchers to leverage data resources to extract knowledge and obtain better answers. The central dogma of statistical inference, that there is a component of randomness in data, enables researchers to formulate questions in terms of underlying processes and to quantify uncertainty in their answers. A statistical framework allows researchers to distinguish between causation and correlation and thus to identify interventions that will cause changes in outcomes. It also allows them to establish methods for prediction and estimation, to quantify their degree of certainty, and to do all of this using algorithms that exhibit predictable and reproducible behavior. In this way, statistical methods aim to focus attention on findings that can be reproduced by other researchers with different data resources. Simply put, statistical methods allow researchers to accumulate knowledge."

The statement also calls on the ASA membership to expand the cooperative relationships already in place among data science practitioners: "For statisticians to help meet the statistical challenges faced by data scientists requires a sustained and substantial collaborative effort with researchers with expertise in data organization and in the flow and distribution of computation. Statisticians must engage them, learn from them, teach them and work with them. Engagement must occur at all levels: with individuals, groups of researchers, academic departments and the [data science] profession as a whole."

New problem-solving strategies are needed to develop "soup-to-nuts" pipelines that start with managing raw data and end with user-friendly, efficient implementations of principled statistical methods and the communication of substantive results. These next-generation strategies will be fostered from the ground up in data science and statistics programs at colleges and universities across the country, the statement explains.

"Statistical education and training must continue to evolve--the next generation of statistical professionals needs a broader skill set and must be more able to engage with database and distributed systems experts. While capacity is increasing within existing and innovative new degree programs, more is needed to meet the massive expected demand. The next generation must include more researchers with skills that cross the traditional boundaries of statistics, databases and distributed systems; there will be an ever-increasing demand for such 'multi-lingual' experts," concludes the statement.

Big data sets are important tools of modern science. Mining for correlations between millions of pieces of information can reveal vital relationships or predict future outcomes, such as risk factors for a disease or structures of new chemical compounds.

These mining operations are not without risk, however. Researchers can have a tough time telling whether they have unearthed a nugget of truth or what amounts to fool's gold: a correlation that seems to have predictive value but actually does not, because it arises purely from random chance.

A research team that bridges academia and industry has developed a new mining tool that can help tell the two apart. In a study published in Science, the researchers outline a method for successively testing hypotheses on the same data set without compromising statistical assurances that their conclusions are valid.

Existing checks on this kind of "adaptive analysis," where new hypotheses based on the results of previous ones are repeatedly tested on the same data, can only be applied to very large datasets. Acquiring enough data to run such checks can be logistically challenging or cost prohibitive.

The researchers' method could increase the power of analyses done on smaller datasets by flagging when researchers risk a "false discovery," where a finding appears to be statistically significant but can't be reproduced in new data.

For each hypothesis that needs testing, it could act as a check against "overfitting", where predictive trends only apply to a given dataset and can't be generalized.

The study was conducted by Cynthia Dwork, distinguished scientist at Microsoft Research, Vitaly Feldman, research scientist at IBM's Almaden Research Center, Moritz Hardt, research scientist at Google, Toniann Pitassi, professor in the Department of Computer Science at the University of Toronto, Omer Reingold, principal researcher at Samsung Research America, and Aaron Roth, assistant professor in the Department of Computer and Information Science in the University of Pennsylvania's School of Engineering and Applied Science.

Adaptive analysis, where multiple tests on a dataset are combined to increase their predictive power, is an increasingly common technique. It also has the ability to deceive.

Imagine receiving an anonymous tip via email one morning saying the price of a certain stock will rise by the end of the day. At the closing bell, the tipster's prediction is borne out, and another prediction is made. After a week of unbroken success, the tipster begins charging for his proven prognostication skills.

Many would be inclined to take up the tipster's offer and fall for this scam. Unbeknownst to his victims, the tipster started by sending random predictions to thousands of people, and only repeated the process with the ones that ended up being correct by chance. While only a handful of people might be left by the end of the week, each sees what appears to be a powerfully predictive correlation that is actually nothing more than a series of lucky coin-flips.
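
The arithmetic behind the scam is easy to check with a short simulation. The sketch below is purely illustrative; the mailing-list size and one-week horizon are assumptions, not details from the article.

```python
import random

# Illustrative simulation of the tipster scam: every prediction is a coin
# flip, yet a few recipients still see a perfect week-long track record.
random.seed(0)

recipients = 10_000   # assumed size of the initial mailing list
days = 7              # one week of daily predictions

survivors = recipients
for _ in range(days):
    outcome = random.choice(["up", "down"])  # what the stock actually does
    # Each remaining recipient was sent a random guess; only those whose
    # guess happened to match the outcome stay on the "perfect record" list.
    survivors = sum(
        1 for _ in range(survivors) if random.choice(["up", "down"]) == outcome
    )

print(f"Recipients left after a week of 'correct' tips: {survivors}")
# On average recipients / 2**days, roughly 78 people, remain, each of whom has
# seen nothing but lucky coin flips.
```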

In the same way, "adaptively" testing many hypotheses on the same data, each new one influenced by the last, can make random noise seem like a signal: what is known as a false discovery. Because the correlations of these false discoveries are idiosyncratic to the dataset in which they were generated, they can't be reproduced when other researchers try to replicate them with new data.

The traditional way to check that a purported signal is not just coincidental noise is to use a "holdout." This is a data set that is kept separate while the bulk of the data is analyzed. Hypotheses generated about correlations between items in the bulk data can be tested on the holdout; real relationships would exist in both sets, while false ones would fail to be replicated.

The problem with using holdouts this way is that, by their nature, they can only be reused safely if the hypotheses tested against them are independent of one another. Even a few additional hypotheses chained off one another could quickly lead to false discovery.

To this end, the researchers developed a tool known as a "reusable holdout." Instead of testing hypotheses on the holdout set directly, scientists would query it through a "differentially private" algorithm.

The "different" in its name is a reference to the guarantee that a differentially private algorithm makes. Its analyses should remain functionally identical when applied to two different datasets: one with and one without the data from any single individual. This means that any findings that would rely on idiosyncratic outliers of a given set would disappear when looking at data through a differentially private lens.

To test their algorithm, the researchers performed adaptive data analysis on a set rigged so that it contained nothing but random noise. The set was abstract, but could be thought of as one that tested 20,000 patients on 10,000 variables, such as variants in their genomes, for ones that were predictive of lung cancer.

Though, by design, none of the variables in the set were predictive of cancer, reuse of a holdout set in the standard way showed that 500 of them had significant predictive power. Performing the same analysis with the researchers' reusable holdout tool, however, correctly showed the lack of meaningful correlations.
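
That first experiment can be mimicked on a smaller scale in a few lines. The sketch below uses assumed sizes (2,000 samples, 1,000 noise variables) and a simple sign-voting rule rather than the paper's exact protocol, but it shows the same pattern: variables chosen with the holdout's help look predictive on that holdout and fall back to chance on fresh data.

```python
import numpy as np

# Illustrative scaled-down analogue of the noise experiment; sizes and the
# selection rule are assumptions, not the paper's setup.
rng = np.random.default_rng(0)
n, d = 2_000, 1_000  # samples per dataset, number of candidate variables

def random_dataset():
    X = rng.choice([-1, 1], size=(n, d))  # random "measurements"
    y = rng.choice([-1, 1], size=n)       # random labels: no real signal exists
    return X, y

X_train, y_train = random_dataset()
X_hold, y_hold = random_dataset()
X_fresh, y_fresh = random_dataset()       # untouched data for the final check

# Adaptive misuse of the holdout: keep every variable that correlates with the
# label on the training set AND has the same sign of correlation on the holdout.
corr_train = X_train.T @ y_train / n
corr_hold = X_hold.T @ y_hold / n
selected = np.flatnonzero(
    (np.abs(corr_train) > 1 / np.sqrt(n)) & (np.sign(corr_train) == np.sign(corr_hold))
)

# Build a simple sign-weighted vote from the survivors and "validate" it on the
# very holdout that helped choose them.
weights = np.sign(corr_train[selected])

def accuracy(X, y):
    return float(np.mean(np.sign(X[:, selected] @ weights) == y))

print(f"variables selected:   {len(selected)}")
print(f"holdout accuracy:     {accuracy(X_hold, y_hold):.2f}")    # noticeably above 0.5
print(f"fresh-data accuracy:  {accuracy(X_fresh, y_fresh):.2f}")  # back to ~0.5
```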

An experiment with a second rigged dataset depicted a more realistic scenario. There, some of the variables did have predictive power, but traditional holdout use produced a combination of variables whose power was wildly overestimated. The reusable holdout tool correctly identified the 20 variables that had true statistical significance.

Beyond pointing out the dangers of accidental overfitting, the reusable holdout algorithm could warn users when they are exhausting the validity of a dataset. This is a red flag for what is known as "p-hacking," or intentionally gaming the data to get a publishable level of significance.

Implementing the reusable holdout algorithm will allow scientists to generate stronger, more generalizable findings from smaller amounts of data.
