Waterloo-built AI brings natural language processing to African languages

Researchers have developed an AI model to help computers work more efficiently with a wider variety of languages. 

African languages have received little attention from computer scientists, so few natural language processing capabilities have been available to large swaths of the continent. The new language model, developed by researchers at the University of Waterloo’s David R. Cheriton School of Computer Science, begins to fill that gap by enabling computers to analyze text in African languages for many useful tasks.

The new neural network model, which the researchers have dubbed AfriBERTa, uses deep-learning techniques to achieve state-of-the-art results for low-resource languages.

The neural language model works specifically with 11 African languages, such as Amharic, Hausa, and Swahili, spoken collectively by more than 400 million people. It achieves output quality comparable to the best existing models despite learning from just one gigabyte of text, while other models require thousands of times more data.

“Pretrained language models have transformed the way computers process and analyze textual data for tasks ranging from machine translation to question answering,” said Kelechi Ogueji, a master’s student in computer science at Waterloo. “Sadly, African languages have received little attention from the research community.”

“One of the challenges is that neural networks are bewilderingly text- and computer-intensive to build. And unlike English, which has enormous quantities of available text, most of the 7,000 or so languages spoken worldwide can be characterized as low-resource, in that there is a lack of data available to feed data-hungry neural networks.”

Most of these models rely on a technique known as pretraining. During pretraining, researchers present the model with text in which some of the words have been covered up, or masked, and the model has to guess the masked words. By repeating this process many billions of times, the model learns the statistical associations between words, which mimics human knowledge of the language.
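The masking step of pretraining can be sketched in a few lines of Python. This is a minimal sketch, not AfriBERTa's actual code. The mask-token string, the 15% default masking rate, and the helper name `mask_tokens` are illustrative assumptions. Real pipelines typically operate on subword IDs rather than whole words, and they may apply extra replacement rules. During training, the model's job would be to predict each entry in `targets` from the masked sequence.

```python
import random

# Hypothetical placeholder; real tokenizers define their own mask token.
MASK = "<mask>"


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Hide a random subset of tokens for masked-language-model pretraining.

    Returns (masked_tokens, targets), where targets maps each masked
    position to the original token the model must learn to predict.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    # Mask at least one token (and never more than all of them).
    n_to_mask = min(len(tokens), max(1, round(len(tokens) * mask_prob)))
    positions = rng.sample(range(len(tokens)), n_to_mask)
    masked = list(tokens)  # copy so the caller's list is left untouched
    targets = {}
    for i in sorted(positions):
        targets[i] = masked[i]
        masked[i] = MASK
    return masked, targets


if __name__ == "__main__":
    # Swahili example sentence, split on whitespace as a stand-in for a tokenizer.
    sentence = "habari ya asubuhi rafiki yangu mpendwa".split()
    masked, targets = mask_tokens(sentence, mask_prob=0.3, seed=0)
    print(" ".join(masked))
    print(targets)
```

Running it prints the sentence with two of its six words replaced by `<mask>`. It also prints a dictionary that maps those positions back to the hidden words, which are the answers the model would be scored against.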

“Being able to pretrain models that are just as accurate for certain downstream tasks, but using vastly smaller amounts of data has many advantages,” said Jimmy Lin, the Cheriton Chair in Computer Science and Ogueji’s advisor. “Needing less data to train the language model means that less computation is required and consequently lower carbon emissions associated with operating massive data centers. Smaller datasets also make data curation more practical, which is one approach to reduce the biases present in the models.”

“This work takes a small but important step to bringing natural language processing capabilities to more than 1.3 billion people on the African continent.”

Assisting Ogueji and Lin in this research is Yuxin Zhu, who recently completed an undergraduate degree in computer science at Waterloo. Together, they present their research paper, “Small data? No problem! Exploring the viability of pretrained multilingual language models for low-resource languages,” at the Multilingual Representation Learning Workshop at the 2021 Conference on Empirical Methods in Natural Language Processing.

URI researchers win $1.5 million NOAA grant to protect New England coastal communities, national parks, wildlife refuges from impact of sea-level rise, extreme weather

Researchers at the University of Rhode Island and Penn State University have been awarded a four-year, $1.5 million grant through the National Oceanic and Atmospheric Administration to study the effects of sea-level rise and how it may exacerbate the impact of extreme weather. The project will draw on expertise from researchers at URI’s Graduate School of Oceanography, its College of the Environment and Life Sciences, the Department of Ocean Engineering within the URI College of Engineering, and the URI Coastal Resources Center.

Other collaborative participants include the Schoodic Institute and the National Park Service. The overall goal of the project is to help communities, the National Park Service, and the U.S. Fish and Wildlife Service adapt and improve their resilience as the climate continues to change and extreme weather events such as hurricanes and nor’easters continue to increase in frequency and severity.

Image caption: Realistic 3D visualization of Eastham, Mass., along the Cape Cod National Seashore, based on Advanced Circulation (ADCIRC) modeling results for the March 2018 nor’easter with 1.0 m of sea-level rise.

According to NOAA, the rate of sea-level rise is accelerating. Since 1993, the average global sea level has risen by 3.4 inches. By the end of the century, it is likely to be at least one foot higher than 2000 levels. Sea level plays a role in flooding, shoreline erosion, and other hazards, affecting the nearly 40 percent of the U.S. population that lives in densely populated coastal areas. However, despite what is known about sea-level rise, little research has examined how it may amplify the impacts of nor’easters and hurricanes.
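A little arithmetic turns the NOAA figures above into average annual rates. The end year for the observed 3.4-inch rise is an assumption: the article does not say when the figure was measured, so the sketch uses 2020. Straight-line averages also hide the acceleration NOAA describes.

```python
MM_PER_INCH = 25.4


def average_rate(rise_inches, start_year, end_year):
    """Average rise in inches per year over the interval [start_year, end_year]."""
    return rise_inches / (end_year - start_year)


# Observed: 3.4 inches since 1993 (from the article).
# ASSUMPTION: 2020 as the end year of that measurement.
observed = average_rate(3.4, 1993, 2020)

# Projected floor: at least 1 foot (12 inches) above 2000 levels by 2100.
projected_floor = average_rate(12.0, 2000, 2100)

if __name__ == "__main__":
    for label, rate in [("observed 1993-2020", observed),
                        ("projected floor 2000-2100", projected_floor)]:
        print(f"{label}: {rate:.3f} in/yr ({rate * MM_PER_INCH:.2f} mm/yr)")
```

With these inputs, the script prints about 0.126 in/yr (3.20 mm/yr) for the observed period and 0.120 in/yr (3.05 mm/yr) for the projected floor. The projection is only a lower bound and the rate is accelerating. The similarity of the two averages should therefore not be read as a forecast of steady rise.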

“There are a number of studies that have been done looking at just sea-level rise or just extreme weather, but what we’re really lacking in terms of clear understanding is the combined impact of these two phenomena,” said Isaac Ginis, professor of oceanography, who is leading the study. “This is especially important to us on the East Coast and in New England, where we’ve seen significant coastal flooding produced by waves and storm surge during nor’easters and hurricanes. How these effects are amplified by sea-level rise has been largely unexplored. This information gap inhibits our ability to properly plan for the future and is likely to lead to under-informed and ineffective adaptation measures.”    

The project will expand the body of research related to the effects of extreme weather and sea-level rise on five New England national parks and two wildlife refuges – Cape Cod National Seashore, Boston Harbor Islands National Recreation Area, and New Bedford Whaling National Historical Park in Massachusetts; Ninigret National Wildlife Refuge, Trustom Pond National Wildlife Refuge, and Roger Williams National Memorial in Rhode Island; and Acadia National Park in Maine – as well as their surrounding communities.

The project was supported by all four members of Rhode Island’s congressional delegation and received letters of support from more than 15 local communities and nonprofits.

“This federal research funding will help URI faculty and students gather valuable information to answer questions about changing sea levels and in turn protect coastal communities from the effects of climate change,” said U.S. Senator Jack Reed.

Using state-of-the-art supercomputer models of the atmosphere, storm surge, and waves and erosion, the team will produce high-resolution recreations of the impacts of future storm and sea-level-rise scenarios. These will identify vulnerabilities in the ecosystems and infrastructure of the selected sites and their adjacent communities. The modeling will also include hazard, risk, and adaptability assessments, as well as mitigation scenarios.
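Even a deliberately crude static "bathtub" sketch shows why the combined impact matters. A bathtub model floods every point whose ground lies below a still-water level. This is not the team's method: it ignores the coupled atmosphere, surge, and wave dynamics that models such as ADCIRC resolve. The transect elevations and the 1.5 m surge are invented for illustration. The 1.0 m sea-level-rise value echoes the Eastham visualization scenario.

```python
# Toy 1-D transect of ground elevations in meters above today's mean sea level.
# These values are invented for illustration only.
ELEVATIONS_M = [0.3, 0.8, 1.2, 1.6, 1.9, 2.1, 2.4, 3.0, 3.5]


def flooded_cells(elevations, water_level_m):
    """Count cells whose ground lies below a static water level (bathtub rule)."""
    return sum(1 for z in elevations if z < water_level_m)


def compare_scenarios(surge_m, slr_m, elevations=ELEVATIONS_M):
    """Flooded-cell counts for surge alone, sea-level rise alone, and both together."""
    return {
        "surge only": flooded_cells(elevations, surge_m),
        "sea-level rise only": flooded_cells(elevations, slr_m),
        # In a bathtub model the two water-level contributions simply stack.
        "combined": flooded_cells(elevations, surge_m + slr_m),
    }


if __name__ == "__main__":
    for scenario, count in compare_scenarios(surge_m=1.5, slr_m=1.0).items():
        print(f"{scenario}: {count} of {len(ELEVATIONS_M)} cells flooded")
```

For this transect, the surge alone floods 3 cells and sea-level rise alone floods 2, but together they flood 7. That is more than the sum of the separate cases, because the higher baseline lets the surge reach ground that neither reaches alone. On different terrain the outcome could differ. The point is only that flood impacts need not simply add up, which is part of what the project's far more detailed modeling is meant to quantify.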

“Rising seas and worsening storms driven by climate change are two major challenges Rhode Island will face in the decades to come,” said U.S. Senator Sheldon Whitehouse.  “It’s fitting that the University of Rhode Island has been selected by NOAA to research ways to protect coastal areas from these dual threats.”

Researchers will work closely with the National Park Service, U.S. Fish and Wildlife Service, and stakeholders at the community level to tailor the research and translate the science in a way that can be incorporated into local resource management and adaptation measures to improve coastal resilience and to protect communities, people and infrastructure, and ecosystems.

“Each community has their own needs,” says Ginis, “but our modeling results will produce tailored and tangible information for local decision-makers – state and local governments, emergency management officials, town and city planners, and other stakeholders – to address their specific needs and enable them to plan and adapt as the sea level rises and the climate continues to change.”

Taking historical data into account as well as topography, geology, water depth, land elevation, natural processes such as shoreline changes, and human influence, the team will be able to project more than 50 years into the future, using 3-D visualization to provide computer simulations illustrating storm hazards and identifying potentially effective mitigation measures.

“As climate change continues to wreak havoc on communities across our state, it’s more important than ever to better understand the effects of rising sea levels and their impact during extreme weather events,” said Congressman Jim Langevin. “Devastating storms and flooding are becoming more frequent and severe, so we must invest in climate resiliency and adaption before it’s too late. I’m thrilled that the University of Rhode Island will now have the federal funding to lead the way on this critical area of research, and I look forward to reviewing the research.”

Ultimately, the project will open an important dialogue among researchers, local stakeholders, and federal resource managers. It will also facilitate the development of practical, well-informed, science-based best practices to guide future policies and resource management strategies that protect coastal economies and environmental, cultural, and community resources.

“Climate change is one of the greatest challenges of our time, and we need to deal with this existential threat head-on,” said Congressman David Cicilline. “With the support of this federal grant, the University of Rhode Island and Penn State University will be able to study the impact climate-caused sea-level rise has on extreme weather. Their project will ultimately provide our coastal communities with the information they need to improve resiliency and plan for rising sea levels and severe weather events.”

IIT Bombay professors propose light-induced valleytronics in pristine graphene, paving the way to quantum supercomputers

Valleytronics is an emerging field in which valleys, local minima in the energy band structure of solids, are used to encode, process, and store quantum information. Though graphene was thought to be unsuitable for valleytronics due to its symmetrical structure, researchers from the Indian Institute of Technology Bombay, India, have recently shown that this is not the case. Their findings may pave the way to small-sized quantum supercomputers that can operate at room temperature.

From the consumer’s side, it’s pretty easy to notice the giant strides that the field of electronics has made over the past few decades. With wearable gadgets, smart cities, self-driving cars, improved space missions, robots, holography, and supercomputers, the possibilities of technological advancement seem infinite. However, unbeknownst to most people, this accelerated trend of technological advancement fueled by electronics is rapidly coming to a halt as electronic components reach their practical limits. If we are to keep improving our computing power and capacity, we will need to find new ways to store and process data beyond the simple flow and charge of electrons, which is how modern electronics operate.

That is why quantum computers have recently become a hot topic. By encoding information in quantum phenomena, quantum supercomputers transcend the binary notion of each bit being either “0” or “1.” Instead, quantum bits (qubits) can exist in superpositions of “0” and “1,” which are weighted combinations of both states at once. By exploiting superpositions through carefully designed algorithms, quantum computers could theoretically outperform conventional computers by several orders of magnitude in speed on certain problems. Sadly, it has proven difficult to find suitable quantum phenomena to encode information at room temperature. Existing quantum computers, such as those developed by Google, IBM, and Microsoft, have to be kept at ultralow temperatures below –196.1°C, which makes them costly and impractical to operate.
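A two-amplitude state vector makes the idea of superposition concrete. In this minimal sketch (standard library only; the function names are our own), a qubit is described by complex amplitudes α and β for "0" and "1". A measurement still returns a definite 0 or 1, with probabilities |α|² and |β|².

```python
import cmath
import math
import random


def normalize(alpha, beta):
    """Scale amplitudes so that |alpha|^2 + |beta|^2 == 1."""
    norm = math.sqrt(abs(alpha) ** 2 + abs(beta) ** 2)
    return alpha / norm, beta / norm


def measurement_probabilities(alpha, beta):
    """Probabilities of reading 0 and 1 when the qubit is measured."""
    a, b = normalize(alpha, beta)
    return abs(a) ** 2, abs(b) ** 2


def measure(alpha, beta, rng):
    """Simulate one measurement: the result is always a definite 0 or 1."""
    p0, _ = measurement_probabilities(alpha, beta)
    return 0 if rng.random() < p0 else 1


if __name__ == "__main__":
    rng = random.Random(42)
    # Equal superposition (|0> + |1>)/sqrt(2): each outcome has probability ~0.5.
    print(measurement_probabilities(1, 1))
    # Unequal superposition with a relative phase: probabilities ~0.75 and ~0.25.
    state = (math.sqrt(3), cmath.exp(1j * math.pi / 4))
    print(measurement_probabilities(*state))
    counts = [0, 0]
    for _ in range(10_000):
        counts[measure(*state, rng)] += 1
    print(counts)  # close to a 3:1 split between 0 and 1
```

The phase on β does not change the measurement probabilities. It does matter when states interfere, which is what quantum algorithms exploit.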

Fortunately, there is a very promising approach for encoding quantum information that is actively being explored: valleytronics. Aside from their charge, electrons have another property that can be manipulated: their “valley pseudospin,” which indicates the valley the electron occupies. These so-called valleys are local minima in the energy bands of a solid, which dictate the energy of electrons and where they sit in momentum space. Valleys, whose occupation is governed by quantum mechanics, can be used to encode, process, and store quantum information at less restrictive temperatures.
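In graphene, the two valleys sit at the corners of the Brillouin zone, conventionally labeled K and K′. The standard nearest-neighbor tight-binding model, sketched below in Python, illustrates this. The conduction and valence bands meet at zero energy exactly at K and K′, and the two valleys mirror each other, which is the symmetry the article describes as the obstacle. The hopping energy of about 2.7 eV is a commonly quoted textbook value, and the carbon-carbon distance is set to 1 in arbitrary units. This is a generic illustration, not the calculation used in the Optica study.

```python
import cmath
import math

A = 1.0   # carbon-carbon distance (arbitrary units)
T = 2.7   # nearest-neighbor hopping energy in eV (commonly quoted value; assumption)

# Lattice vectors of the honeycomb lattice, in terms of the C-C distance.
A1 = (1.5 * A, math.sqrt(3) / 2 * A)
A2 = (1.5 * A, -math.sqrt(3) / 2 * A)

# High-symmetry points: the zone center and the two inequivalent valleys.
GAMMA = (0.0, 0.0)
K = (2 * math.pi / (3 * A), 2 * math.pi / (3 * math.sqrt(3) * A))
K_PRIME = (2 * math.pi / (3 * A), -2 * math.pi / (3 * math.sqrt(3) * A))


def structure_factor(kx, ky):
    """f(k) = 1 + exp(i k.a1) + exp(i k.a2), coupling the two carbon sublattices."""
    phase1 = kx * A1[0] + ky * A1[1]
    phase2 = kx * A2[0] + ky * A2[1]
    return 1 + cmath.exp(1j * phase1) + cmath.exp(1j * phase2)


def band_energies(kx, ky):
    """Valence and conduction band energies (eV): -t|f(k)| and +t|f(k)|."""
    e = T * abs(structure_factor(kx, ky))
    return -e, e


if __name__ == "__main__":
    for name, k in [("Gamma", GAMMA), ("K", K), ("K'", K_PRIME)]:
        lower, upper = band_energies(*k)
        print(f"{name:>5}: {lower:+.3f} eV, {upper:+.3f} eV")
```

At Γ, the bands sit at ±3t (±8.1 eV here). At K and K′, both energies vanish to within floating-point error. Stepping a small distance q away from K gives the same energy as stepping −q away from K′, which is the valley degeneracy that light has to break.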

Recently, a team of scientists from the Indian Institute of Technology (IIT) Bombay, India, and the Max-Born-Institut, Germany, achieved a breakthrough in the field of valleytronics. In their latest study, published in Optica, they present a way to perform valley operations in monolayer, or pristine, graphene, something other researchers in the field had assumed to be impossible. As the poster child of carbon nanomaterials, graphene consists of carbon atoms arranged in a hexagonal pattern and has a plethora of favorable properties. Atomically thin layers of graphene have electron valleys, but because of the material’s inherent symmetry, they were deemed useless for valley operations.

Despite the odds, the team came up with a strategy to break graphene’s valley symmetry using light. Associate Professor Gopal Dixit from IIT Bombay, who led the study, explains: “By tailoring the polarization of two beams of light according to graphene’s triangular lattice, we found it possible to break the symmetry between two neighboring carbon atoms and exploit the electronic band structure in the regions close to the valleys, inducing valley polarization.” In other words, this makes it possible to use graphene’s valleys to effectively “write” information. Dr. Dixit also highlights that flashes of light can cause electrons to wiggle several hundred trillion times a second. In theory, this means valleytronics could operate at petahertz rates, about a million times faster than the gigahertz clock speeds of modern computers.
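One way to match light to a triangular lattice is to combine two counter-rotating, circularly polarized beams, one at frequency ω and one at 2ω. Their summed electric field traces a three-lobed (trefoil) pattern. This waveform is an assumption chosen because it is a common choice for threefold-symmetric targets; the article itself only says the two beams' polarization was tailored to the lattice. The Python sketch below builds the field and checks its threefold symmetry: advancing time by one third of the fundamental period rotates the field vector by 120°.

```python
import math


def bicircular_field(t, omega=1.0, e1=1.0, e2=1.0):
    """Electric field (Ex, Ey) of two counter-rotating circular beams.

    The first beam (amplitude e1, frequency omega) rotates counterclockwise;
    the second (amplitude e2, frequency 2*omega) rotates clockwise.
    """
    ex = e1 * math.cos(omega * t) + e2 * math.cos(2 * omega * t)
    ey = e1 * math.sin(omega * t) - e2 * math.sin(2 * omega * t)
    return ex, ey


def rotate(vec, angle):
    """Rotate a 2-D vector counterclockwise by `angle` radians."""
    x, y = vec
    c, s = math.cos(angle), math.sin(angle)
    return c * x - s * y, s * x + c * y


def has_threefold_symmetry(t, omega=1.0, e1=1.0, e2=1.0):
    """True if E(t + T/3) equals E(t) rotated by 120 degrees, with T = 2*pi/omega."""
    period = 2 * math.pi / omega
    later = bicircular_field(t + period / 3, omega, e1, e2)
    rotated = rotate(bicircular_field(t, omega, e1, e2), 2 * math.pi / 3)
    return all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(later, rotated))


if __name__ == "__main__":
    samples = [i * 0.1 for i in range(63)]  # roughly one fundamental period
    print(all(has_threefold_symmetry(t) for t in samples))
    print(all(has_threefold_symmetry(t, e1=1.0, e2=0.5) for t in samples))
```

Both checks print True. The amplitudes and unit frequency are arbitrary. The symmetry holds for any e1 and e2, while their ratio changes the shape of the lobes.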

One of the most attractive aspects of conducting valley operations in graphene is that it is possible to do so at room temperature. “Our work could open the door to miniature, general-purpose quantum supercomputers that can be used by regular people, much like laptops,” remarks Dr. Dixit. With the higher computational speeds that quantum supercomputers could provide, molecular simulations, big data analysis, deep learning, and other computationally intensive tasks would become much quicker. In turn, this could accelerate the development of new drugs and the elucidation of molecular structures, aiding the search for cures for complex diseases, including COVID-19. Let us hope this study helps quantum computers arrive in our lives sooner!