Free dataset archive helps researchers quickly find a needle in a haystack

UCR STAR visualizes public spatio-temporal datasets through an interactive map

Let's say you're doing research that requires millions of geotagged tweets. Or perhaps you're a journalist who wants to map murders in Chicago from 2001 to the present. You need to find large spatio-temporal datasets -- but where? 

While there are hundreds of publicly available datasets, locating them can take months of searching. When potential sources are found, they rarely provide enough information for a researcher to decide whether the set contains the kind of data they need without first downloading the often huge file and sorting through it.

Thanks to a computer scientist at the University of California, Riverside, finding the right dataset is now as easy as bookmarking a website, and it costs absolutely nothing.

Ahmed Eldawy, an assistant professor of computer science in the Marlan and Rosemary Bourns College of Engineering, and his group spent the last three years combing the internet for public spatio-temporal datasets, studying their attributes, and summarizing the results for each set on interactive maps that show the user exactly what they're getting.

"People who work on data science need datasets but can spend a lot of time finding them," Eldawy said. "I wanted to build an archive they can find easily."

Called the UCR Spatio-temporal Active Repository, or UCR STAR, the archive is made available as a service to the research community to provide easy access to large spatio-temporal datasets through an interactive exploratory interface. Users can search and filter those datasets as if shopping for their research, except that everything is free. 

"The map interface visualizes the data, so you can see if it's a good fit," Eldawy said. "It's like a catalog for datasets."

At the heart of UCR STAR, the map provides an interactive exploratory interface for the dataset. Similar to Google Maps or other web maps, users can zoom in and out and pan around to get a quick overview of the data distribution, coverage, and accuracy. 

Important details are displayed once a dataset is selected, such as the original homepage, a link to the original download source, size in bytes, number of records, file format, and other useful information. The subset download feature allows users to quickly download the data in a given geographical region, which reduces the download size. They can also embed their customized view on a webpage or share the link via social media and bookmark it to revisit later.
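The idea behind a subset download can be illustrated with a small sketch: keep only the records whose coordinates fall inside a chosen rectangle. This is not UCR STAR's actual code or API; the `Record` class, the `subset_by_bbox` function, and the sample points with approximate coordinates are all hypothetical and exist only to show the principle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A hypothetical geotagged record (not a UCR STAR data type)."""
    lon: float
    lat: float
    label: str


def subset_by_bbox(records, west, south, east, north):
    """Return the records whose point lies inside the box, edges included."""
    return [
        r for r in records
        if west <= r.lon <= east and south <= r.lat <= north
    ]


# Made-up sample points with approximate coordinates.
records = [
    Record(-87.63, 41.88, "Chicago"),
    Record(-117.40, 33.95, "Riverside"),
    Record(-94.17, 36.06, "Fayetteville"),
]

# A rough box around Chicago; only records inside it are kept.
subset = subset_by_bbox(records, west=-88.0, south=41.6, east=-87.5, north=42.1)
print([r.label for r in subset])
```

Because only the matching records are returned, the result is smaller than the full dataset, which is the point of downloading by region.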

UCR STAR contains 102 datasets and 5 billion records. The datasets were mapped using Da Vinci, an open source framework built on top of Apache Spark that Eldawy designed to work with spatial data. The UCR STAR website is best accessed through a desktop browser but also has a limited mobile-friendly interface.

IBM Research Australia’s review evaluates how AI could boost the success of clinical trials

In a review publishing July 17 in the journal Trends in Pharmacological Sciences, researchers examined how artificial intelligence (AI) could affect drug development in the coming decade.

Big pharma and other drug developers are grappling with a dilemma: the era of blockbuster drugs is coming to an end. At the same time, adding new drugs to their portfolios is slow and expensive. It takes on average 10-15 years and $1.5-2B to get a new drug to market; approximately half of this time and investment is devoted to clinical trials. 

Although AI has not yet had a significant impact on clinical trials, AI-based models are helping trial design, AI-based techniques are being used for patient recruitment, and AI-based monitoring systems aim to boost study adherence and decrease dropout rates.

"AI is not a magic bullet and is very much a work in progress, yet it holds much promise for the future of healthcare and drug development," says lead author and computer scientist Stefan Harrer, a researcher at IBM Research-Australia. 

As part of the review and based on their research, Harrer and colleagues reported that AI can potentially boost the success rate of clinical trials by:

  • Efficiently measuring biomarkers that reflect the effectiveness of the drug being tested
  • Identifying and characterizing patient subpopulations best suited for specific drugs. Less than a third of all phase II compounds advance to phase III, and one in three phase III trials fail - not because the drug is ineffective or dangerous, but because the trial lacks enough patients or the right kinds of patients.

Start-ups, large corporations, regulatory bodies, and governments are all exploring and driving the use of AI for improving clinical trial design, Harrer says. "What we see at this point are predominantly early-stage, proof-of-concept, and feasibility pilot studies demonstrating the high potential of numerous AI techniques for improving the performance of clinical trials," he says.

The authors also identify several areas showing the most real-world promise of AI for patients. For example:

  • AI-enabled systems might allow patients more access to and control over their personal data.
  • Coaching via AI-based apps could occur before and during trials.
  • AI could monitor individual patients' adherence to protocols continuously in real time.
  • AI techniques could help guide patients to trials of which they may not have been aware.

In particular, Harrer says, the use of AI in precision-medicine approaches, such as applying technology to advance how efficiently and accurately professionals can diagnose, treat and manage neurological diseases, is promising. "AI can have a profound impact on improving patient monitoring before and during neurological trials," he says.

The review also evaluated the potential implications for pharma, which included:

  • Computer vision algorithms that could potentially pinpoint relevant patient populations through a range of inputs from handwritten forms to digital medical imagery.
  • Applications of AI analysis to failed clinical trial data to uncover insights for future trial design.
  • The use of AI capabilities such as Machine Learning (ML), Deep Learning (DL), and Natural Language Processing (NLP) for correlating large and diverse data sets such as electronic health records, medical literature, and trial databases to help pharma improve trial design, patient-trial matching, and recruiting, as well as for monitoring patients during trials. 

The authors also identified several important takeaways for researchers: 

  • "Health AI" is a growing field connecting medicine, pharma, data science and engineering.
  • The next generation of health-related AI experts will need a broad array of knowledge in analytics, algorithm coding and technology integration.
  • Ongoing work is needed to assess data privacy, security and accessibility, as well as the ethics of applying AI techniques to sensitive medical information. 

Because AI methods have only begun to be applied to clinical trials in the past 5 to 8 years, it will most likely be another several years in a typical 10- to 15-year drug-development cycle before AI's impact can be accurately assessed.

In the meantime, rigorous research and development is necessary to ensure the viability of these innovations, Harrer says. "Major further work is necessary before the AI demonstrated in pilot studies can be integrated in clinical trial design," he says. "Any breach of research protocol or premature setting of unreasonable expectations may lead to an undermining of trust - and ultimately the success - of AI in the clinical sector."

$4.6 million award creates program to train cybersecurity professionals

Program will address a shortage of highly skilled cybersecurity professionals and train students for careers in government agencies

A five-year, $4.63 million award from the National Science Foundation will enable a multi-disciplinary team of researchers at the University of Arkansas to recruit, educate and train the next generation of cybersecurity professionals.

The program will provide the knowledge and tools necessary to protect network and computer systems in three critical industries - cybersecurity, transportation security, and critical infrastructure security.

"The federal agencies that support these industries - all critical to our nation's security and economic health - understand that new cybersecurity challenges are met with an increasingly insufficient security workforce," said Jia Di, professor of computer science and computer engineering and principal investigator for the program. "But people at these agencies also understand that our university, with its specific research strengths, is uniquely positioned to expand the pool of highly skilled professionals who can address these challenges."

The "Cyber-Centric Multidisciplinary Security Workforce Development" program will draw on faculty research expertise in the departments of Computer Science and Computer Engineering, Electrical Engineering and Industrial Engineering. Faculty members will design curriculum focused on cybersecurity in the areas of computer and information systems, transportation and critical infrastructure with specific focus on the electrical power grid. The program will provide job training and research opportunities for graduate and undergraduate students, and all students will be offered internships at government agencies, where additional training could lead to job placement.

The program will focus on attracting students from underrepresented populations and will partner with Northwest Arkansas Community College to open paths for its students to pursue bachelor's and advanced degrees at the university.

The program will address a national shortage of highly skilled cybersecurity professionals. Over a one-year period, from September 2017 to August 2018, for example, there were more than 300,000 open cybersecurity jobs in the United States, Di said. Professionals at these companies cited lack of education as the reason for this shortage. To qualify for these jobs, students must understand not only computer systems, networks and software, but also data storage protection, cryptography, malware and software vulnerabilities, as well as the nature of cyber-crimes and other threats to infrastructure.

THE PROGRAM

Led by the Arkansas Security Research and Education Institute (ASCENT), which Di directs, the "Cyber-Centric Multidisciplinary Security Workforce Development" program will include investigators affiliated with several U of A research centers - Center for Information Security and Reliability, Mack-Blackwell Transportation Center and Cybersecurity Center for Secure Evolvable Energy Delivery Systems. Students will conduct research at these centers.

Co-principal investigators for the program are Brajendra Panda, professor of computer science and computer engineering; Alan Mantooth, Distinguished Professor of electrical engineering; Dale Thompson, associate professor of computer science and computer engineering; and Chase Rainwater, associate professor of industrial engineering.