BIG DATA
Big data is yielding a new understanding of family structure
Social media can do more than just entertain us and keep us connected. It also can help scientists better understand human behavior and social dynamics. The volume of data created through new technology and social media such as Facebook and Twitter is lending insight into everything from mapping modern family dynamics to predicting postpartum depression.
"By analyzing different types of social media, search terms, or even blogs, we are able to capture people's thinking, communication patterns, health, beliefs, prejudices, group behaviors – essentially everything that has ever been studied in social and personality psychology," says James Pennebaker, president of the Society of Personality and Social Psychology (SPSP), which is kicking off its annual conference today in Austin. "We can examine thousands, even hundreds of thousands of people at once or track them over time."
Pennebaker of the University of Texas-Austin, whose work has explored the power of language in revealing our personality and behavior, is chairing a session today on the opportunities enabled by big data and new technology. No longer do research psychologists need to rely on traditional experimental designs, "running one upper middle class college student at a time," he says. "We now have access to the world of social behavior in ways never imagined before."
For example, a recent study by research scientists at Facebook analyzed 400,000 Facebook posts to determine differences in how parents talk to their children versus other friends and how they address their adult versus teen children. The posts, stripped of identifiable user information, showed that children's communication with their parents decreases in frequency from age 13 on, but then rises when they move out. Counter to previous research on familial communication, they also found that being farther away from each other does not diminish how much parents and children talk on Facebook.
The study also found differences between how mothers and fathers use Facebook. Automatic language coding indicated that mothers' posts expressed more emotion, using phrases like "poor baby" or "so proud of," while fathers' posts were more abstract, with phrases such as "keep it up" or "got your back." Mothers were also more likely to ask children to call them, while fathers talked more about shared interests, such as politics or sports.
"The Internet offers a tremendous opportunity to understand important social phenomena like family structure and also to help us explore how sharing information influences people's emotional states and decision-making," says Adam Kramer, a data scientist at Facebook. Kramer will be presenting such varying examples at the SPSP conference in Austin today.
Eric Horvitz, distinguished scientist and director of the Microsoft Research lab in Redmond, Wash., has been analyzing data from Twitter and other online media to better understand and predict people's health and well-being. "Large-scale data analyses generate insights about people – their mood, goals, intentions, health, and well-being – over both short and long periods of time," he says.
In recent work, Horvitz and colleagues used Twitter to identify 376 new mothers who might be at risk of postpartum depression. They analyzed some 36,000 tweets during the 3 months leading up to the births and some 40,000 tweets for 3 months after the births to detect changes in mood and behavior. They looked at everything from networks of social engagement to word usage, using Pennebaker and colleagues' measures of language shifts linked to downward mood shifts. For example, one potential indicator of postpartum depression is a shift from using third-person pronouns to first-person pronouns. Other indicators include a decrease in volume of tweets, a shrinking in the moms' social networks, and use of words indicating negative mood.
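Two of the indicators described above, the pronoun shift and the change in tweet volume, can be illustrated with a short sketch. This is not the researchers' method or lexicon: the word lists, the function names (first_person_share, mood_shift_signals), and the sample tweets are all made up for illustration, and a real analysis would rely on validated language measures and far more data.

```python
import re

# Hypothetical, illustrative word lists; NOT the lexicon used in the study.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
THIRD_PERSON = {"he", "she", "him", "her", "his", "hers",
                "they", "them", "their", "theirs"}


def first_person_share(tweets):
    """Share of first-person pronouns among all first- and third-person pronouns."""
    first = third = 0
    for text in tweets:
        # Lowercase and split into simple word tokens (apostrophes kept).
        for word in re.findall(r"[a-z']+", text.lower()):
            if word in FIRST_PERSON:
                first += 1
            elif word in THIRD_PERSON:
                third += 1
    total = first + third
    return first / total if total else 0.0


def mood_shift_signals(before, after):
    """Compare tweets from before and after the birth on two simple signals."""
    return {
        # Positive value: relatively more first-person pronouns after the birth.
        "first_person_shift": first_person_share(after) - first_person_share(before),
        # Negative value: fewer tweets after the birth.
        "volume_change": len(after) - len(before),
    }


# Invented sample tweets, for demonstration only.
before = ["She said they loved the nursery", "He helped me pack the bag"]
after = ["I can't sleep and I feel alone"]
signals = mood_shift_signals(before, after)
print(signals)
```

A rise in the first-person share together with a drop in volume would match the pattern the article describes, though on its own such a toy signal says nothing about any individual's health.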
Based on these factors, Horvitz's team constructed a predictive model that can forecast significant postpartum shifts in mood in new mothers, using only observations available before the births. The model can identify mothers at risk of such dramatic mood shifts with up to 70% accuracy. Next, the researchers need to test their model with women who have already been diagnosed with postpartum depression.
In other recent work, Horvitz and colleagues used participant reporting, along with pattern and network analysis, to examine the onset of major depressive episodes. His team first identified about 1,500 people with depression through an online assessment tool and then gave them the option of providing their Twitter handles. The researchers were then able to look at the Twitter feeds of the approximately 630 people who opted in to identify factors that predict the onset of major depressive disorders.
The hope, Horvitz says, is to develop new public health tools by combining the vast data available via social media with machine learning and linguistic analysis. He is also working on projects aimed at understanding how women cope with breast cancer diagnoses, by analyzing patterns among anonymized Web search logs. Other work has explored how cognitive biases interact with search engine biases to fuel phenomena such as "cyberchondria" – the rise in anxiety about rare illnesses during Web searches of common, benign symptoms.
"We have reached an intriguing moment, perhaps unprecedented, when the data available to a handful of private companies – for example, Google, Facebook, Twitter – could in principle make enormous impacts on social science research, especially social and personality psychology," says J.B. Michel of the Institute for Quantitative Social Science at Harvard. With Erez Aiden, Michel recently used millions of books digitized by Google to build a scientific tool for measuring trends in our shared culture, history, and language going back hundreds of years.
"Never before could we in principle know so much about so many people over such long periods of time with such ease. But, these data are virtually unused in this way," Michel says. "Addressing this divide is, in my mind, a transformative opportunity for the community of researchers interested in the human experience."
Social media is not the only tool scientists have for gathering bigger data. Roxane Cohen Silver of the University of California, Irvine, has been using online surveys to study how people cope with trauma in the aftermath of disaster. "The ability to collect data online after national events is far more efficient and useful than the prior way of collecting post-disaster data from representative samples, which required collecting data by telephone using 'random digit dialing,'" Silver says. She has studied the effects of 9/11 and, more recently, the Boston Marathon bombing, linking repeated media exposure in the early aftermath of the disaster to greater acute stress than that associated with being directly at or near the marathon.
Now Silver, with colleague Baruch Fischhoff from Carnegie Mellon University, is planning a project using a mobile app to study communities at risk for severe weather events. "The goal is to collect assessments of risk, thoughts, and feelings before a hurricane, during the storm, and post-disaster reactions over time," she says.
As technological capabilities grow, so too will the possibilities for research psychology. "Look around you," Pennebaker says. "Ask your friends about their reliance on electronic communication. And then start figuring out ways to harness this technology to understand the world around us."