How researchers scrub Twitter for health data from real humans—not bots

Twitter is noisy, which makes it a perfect tool for bioinformatics experts like Graciela Gonzalez-Hernandez, who study language to help improve health outcomes. The platform holds a vast, constantly growing body of posts that researchers can use to track people’s behavior or spot trends in medicine.

The problem is that the useful content is hard to get to.

“There is tremendous value in social media, but mining it has its challenges,” says Gonzalez-Hernandez, an associate professor of informatics in the Department of Biostatistics, Epidemiology, and Informatics in the Perelman School of Medicine, and director of the Health Language Processing Center. “How do you find the right information?”

And by right, she also means real. For all its perks, Twitter is loaded with “bots” that every second of every day push out messages, both nefarious and legitimate, that researchers don’t want to analyze. Bots are software applications that run automated tasks, like generating messages using algorithms or aggregating related content from real accounts and sharing it under another one to perpetuate a particular idea.

For more than 10 years, Gonzalez-Hernandez has been studying natural language across social media to inform clinical care in work that’s funded through the National Library of Medicine and the National Institute of Allergy and Infectious Diseases.

The approach has led her and her team to discover information about pregnant women, including comments by mothers of children with birth defects that could potentially help researchers understand the causes of those defects. That insight is valuable for patient education and for physicians looking to better communicate the risks and benefits of certain drugs, and it is why working toward more effective social media tools is so important. A larger sample of clean tweets, or any kind of data pulled from social networks, will only help researchers hear real patients’ voices and strengthen the science.

Read more at Penn Medicine News.