Navigating ‘information pollution’ with the help of artificial intelligence

Using insights from the field of natural language processing, computer scientist Dan Roth and his research group are developing an online platform that helps users find relevant and trustworthy information about the novel coronavirus.

There’s still a lot that’s not known about the novel coronavirus SARS-CoV-2 and COVID-19, the disease it causes. What leads some people to have mild symptoms and others to end up in the hospital? Do masks help stop the spread? What are the economic and political implications of the pandemic?

As researchers try to address these questions, many of which will not have a simple ‘yes or no’ answer, people are also trying to figure out how to keep themselves and their families safe. But between the 24-hour news cycle, hundreds of preprint research articles, and guidelines that vary between regional, state, and federal governments, how can people best navigate such vast amounts of information?

Using insights from the field of natural language processing and artificial intelligence, computer scientist Dan Roth and the Cognitive Computation Group are developing an online platform to help users find relevant and trustworthy information about the novel coronavirus. As part of a broader effort by his group to develop tools for navigating “information pollution,” this platform is devoted to identifying the numerous perspectives that a single query might have, showing the evidence that supports each perspective and organizing results, along with each source’s “trustworthiness,” so users can better understand what is known, by whom, and why.

Computer science definitions

Natural language processing: Combining expertise in linguistics, computer science, and artificial intelligence, this area of study focuses on facilitating communication between humans and computers and on developing ways for computers to understand human language. Challenges being addressed by researchers in this field include natural language understanding, translation, and generation.

Machine learning: This is a subset of the field of artificial intelligence that addresses problems that a computer cannot be “programmed” to solve. Instead, algorithms are trained on large datasets and learn programs that produce the desired output, such as “this document is about sports” or “this document is written by a progressive author.”
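To make the idea of “learning from data” concrete, here is a minimal Python sketch of a word-count (naive Bayes) text classifier. It is an illustration only, not the method used by Roth’s group; the function names `train` and `predict` and the four training sentences are invented for this example.

```python
from collections import Counter, defaultdict
import math


def train(examples):
    """Count words per label from (text, label) pairs (hypothetical toy data)."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def predict(model, text):
    """Return the label whose word statistics best match the text."""
    word_counts, label_counts = model
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        # Start from the log prior, then add one log-likelihood per word.
        score = math.log(label_counts[label] / total_docs)
        for w in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log(
                (word_counts[label][w] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Assumed toy training set; a real system would use far more data.
examples = [
    ("the team won the match", "sports"),
    ("the striker scored a goal", "sports"),
    ("the senate passed the bill", "politics"),
    ("voters elected a new senator", "politics"),
]
model = train(examples)
print(predict(model, "a late goal won the match"))
```

No rule such as “the word goal means sports” is written by hand; the association comes entirely from the labeled examples.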

Creating these types of automated platforms represents a huge challenge for researchers in the fields of natural language processing and machine learning because of the complexity of human language and communication. “Language is ambiguous. Every word, depending on context, could mean completely different things,” says Roth. “And language is variable. Everything you want to say, you can say in different ways. To automate this process, we have to get around these two key difficulties, and this is where the challenge is coming from.”

Building on numerous conceptual and theoretical advances in natural language understanding, the Cognitive Computation Group has developed automated systems that can better understand the content of human language, such as what a news article or scientific paper is about. Roth and his team have been working on issues related to information pollution for many years and are now applying what they’ve learned to information about the novel coronavirus.

Information pollution comes in many forms, including biases, misinformation, and disinformation, and because of the sheer volume of information the process of sorting fact from fiction needs automated support. “It’s very easy to publish information,” says Roth, adding that while organizations like FactCheck.org, a project of Penn’s Annenberg Public Policy Center, manually verify the validity of many claims, there’s not enough human power to fact check every claim being posted on the Internet.

And fact checking alone isn’t enough to address all of the problems of information pollution, says Ph.D. student Sihao Chen. Take the question of whether people should wear face masks: “The answer to that question has changed dramatically in the past couple months, and the reason for that change is multi-faceted,” he says. “You couldn’t find an objective truth attached to that specific question, and the answer to that question is context-dependent. Fact checking alone doesn’t solve this problem because there’s no single answer.” This is why the team says that identifying various perspectives along with evidence that supports them is important.

To help address both of these hurdles, the COVID-19 search platform visualizes results that include a source’s level of trustworthiness while also highlighting different perspectives. This is different from how online search engines display information, where top results are based on popularity and keyword match and where it’s not easy to see how the arguments in articles compare to one another. On this platform, articles are organized by the claims they make rather than displayed individually.

The landing page of the Information Pollution website. Search results are organized into three dimensions: article topic, category (such as news article or scientific study), and type (such as opinion piece or recommendation) and are grouped by a shared perspective. (Image: Penn Information Pollution Project)

“Search engines make a point not to touch the information and not to give suggestions and organize this material,” says Roth. The redundancy of information by itself is quite often misleading and leads to bias, since people tend to think that seeing something many times makes it more correct. “Here, if there are 500 articles that are saying the same thing, we cluster them together and say, ‘All these articles are quoting the same sources, so just focus on one of them. Then, these other articles are interviewing other people and making different claims, so you can sample from different clusters.’”
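The clustering idea Roth describes can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the platform’s actual approach: it groups articles by word overlap (Jaccard similarity) with an arbitrary threshold of 0.5, whereas the real system relies on language-understanding models. The function names and sample headlines are invented for this example.

```python
def tokens(text):
    """Lowercased set of whitespace-separated words in a text."""
    return set(text.lower().split())


def jaccard(a, b):
    """Share of words two token sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(articles, threshold=0.5):
    """Greedily group articles whose wording overlaps with a cluster's first member.

    Returns a list of clusters, each a list of indices into `articles`.
    The threshold is an assumed value chosen for illustration.
    """
    clusters = []
    for i, text in enumerate(articles):
        words = tokens(text)
        for members in clusters:
            representative = tokens(articles[members[0]])
            if jaccard(words, representative) >= threshold:
                members.append(i)
                break
        else:
            # No existing cluster was similar enough, so start a new one.
            clusters.append([i])
    return clusters


# Hypothetical headlines standing in for full articles.
articles = [
    "officials say masks reduce spread of the virus",
    "masks reduce spread of the virus",
    "vaccine trials begin next month",
]
print(cluster(articles))
```

A reader could then look at one representative per cluster instead of every near-duplicate, which is the sampling behavior the quote describes.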

When visiting the website, users can enter a question, claim, or topic into the search bar, and results are grouped together based on the similarity of perspectives. Since everything is set up to be automated, the researchers are eager to share this first iteration of the platform with the community so they can improve the language-processing models. “It’s a community effort,” says Roth, adding that their platform was designed to be transparent and open source so that they can easily collaborate with others.

Chen hopes that their efforts support both users who are interested in sorting through COVID-19 information pollution and fellow researchers in the field of natural language processing. “We want to help everyone who’s interested in reading news like this, and at the same time we want to build better techniques to accommodate that need,” says Chen.

Dan Roth is the Eduardo D. Glandt Distinguished Professor in the Department of Computer and Information Science in the School of Engineering and Applied Science at the University of Pennsylvania.

The online search platform is available on the Penn Information Pollution project website.

Additional information and resources on COVID-19 are available at https://coronavirus.upenn.edu/.

Credits

Writer

Erica K. Brockmeier
