Navigating ‘information pollution’ with the help of artificial intelligence

Using insights from the field of natural language processing, computer scientist Dan Roth and his research group are developing an online platform that helps users find relevant and trustworthy information about the novel coronavirus.

There’s still a lot that’s not known about the novel coronavirus SARS-CoV-2 and COVID-19, the disease it causes. What leads some people to have mild symptoms and others to end up in the hospital? Do masks help stop the spread? What are the economic and political implications of the pandemic?

As researchers work to address these questions, many of which will not have a simple ‘yes or no’ answer, people are also trying to figure out how to keep themselves and their families safe. But between the 24-hour news cycle, hundreds of preprint research articles, and guidelines that vary among regional, state, and federal governments, how can people best navigate such vast amounts of information?

Using insights from natural language processing and artificial intelligence, computer scientist Dan Roth and the Cognitive Computation Group are developing an online platform to help users find relevant and trustworthy information about the novel coronavirus. As part of a broader effort by his group to develop tools for navigating “information pollution,” this platform is devoted to identifying the numerous perspectives that a single query might have, showing the evidence that supports each perspective, and organizing results, along with each source’s “trustworthiness,” so users can better understand what is known, by whom, and why.

Computer science definitions

Natural language processing: Combining expertise in linguistics, computer science, and artificial intelligence, this area of study focuses on facilitating communication between humans and computers and on developing ways for computers to understand human language. Challenges being addressed by researchers in this field include natural language understanding, translation, and generation.

Machine learning: This is a subset of the field of artificial intelligence that addresses problems that a computer cannot be “programmed” to solve. Instead, algorithms are trained on large datasets and learn programs that produce the desired output, such as “this document is about sports” or “this document is written by a progressive author.”
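To make the idea of learning from labeled examples concrete, here is a minimal sketch of a text classifier. It is only an illustration: the naive Bayes technique, the four training sentences, and the function names `train` and `predict` are assumptions chosen for brevity, not the models used by Roth's group, and real systems train on far larger datasets.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest smoothed log-probability."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out a label.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training data, invented for this illustration.
examples = [
    ("the team won the match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the senate passed the budget bill", "politics"),
    ("voters backed the new tax policy", "politics"),
]
model = train(examples)
label = predict(model, "the goal won the match")
```

No rule such as "documents containing 'goal' are about sports" is written by hand here; the association comes entirely from the word counts in the labeled examples.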

Creating these types of automated platforms represents a huge challenge for researchers in natural language processing and machine learning because of the complexity of human language and communication. “Language is ambiguous. Every word, depending on context, could mean completely different things,” says Roth. “And language is variable. Everything you want to say, you can say in different ways. To automate this process, we have to get around these two key difficulties, and this is where the challenge is coming from.”

Building on conceptual and theoretical advances from its fundamental research in natural language understanding, the Cognitive Computation Group has developed automated systems that can better understand the contents of human language, such as what is being written about in a news article or scientific paper. Roth and his team have been working on issues related to information pollution for many years and are now applying what they’ve learned to information about the novel coronavirus.

Information pollution comes in many forms, including biases, misinformation, and disinformation, and because of the sheer volume of information, the process of sorting fact from fiction needs automated support. “It’s very easy to publish information,” says Roth, adding that while organizations like FactCheck.org, a project of Penn’s Annenberg Public Policy Center, manually verify the validity of many claims, there’s not enough human power to fact-check every claim being posted on the Internet.

And fact checking alone isn’t enough to address all of the problems of information pollution, says Ph.D. student Sihao Chen. Take the question of whether people should wear face masks: “The answer to that question has changed dramatically in the past couple months, and the reason for that change is multi-faceted,” he says. “You couldn’t find an objective truth attached to that specific question, and the answer to that question is context-dependent. Fact checking alone doesn’t solve this problem because there’s no single answer.” This is why the team says that identifying various perspectives along with evidence that supports them is important.

To help address both of these hurdles, the COVID-19 search platform visualizes results that include a source’s level of trustworthiness while also highlighting different perspectives. This is different from how online search engines display information, where top results are based on popularity and keyword match and where it’s not easy to see how the arguments in articles compare to one another. On this platform, instead of being displayed individually, articles are organized by the claims they make.

The landing page of the Information Pollution website. Search results are organized into three dimensions: article topic, category (such as news article or scientific study), and type (such as opinion piece or recommendation), and are grouped by a shared perspective. (Image: Penn Information Pollution Project)

“Search engines make a point not to touch the information and not to give suggestions and organize this material,” says Roth. The redundancy of information by itself is quite often misleading and leads to bias, since people tend to think that seeing something many times makes it more correct. “Here, if there are 500 articles that are saying the same thing, we cluster them together and say, ‘All these articles are quoting the same sources, so just focus on one of them. Then, these other articles are interviewing other people and making different claims, so you can sample from different clusters.’”
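The clustering idea Roth describes can be sketched in a few lines. This is a simplified illustration, not the platform's actual method: it assumes that plain word overlap (Jaccard similarity) stands in for the group's language-understanding models, and the function names, the 0.5 threshold, and the three sample headlines are all hypothetical.

```python
import re

def tokens(text):
    """Lowercase word set for a piece of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Overlap between two token sets, from 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def cluster_articles(articles, threshold=0.5):
    """Greedy clustering: each article joins the first cluster whose
    representative (its first member) is similar enough, otherwise it
    starts a new cluster."""
    clusters = []  # each cluster is a list of article indices
    token_sets = [tokens(a) for a in articles]
    for i, ts in enumerate(token_sets):
        for cluster in clusters:
            if jaccard(ts, token_sets[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Hypothetical headlines, invented for this illustration.
articles = [
    "Officials say masks reduce the spread of the virus",
    "Masks reduce the spread of the virus, officials say",
    "Economists warn the pandemic will slow global trade",
]
clusters = cluster_articles(articles)
# Show one representative per cluster instead of every repeat.
representatives = [articles[c[0]] for c in clusters]
```

Here the first two headlines fall into one cluster and the third into another, so a reader would see two entries rather than three. Word overlap cannot tell that two differently worded articles make the same claim, which is the harder problem the researchers' models are meant to address.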

When visiting the website, users can enter a question, claim, or topic into the search bar, and results are grouped together based on the similarity of perspectives. Since everything is set up to be automated, the researchers are eager to share this first iteration of the platform with the community so they can improve the language-processing models. “It’s a community effort,” says Roth, adding that their platform was designed to be transparent and open source so that they can easily collaborate with others.

Chen hopes that their efforts support both users who are interested in sorting through COVID-19 information pollution and fellow researchers in the field of natural language processing. “We want to help everyone who’s interested in reading news like this, and at the same time we want to build better techniques to accommodate that need,” says Chen.

Dan Roth is the Eduardo D. Glandt Distinguished Professor in the Department of Computer and Information Science in the School of Engineering and Applied Science at the University of Pennsylvania.

The online search platform is available on the Penn Information Pollution project website.

Additional information and resources on COVID-19 are available at https://coronavirus.upenn.edu/.

Credits

Writer

Erica K. Brockmeier

