Teaching computers how to learn

The goal of machine learning is to make a computer learn the way a baby does: it should get better at tasks with experience. Computers already best humans at tasks like chess and numerical calculation, yet there are certain aspects of human intelligence that machines cannot mirror.

Ben Taskar, the Magerman Term Assistant Professor in the Department of Computer and Information Science, says one of the biggest stumbling blocks for artificial intelligence is that computers learn far more slowly than children do.

A toddler learns what a car is when someone points to an automobile and says “car” a few times. A computer learns what a car is only after a person feeds it thousands of images of cars from different viewpoints and in different shapes. The same goes for other common objects, such as dogs.

As part of the recent trend toward bridging perception and language through machine learning, Taskar and his colleagues are attempting to teach a computer to look at a video and answer the Five Ws: Who? What? When? Where? Why?

“The five questions reporters ask,” he says. “We want to ask that of an image or a video, and then get the computer to answer.”

Using novel learning algorithms that combine audio, video and text streams, Taskar and his research team are teaching computers to recognize faces and voices in videos. Their system detects when someone in the video or audio mentions a name and determines whether the speaker is referring to himself or herself or to someone else in the third person. It then maps the correspondence between names and faces, and between names and voices.
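To make the idea concrete, here is a minimal sketch of one way such a name-to-face correspondence could be learned. The scene data, the first-person word list and the co-occurrence scoring are all invented for illustration; they stand in for Taskar’s far more sophisticated algorithms, not the team’s actual code.

```python
import re
from collections import defaultdict

# Illustrative heuristic: these words signal a speaker talking about
# himself or herself rather than about someone else.
FIRST_PERSON = {"i", "i'm", "me", "my"}

def names_referenced(utterance, speaker, known_names):
    """Decide who an utterance is about: the speaker, if it uses
    first-person words, plus any known name mentioned in the third person."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    referenced = {name for name in known_names if name.lower() in words}
    if FIRST_PERSON & words:
        referenced.add(speaker)
    return referenced

def assign_names(scenes, known_names):
    """Count how often each name is referenced while each face track is
    on screen, then label each track with its highest-scoring name."""
    counts = defaultdict(int)
    for scene in scenes:
        for speaker, utterance in scene["dialogue"]:
            for name in names_referenced(utterance, speaker, known_names):
                for face in scene["face_tracks"]:
                    counts[(face, name)] += 1
    best = {}
    for (face, name), score in counts.items():
        if score > best.get(face, ("", 0))[1]:
            best[face] = (name, score)
    return {face: name for face, (name, _) in best.items()}

# Toy data: face tracks are assumed to be pre-clustered, so "track_A"
# always refers to the same (as yet unnamed) person across scenes.
scenes = [
    {"face_tracks": ["track_A", "track_B"],
     "dialogue": [("Kate", "We should go back, Jack.")]},
    {"face_tracks": ["track_A"],
     "dialogue": [("Kate", "I'm not leaving."), ("Kate", "I trust you.")]},
    {"face_tracks": ["track_B"],
     "dialogue": [("Jack", "I have to fix this."), ("Jack", "Trust me.")]},
]
print(assign_names(scenes, known_names={"Kate", "Jack"}))
# -> {'track_A': 'Kate', 'track_B': 'Jack'}
```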

“An intelligent system needs to understand more than just visual input, and more than just language input or audio or speech. It needs to integrate everything in order to really make any progress,” Taskar says.

The information Taskar’s team feeds into the system is free training data harvested from the internet. Attempts to teach computers visual recognition in the pre-internet age were hampered in large part by a lack of training content. Today, Taskar says, the internet provides a “massive digitization of knowledge.” People post videos, comments, blogs, music, scripts and critiques about their favorite things and interests.

Take, for example, the hit ABC show “Lost.” Fans of the show flock to websites like Lostpedia or Lost.com and write reviews about the show, post comments or play games. Some fanatics post scripts from the show online.

As Taskar’s team feeds more data about “Lost” into the computer, such as video clips, scripts or blogs, the system gets better at identifying the people in the video. If, for example, a clip contains footage of the characters Kate and Ana Lucia, the trained system will recognize their faces.

“The algorithm is learning this from what people say, or from screenplays as well,” Taskar adds. “The screenplay doesn’t tell you who is who, but it tells you there’s a scene with [two characters] talking to each other.”
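That kind of weak supervision can be sketched in a few lines: the screenplay supplies only the set of characters sharing each scene, and identities fall out of intersecting those constraints. The example below assumes face tracks have already been matched across scenes by appearance; the data and the constraint-propagation scheme are illustrative, not the team’s actual method.

```python
def resolve_identities(scene_constraints):
    """scene_constraints: list of (face_tracks, character_names) pairs,
    one per scene; each face in a scene is one of that scene's names."""
    # Start each face with the intersection of its scenes' candidate names.
    candidates = {}
    for faces, names in scene_constraints:
        for face in faces:
            candidates[face] = candidates.get(face, set(names)) & set(names)
    # Propagate: once a face is pinned to a single name, no other face
    # in the same scene can carry that name.
    changed = True
    while changed:
        changed = False
        pinned = {f: next(iter(c)) for f, c in candidates.items() if len(c) == 1}
        for faces, names in scene_constraints:
            taken = {pinned[f] for f in faces if f in pinned}
            for face in faces:
                if face not in pinned and candidates[face] & taken:
                    candidates[face] -= taken
                    changed = True
    return candidates

# Invented screenplay constraints: who shares each scene, but not which
# face belongs to which name.
scenes = [
    (["face_1", "face_2"], ["Kate", "Jack"]),
    (["face_1", "face_3"], ["Kate", "Sawyer"]),
]
print(resolve_identities(scenes))
# -> {'face_1': {'Kate'}, 'face_2': {'Jack'}, 'face_3': {'Sawyer'}}
```

Because face_1 appears in both scenes, the only name it can carry is the one the scenes share, and every other identity follows from that single anchor.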

Taskar says the research can be helpful in many ways, particularly in searching videos for content. Currently, if a father wants to find a photo of his daughter playing with the family dog among the gigabytes of photos and videos on his hard drive, then unless the photo is tagged “daughter playing with dog,” chances are he isn’t going to find it.

The system does not yet function in real time and, Taskar says, computers are still quite far from recognizing common objects. Although computers have proved capable of recognizing people and detecting a small number of actions, Taskar’s team would like to get to the point where computers can identify 10,000 different actions and 10,000 different common objects.