Deep in the waters of the Indian Ocean is a system of underwater radar discreetly listening to its environment. These “listening stations”—intended, in theory, to monitor government activity—are shrouded in mystery; it’s possible they aren’t even real. But when Jim Sykes first heard about them, he was struck by the message the very idea of them sent.
“It’s very hard to confirm that these radars exist. Mostly it’s just rumors and promises, though probably some do exist,” says Sykes, an associate professor in the Department of Music in Penn’s School of Arts & Sciences. “But by announcing the intent to use this kind of system, one government is signaling to another, ‘We’re listening to you.’”
Hearing, rather than seeing, is the surveillance medium of choice, he adds, showing that in some cases, “governments believe sound is a better way to access the truth.”
Today, across Penn’s School of Arts & Sciences (SAS), a similar sentiment about the power of sound fuels researchers and students in disciplines from music history to neurobiology who are using it as a lens to understand the world. They’re tracking the movements of fin whales and the behaviors of songbirds. They’re building algorithmic ears and musical video games. And through it all, they’re demonstrating that sound is central to language, communication, history, and culture.
Most of their work fits collectively under the umbrella of “sound studies,” a term coined in Jonathan Sterne’s 2003 book “The Audible Past” to describe a field that seeks to understand how sound shapes society. The book set off a flurry of interest in this space, Sykes says. And its publication happened to coincide with a new ability to digitize sound, making it ubiquitous, storable, and shareable. That environment amplified opportunities to look at sounds as worthy of study in their own right.
The ‘fundamental’ importance of sound
To be studied, sounds must first be accessible. In 2005, that mission drove Al Filreis, Kelly Family Professor of English, to co-found the poetry archive PennSound.
Working with Charles Bernstein, a poet and professor emeritus at Penn, Filreis reached out to living poets and to the estates of poets who had died, asking to digitize any recordings they could get their hands on—sometimes trying for years, as they did with the acclaimed 20th-century poet John Ashbery. Shortly before Ashbery died in 2017, his husband, David Kermani, finally agreed. Bernstein met Kermani at a diner, where Kermani showed up with two plastic grocery bags full of cassette tapes—the only existing copies of Ashbery reading his works until PennSound digitized them.
So far, no one has refused to share such recordings of their work, according to Filreis. And today, PennSound is the world’s largest audio poetry archive, comprising 80,000 individual files from more than 700 poets—all of them downloadable, for free.
The files in the archive are largely from live poetry readings, making them historical artifacts that retain not just the poem itself but also the sounds of a particular moment in time, down to the clinking of glasses at a bar. Filreis says it’s upended the way poetry can be taught.
When he was a student, for example, he remembers learning Allen Ginsberg’s iconic 1956 poem “America” by reading it from the thin pages of a bulky poetry anthology. It appeared in the same font as poems by writers spanning centuries and styles. “It looked like every other poem that had ever been written,” Filreis says. But even as Filreis struggled with the words on paper, recordings of Ginsberg himself reading the poem—one a live version from 1955, upbeat and mischievous, captured before the poem was finished; another from a 1959 studio session, sorrowful and pleading—sat in the basement of a San Francisco poetry center.
Instead of that thin anthology page, Filreis’s students learn the poem by listening to both recordings. “They’re two completely different understandings of the tone and his relationship with America,” Filreis says. “There are still people thinking sound files are fun add-ons. I would contest that; they are fundamental.”
Exposure to music
Mary Channen Caldwell knows from experience that sound is fundamental. Caldwell, an associate professor in the Department of Music, teaches a music class for students who aren’t majoring or minoring in music. Called 1,000 Years of Musical Listening, the course begins in the year 476 with the earliest notated Western music and ends with contemporary composers, including some currently working at Penn, like Natacha Diels, assistant professor of music; Anna Weesner, Robert Weiss Professor of Music; and Tyshawn Sorey, Presidential Assistant Professor of Music.
Caldwell teaches the class almost entirely through sound. In the classroom and at home, students listen to the chants of medieval monks, French trouvère songs from the 13th century, operas from the 18th century. They must attend two concerts—the first time some of them have experienced live music—and Caldwell pushes the students to interrogate the ways in which historical prejudices and biases influence which music exists today.
The course’s textbook, for example, largely features white, male composers. Caldwell complements this narrative with extra readings that dive into music written and published by women and people of color, sometimes under pseudonyms out of necessity. “Knowing that is important and helps you be a better consumer of cultural products,” Caldwell says.
Exposure to music can also encourage the development of language itself. That may be, in part, because language has its own melodies and rhythms. And those melodies and rhythms, known as prosody, can change how people interpret and understand words, says Jianjing Kuang, an associate professor in the Department of Linguistics. “We’re like composers in real time,” she says.
Adults with greater musical proficiency often excelled at language and reading in childhood; Kuang wants to understand why the brain links the two, in the hopes of creating music education programs that can support language development.
To probe this question, Kuang, her graduate students, and several undergrads developed a video game featuring a musical elephant named Sonic. Children playing the game press keys along with Sonic’s rhythm or guess which instrument Sonic is playing. During the summer of 2023, Kuang brought laptops loaded with the game to a camp and scored how the children did. She and her students then tested the children’s language abilities by asking them to act out and guess emotions.
Early results suggest that younger children who do better with the musical games also have higher language abilities. But in older children—around age seven—the differences are less stark, suggesting that the brain finds multiple ways to get to the same end result of language. “There is not one path to achieve language development,” Kuang says. After all, she adds, some people are nearly “tone deaf,” unable to sing or identify a pitch, but can speak just fine.
“Language is so central in our daily life,” Kuang says. “If you want to understand our social behavior or cognition, language and speech are core.”
Language development and the cowbird
Sometimes this is true even for animals, like the songbirds biology professor Marc Schmidt studies. Research has shown that, unlike most animals, songbirds learn to vocalize much as people do: by listening to the adults around them and receiving audio feedback from their own speechlike sounds. That makes them an excellent model for understanding the neural pathways behind speech development, Schmidt says.
Songbirds—a category that includes crows, cardinals, cowbirds, warblers, and robins, among others—make up almost half of all bird species on Earth. All birds vocalize to some degree, but songbirds use more complex patterns called songs.
In an outdoor “smart” aviary behind the Pennovation Center, Schmidt studies the songs and behaviors of 15 brown-headed cowbirds. One song in the male cowbird’s repertoire is nearly identical each time it’s sung. In fact, it’s one of the most precise behaviors in the entire animal kingdom, Schmidt says. “If you want to understand how the brain controls behavior, you want to choose a behavior that you can really rely on,” he adds.
The song is exceedingly complex to produce. Songbirds vocalize by pushing air through an organ called a syrinx, which sits at the base of the trachea, where the airway splits into a “V” shape connecting to the left and right lungs. By alternating airflow between the two sides, the songbirds produce intricate, continuous notes. The male cowbird can switch sides every 25 milliseconds, indicating “incredible neuromuscular control,” Schmidt says.
Some birds are better at this than others, and Schmidt has found that female cowbirds pick up on quality. A female listening to recordings of the song will dip into a dramatic posture of approval, tilting her chest forward and raising her head, wings, and tail feathers—but only if she decides the song is coming from a worthy male. The male cowbird, on the other hand, changes his posture depending on whether his audience is another male or a female, puffing his feathers and spreading his wings for the former, holding completely still for the latter.
By recording activity from the muscles involved in breathing, Schmidt discovered that staying still actually requires more energy for these birds. These subtle behavioral changes suggest that when the brain processes information about sound, it takes in more than just basic features such as frequency and loudness. It also considers the social context in which the sound is heard.
“You cannot understand behavior in a vacuum,” Schmidt says. “You have to understand it within the behavioral ecology of the animals.”
Using sound to understand how we learn
To study this behavior in a more natural context, Schmidt has partnered with colleagues to develop a chip that could be implanted in a bird’s brain to record activity continuously. One of his collaborators is Vijay Balasubramanian, Cathy and Marc Lasry Professor in the Department of Physics and Astronomy.
Balasubramanian, a theoretical physicist with a curiosity about biology, once wanted to understand learning so he could build more intelligent machines. Today his interests are more fundamental: He wants to understand how animal brains learn from their environment, and he is using sound to do it.
Animals, Balasubramanian points out, learn incredibly efficiently. A brain runs on about 20 watts of power—less than a single living room lightbulb draws. Training one large language model of the kind that powers ChatGPT, on the other hand, requires more energy than 1,000 households use in an entire year. What’s more, a brain can learn largely on its own, whereas an artificial intelligence model typically requires humans to manually label objects first.
How brains manage this self-directed learning is one of the big questions in behavioral and neural science, Balasubramanian says. In a crowded room filled with noise, a person’s brain can pick out distinct sounds—say, her name—from the background cacophony. This phenomenon is known as the “cocktail party” effect. But how does the brain actually distinguish one sound “object” from another?
Balasubramanian theorizes that a sound object can be defined by two features that change over time: coincidence and continuity. Coincidence describes features of an object that appear together. In a visual object, such as a book, coincidence would describe how the edges of the front and the back covers coexist. Continuity describes how the edges move in concert if the book gets flipped over. For a sound, temporal continuity might mean the way pitch changes as a person speaks.
To understand how the brain uses this information to identify a sound, Balasubramanian, postdoctoral fellow Ronald DiTullio and Linran “Lily” Wei are studying vocalizations made by people, a type of monkey called a macaque, and several species of birds, including the blue jay, American yellow warbler, house finch, great blue heron, song sparrow, cedar waxwing, and American crow.
Wei and DiTullio developed a machine learning model that mimics the cochlea, the hair-lined cavity of the inner ear that performs the first step in auditory processing: Its hairs vibrate in response to sounds, sending signals along a nerve directly to the brain. In the model, an algorithm extracts certain features from the responses of the simulated cochlea—changes in frequency, for example—and a simple neural network uses those features to guess the sound’s origin.
The team has found that frequencies that change slowly over time are sufficient to distinguish among species. They can also differentiate vocalizations from individuals of the same species, as well as different types of sounds, say friendly coos or angry grunts.
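To make the idea concrete, here is a minimal Python sketch of that kind of pipeline. It assumes, purely for illustration, a handful of labeled recordings, a mel spectrogram standing in for the model cochlea, and a time-smoothing step that keeps only slowly changing frequency content; the team’s actual model is more detailed than this.

```python
# Hypothetical sketch of a cochlea-inspired classification pipeline.
# The mel spectrogram is only a stand-in for the team's model cochlea,
# and the file names, window sizes, and network shape are all invented.

import numpy as np
import librosa                              # audio loading and spectrogram features
from scipy.ndimage import uniform_filter1d  # simple smoothing along the time axis
from sklearn.neural_network import MLPClassifier

def slow_spectral_features(path, n_mels=64, smooth_frames=25):
    """Summarize the slowly varying frequency content of one recording."""
    y, sr = librosa.load(path, sr=22050)
    spec = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels))
    # Averaging each frequency band over a sliding window washes out fast
    # fluctuations, keeping only features that change slowly over time.
    slow = uniform_filter1d(spec, size=smooth_frames, axis=1)
    return np.concatenate([slow.mean(axis=1), slow.std(axis=1)])

# Invented labeled recordings: (file path, species that produced the call).
recordings = [
    ("calls/blue_jay_01.wav", "blue jay"),
    ("calls/house_finch_01.wav", "house finch"),
    ("calls/song_sparrow_01.wav", "song sparrow"),
]

X = np.array([slow_spectral_features(path) for path, _ in recordings])
labels = [species for _, species in recordings]

# A small neural network guesses which species a recording came from.
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000).fit(X, labels)
print(clf.predict(X[:1]))
```

The ordering of steps, not the particular parameters, is the point: cochlea-like frequency analysis, a bias toward slowly changing features, then a small classifier.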
To Wei, the model is a way to break down auditory processing into its most basic parts. “One thing I’ve learned from physics is that simplification is a virtue,” Wei says. “We have very complicated theories about how our auditory system might work, but there should be a simpler way.” These studies take a step in that direction, she adds.
Learning from fin whales
Even more complex than bird, macaque, or human auditory systems are those of marine mammals, which use sound not only to communicate but also to navigate. A single fin whale—the second-largest whale on Earth—can make nearly 5,000 calls every day and can produce sounds up to 190 decibels, as loud as a jet taking off from an aircraft carrier.
To grasp how a fin whale might react to, say, the construction of a wind farm, researchers first need to know the animal’s typical behavior, says John Spiesberger, a visiting scholar in the Department of Earth and Environmental Science. Of course, “you can’t interview a fin whale,” he says. And spotting one visually is largely a matter of luck; the whales dive deep and can stay under for 20 minutes. Instead, researchers often track their movements by following their sounds.
The process is similar to the way a smartphone knows its location. A phone measures the difference in signal arrival times between pairs of broadcasting GPS satellites, then uses the differences to calculate its location. Similarly, researchers drop into the water receivers that pick up whale calls, then calculate the whales’ locations from the differences in the times the calls arrive at the receivers.
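As a toy illustration of the arrival-time idea, here is how a call’s position could be estimated from the differences in when it reaches a handful of receivers. This is not Spiesberger’s actual method; the receiver layout, arrival times, and sound speed below are all invented for the example.

```python
# Toy example of locating a sound source from arrival-time differences.
# The receiver coordinates, arrival times, and sound speed are invented;
# real surveys must also contend with uncertain receiver positions and clocks.

import numpy as np
from scipy.optimize import least_squares

SOUND_SPEED = 1500.0  # approximate speed of sound in seawater, meters per second

# Four receivers on a 4 km square (x, y in meters) and the times, in seconds,
# at which each one heard the same call.
receivers = np.array([[0.0, 0.0], [4000.0, 0.0], [0.0, 4000.0], [4000.0, 4000.0]])
arrivals = np.array([2.10, 1.45, 2.60, 2.05])

def tdoa_residuals(source):
    """Mismatch between measured and predicted arrival-time differences,
    all taken relative to the first receiver."""
    travel = np.linalg.norm(receivers - source, axis=1) / SOUND_SPEED
    return (travel - travel[0]) - (arrivals - arrivals[0])

# Start the search near the middle of the array and refine by least squares.
fit = least_squares(tdoa_residuals, x0=[2000.0, 2000.0])
print("estimated call position (meters):", fit.x)
```

Note that the sketch assumes the receiver positions and clocks are known exactly and that sound travels at a single, known speed.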
But the process doesn’t work exactly the same underwater as it does in the air. The atomic clocks that GPS satellites carry and the precisely known speed of the GPS signal make extremely accurate positioning possible. In the ocean, though, things are much murkier, in part because of inexact receiver locations and clocks that can be off by minutes. In recently published research, Spiesberger showed that estimates of a whale’s location using the GPS-style technique can be off by as much as 60 miles.
For decades, he has been developing new methods to reliably locate sounds in the face of these errors and to use the improved techniques to survey whale populations. There is often no way to identify an individual whale from its call, which means 100 fin whale calls could originate from one, seven, or even 100 whales. Spiesberger’s methods yielded a surprising result: The reliable locations made it possible to compute the minimum number of whales that could account for the calls heard.
That information could help researchers better understand how whale populations respond to environmental changes, says Joseph Kroll, a professor in the Department of Physics and Astronomy and Spiesberger’s collaborator since the 1970s, when they were both studying at the Scripps Institution of Oceanography. “It doesn’t rely on actually going out and looking for these animals to count them,” Kroll adds. “We want to know what’s happening to these populations.”
A field expanding
Spiesberger’s research is a clear demonstration of the ability to learn about a creature through the sound it makes, a principle underlying just about all of the sound research happening across Penn. For Jim Sykes, the sound studies researcher, the next step is applying this principle across human cultures more broadly.
About a decade after sound studies became more established, a growing sentiment took hold: The field had focused too much on people in the West. New branches of inquiry emerged, including Black sound studies and Indigenous sound studies. That’s the subject of Sykes’s book “Remapping Sound Studies,” an anthology he co-edited with Princeton’s Gavin Steingo that suggests a global model for studying the ways sound contributes to cultural and technological development.
Sykes conducted his dissertation research in Sri Lanka, around the time its 26-year civil war ended in 2009. He was interested in the way the sounds of war—beyond militaristic bangs of gunfire and bombs—shaped the experience of the country’s citizens.
He found that for many, sound defined both their day-to-day lives and their long-term memories of the conflict. They might notice, for example, that the playground yells of schoolchildren shifted to a different time of day or location as the education system tried to adapt to eruptions of violence. They might hear soldiers stopping cars at checkpoints, or a mother yelling her son’s name outside of a prison.
“That should be included in thinking about the sounds of war,” Sykes says. “Yet somehow, if you’re just focusing on military equipment, that gets left out.”
It’s a sobering example of the way society can be experienced, and thus understood, through sound. Sound is, after all, everywhere. It fills homes and cities, connects people to one another and the world, drives animal behavior and ecosystems. It’s one of the key ways a brain learns about the environment in which it finds itself. That means researchers tackling just about any question can potentially find a solution in sound.
This story was originally published in OMNIA magazine.