What do ‘Bohemian Rhapsody,’ ‘Macbeth,’ and a list of Facebook friends all have in common?

New research finds that works of literature, musical pieces, and social networks have a similar underlying structure that allows them to share large amounts of information efficiently.

Examples of statistical network analysis of characters in two of Shakespeare’s tragedies, “King Lear” and “Macbeth.” Two characters are connected by a line, or edge, if they appear in the same scene. The size of the circles that represent these characters, called nodes, indicates how many other characters each one is connected to. The network’s density relates to how complete the graph is, with 100% density meaning that all of the characters are connected to one another. (Image: Martin Grandjean)

To an English scholar or avid reader, the Shakespeare Canon represents some of the greatest literary works of the English language. To a network scientist, Shakespeare’s 37 plays and the 884,421 words they contain also represent a massively complex communication network. Network scientists, who employ math, physics, and computer science to study vast and interconnected systems, are tasked with using statistically rigorous approaches to understand how complex networks, like all of Shakespeare, convey information to the human brain.   

New research published in Nature Physics uses tools from network science to explain how complex communication networks can efficiently convey large amounts of information to the human brain. Conducted by postdoc Christopher Lynn, graduate students Ari Kahn and Lia Papadopoulos, and professor Danielle S. Bassett, the study found that different types of networks, including those found in works of literature, musical pieces, and social connections, have a similar underlying structure that allows them to share information rapidly and efficiently. 

Technically speaking, a network is simply a statistical and graphical representation of connections, known as edges, between different endpoints, called nodes. In pieces of literature, for example, a node can be a word, and an edge can connect words when they appear next to each other (“my”— “kingdom”— “for”— “a”—“horse”) or when they convey similar ideas or concepts (“yellow”—“orange”—“red”). 
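As a minimal sketch of the adjacency idea above, the following Python builds a small word network from the example phrase. The function names `build_word_network` and `density` are hypothetical and chosen only for illustration; the study's own network construction may differ (for instance, in how it filters or weights words).

```python
from collections import defaultdict


def build_word_network(text):
    """Link each word (node) to the words that appear directly next to it (edges)."""
    words = text.lower().split()
    neighbors = defaultdict(set)
    for left, right in zip(words, words[1:]):
        if left != right:  # ignore a word repeated back to back
            neighbors[left].add(right)
            neighbors[right].add(left)
    return dict(neighbors)


def density(neighbors):
    """Fraction of all possible node pairs that are actually connected."""
    n = len(neighbors)
    if n < 2:
        return 0.0
    edge_count = sum(len(linked) for linked in neighbors.values()) // 2
    return edge_count / (n * (n - 1) / 2)


network = build_word_network("my kingdom for a horse")
for word, linked in network.items():
    print(word, "->", sorted(linked))
print("density:", density(network))
```

For this five-word phrase the network is a simple chain: five nodes joined by four edges out of ten possible pairs, giving a density of 0.4.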

The advantage of using network science to study things like languages, says Lynn, is that once relationships are defined on a small scale, researchers can use those connections to make inferences about a network’s structure on a much larger scale. “Once you define the nodes and edges, you can zoom out and start to ask about what the structure of this whole object looks like and why it has that specific structure,” says Lynn. 

Building on the group’s recent study that models how the brain processes complex information, the researchers developed a new analytical framework for determining how much information a network conveys and how efficient it is in conveying that information. “In order to calculate the efficiency of the communication, you need a model of how humans receive the information,” says Lynn.

With this analytical framework, the researchers evaluated 40 real-world communication networks to see what features were crucial for communicating information. They looked at works of English literature, including the canon of Shakespeare and Jane Austen’s “Pride and Prejudice,” along with musical pieces such as Mozart’s Sonata No. 11 and Queen’s “Bohemian Rhapsody.” They also studied networks of social relationships, including co-authorship networks in science and Facebook friend connections.

After looking at this diverse group of networks, the researchers found that the large-scale structure of a network was essential to that network’s ability to convey information. What was surprising was just how similar this structure was across the different networks, whether the network was representing noun transitions in a work of literature or melodic progressions in a piece of music. 

What makes these networks both information-rich and efficient is a balance between two key network features known as “community” structure and “heterogeneous” structure. Community structure occurs when nodes clump together and form clusters that evoke related concepts. Saying the word “dog” might bring to mind “ball,” “Frisbee,” or “bone,” for example. Such community structure helps make networks more efficient because a person can anticipate what word or idea might come next.
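One rough way to see community structure is to ask what share of a network's edges stay inside a cluster. The sketch below assumes a toy network and hand-assigned cluster labels, both invented here for illustration; `within_community_fraction` is a hypothetical helper, not the measure used in the study.

```python
def within_community_fraction(edges, community_of):
    """Share of edges whose two endpoints belong to the same community."""
    inside = sum(1 for a, b in edges if community_of[a] == community_of[b])
    return inside / len(edges)


# Toy network (invented): a "dog" cluster and a "king" cluster joined by one bridge.
edges = [
    ("dog", "ball"), ("dog", "frisbee"), ("dog", "bone"), ("ball", "frisbee"),
    ("king", "crown"), ("king", "throne"), ("crown", "throne"),
    ("dog", "king"),  # the only edge that crosses between clusters
]
community_of = {
    "dog": "pets", "ball": "pets", "frisbee": "pets", "bone": "pets",
    "king": "royalty", "crown": "royalty", "throne": "royalty",
}

print(within_community_fraction(edges, community_of))
```

Here seven of the eight edges stay within a cluster, so a reader who knows the current word's cluster can usually guess the cluster of the next one.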

But if a person can anticipate what comes next, there won’t be much information conveyed because information is directly related to surprise. To provide information, networks have to have a “heterogeneous” mixture of both well-connected and sparsely connected nodes. Take the works of Shakespeare as an example. While “the” and “and” are used 28,944 and 27,317 times, respectively, there are also 12,493 word forms that occur only once. “At a hub like ‘the,’ you can’t anticipate where you are about to go,” says Lynn. “It turns out that these hub nodes are really important for generating surprisal or, equivalently, information.”
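The link between hubs and surprise can be sketched with a simple model, assuming a reader who moves from each word to one of its neighbors uniformly at random. Under that assumption, the surprise of the next step is the base-2 logarithm of the number of neighbors. The toy network and the function names below are invented for illustration and are not taken from the study.

```python
import math


def step_surprisal_bits(neighbors, node):
    """Bits of surprise in the next step when leaving `node` uniformly at random."""
    return math.log2(len(neighbors[node]))


def walk_entropy_bits(neighbors):
    """Average surprise per step of a long uniform random walk.

    On an undirected network, such a walk visits each node in proportion
    to its number of connections, so hubs dominate the average.
    """
    total_degree = sum(len(linked) for linked in neighbors.values())
    return sum(
        (len(linked) / total_degree) * math.log2(len(linked))
        for linked in neighbors.values()
    )


# Toy network (invented): the hub "the" connects to four otherwise isolated words.
hub_network = {
    "the": {"king", "queen", "horse", "night"},
    "king": {"the"},
    "queen": {"the"},
    "horse": {"the"},
    "night": {"the"},
}

print(step_surprisal_bits(hub_network, "the"))   # four equally likely next words
print(step_surprisal_bits(hub_network, "king"))  # only one possible next word
print(walk_entropy_bits(hub_network))
```

Leaving the hub carries two bits of surprise, while leaving any other word carries none, so all of the information in this toy walk is generated at the hub.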

What’s fascinating to Lynn is how the balance between heterogeneous and community structure is key for creating networks that are both information-rich and easy to interpret. “People have studied these two structures for a long time; they are two of the foundational concepts of network science,” he says. “This study gives an explanation for why some of these networks are structured the way they are: because they are trying to communicate information efficiently. That’s what I think the coolest part is.”

The researchers will continue this work by expanding the types of communication networks they evaluate, with the goal of looking for trends across time and for differences and similarities between the works of other languages and cultures. “We are also particularly interested to delineate how efficient communication is related to error correction,” says Bassett. “Our preliminary findings suggest that real world networks help humans automatically correct for their own mistakes.” 

Danielle Bassett is the J. Peter Skirkanich Professor in the departments of Bioengineering and Electrical and Systems Engineering in the School of Engineering and Applied Science at the University of Pennsylvania. She also has appointments in the Department of Physics and Astronomy in the School of Arts & Sciences and the departments of Neurology and Psychiatry in Penn’s Perelman School of Medicine.

Ari Kahn is a Ph.D. candidate in neuroscience in the Perelman School of Medicine at the University of Pennsylvania.  

Christopher Lynn is a postdoctoral research associate in the Department of Physics and Astronomy at the University of Pennsylvania.

Lia Papadopoulos is a graduate student in the Department of Physics and Astronomy at the University of Pennsylvania.

This research was primarily supported by the U.S. Army Research Office through DCIST-W911NF-17-2-0181 and by the National Science Foundation through a CAREER grant, PHY-1554488.