Reclaiming a fragmented history

Digital humanities scholars are orchestrating an epic crowdsourcing effort to sort and transcribe handwriting on thousands of documents discarded hundreds of years ago, known as the Cairo Geniza.

The Penn Libraries team reviews unique keyboards created for the Scribes of the Cairo Geniza crowdsourcing website to aid in deciphering text on the ancient fragments. From left: Laura Eckstein, Judaica digital humanities coordinator; Laurie Allen, director of digital scholarship; Yonatan Gutenmacher, rising sophomore and summer intern; and Amey Hutchins, a manuscript cataloging librarian.

Harnessing the power of human cooperation, digital humanities scholars at Penn Libraries are orchestrating an epic effort to sort and transcribe handwriting on thousands of documents discarded hundreds of years ago.

Through an innovative new website built by the Penn team in collaboration with Zooniverse, an online platform for crowdsourced research, citizen scholars can help analyze the digitized texts, which are written in five Hebrew and three Arabic scripts, some of them exceedingly rare.

Known as the Cairo Geniza, the 350,000 fragments of paper and parchment can be anything from the most holy religious manuscripts to the most mundane legal forms, holding endless opportunities to learn about medieval life in the Middle East.

Penn Libraries, which holds about 650 of the fragments, is coordinating with universities and other institutions in the public crowdsourced project, Scribes of the Cairo Geniza.

“To get documents from a thousand years ago to be scrutinized by thousands of people from all around the world for the purposes of adding to the sum of human knowledge is innovative, pioneering, and global in scope,” says Will Noel, associate university librarian and director of the Kislak Center for Special Collections, Rare Books and Manuscripts. “The notion that you can actually get the world to help unlock the secrets of a massive dataset of fragments is incredibly exciting.”

Working with the research platform Zooniverse, the Libraries digital humanities team created the Scribes of the Cairo Geniza crowdsourcing site and launched the first phase last year. The second phase, to transcribe text on the sorted fragments, begins at the end of June.

The Scribes of the Cairo Geniza website allows the public to view digital images of the fragments and go through steps to decipher the language of the script.

The first phase, launched last August, challenged people to determine whether the script on a fragment is in Hebrew, Arabic, or both, and asked a few other sorting questions, such as whether the script was formal or informal.

So far, more than 30,000 fragments have been classified, each one scrutinized at least five times, meaning that volunteers came to the site more than 180,000 times to help classify the text. Nearly 3,500 people from around the globe have registered on the site. 

A second phase, which launches in late June, allows people to attempt to transcribe the text in the fragments that have been classified. The goal is that someday the transcriptions will enable readers to search the texts, opening up learning about the Geniza to a broader audience.

“My hope is that this project will not only serve the cause of research and discovery but it also will provide unprecedented opportunities for people to learn to read seemingly illegible texts, and to give everyone the opportunity to unlock and access this great chamber of handwritten medieval manuscript documents,” says Arthur Kiron, the Schottenstein-Jesselson Curator of Judaica Collections at Penn Libraries and a Penn adjunct assistant professor of history.

From Biblical texts to recipes

The Cairo Geniza is a colossal collection of scraps of paper and parchment discarded, yet preserved, in the attic storage chamber of the Ben Ezra synagogue in old Cairo over the course of a thousand years, the result of a Jewish tradition that dictates that nothing bearing God’s name be destroyed. Most of the documents date to the 10th through 13th centuries CE.

A shopping list, including fish and saffron, in Judeo-Arabic script (Arabic written in Hebrew script), in the Libraries collection.

Broadly, the fragments can be divided into two parts—the literary and the documentary, Kiron says. The literary, about 90 percent, include Biblical, rabbinic, and liturgical texts. The remaining 10 percent are documentary, dealing with social, political, and economic aspects of life, including letters, contracts, even recipes.

“It’s about people’s everyday lives. That’s what makes it interesting: recipes with everyday ingredients like saffron and fish, letters, contracts, religious texts with different customs,” says Laura Newman Eckstein, the Libraries’ first Judaica Digital Humanities Coordinator. “Their ordinary becomes our extraordinary.”

The fragments are held by universities, institutions, and private collectors around the world, and the majority are digitized. Penn partners in the first phase of the Scribes of the Cairo Geniza project include the Library of the Jewish Theological Seminary in New York and the Cambridge University Library in England, the two institutions that hold most of the world’s known fragments, as well as the Princeton Geniza Lab. Other partners include the University of Haifa in Israel, and the University of Manchester library and the Bodleian Libraries at the University of Oxford, both in England.

The Cairo Geniza is widely recognized as the most important documentary source for reconstructing the social, economic, political, and religious lives of Jews and other inhabitants of the pre-modern Mediterranean basin, says Kiron. Yet its secrets remain largely undeciphered.

“The fact that here we are dealing with discarded texts is what makes this particularly fascinating,” Noel says. “Normally what we are dealing with is quite obviously texts people have gone out of their way to preserve. Now we are dealing with texts that people have deliberately sought not to preserve.”

The manuscripts include all sorts of bureaucratese and other records that are not usually represented in libraries and archives, Noel adds. “So they do give us insights, and on a scale that is simply colossal.”

From the Libraries collection, part of a Haggadah for Passover, with directions in Judeo-Arabic script.

Penn’s 650 fragments are held primarily at the Herbert D. Katz Center for Judaic Studies, but some are also at the Penn Museum and the Kislak Center. Especially notable is one of the world’s two earliest known fragments of the Haggadah, the Passover liturgical service, from Cairo in the first quarter of the 11th century, says Kiron, who also is director of the Katz Center library.

The Penn collection also includes a mid-12th-century list of goods, including gold jewelry, bronze vessels, china wares, and perfumes, traded by an itinerant medieval Jewish merchant named Ben Yiju, who traveled between India and Egypt.

Scribes of the Cairo Geniza

In 2016, Eckstein and Laurie Allen, the Penn Libraries director of digital scholarship, responded to a call for humanities projects by Zooniverse, known more for its crowdsourcing of science-based research, particularly astronomy. The research platform has more than 80 active crowdsourced research projects, with 1.7 million registered users in 234 countries. Zooniverse had received a grant from the Institute of Museum and Library Services (IMLS) to expand the platform to better support galleries, libraries, archives, and museums in unlocking their data and engaging the public through crowdsourcing.

“When we first read this proposal, what really stood out was the data set itself, the fragments. They are fascinating. They are interesting to engage with whether you are a Geniza scholar or a medievalist or a historian or someone with no background in any of those things,” says Samantha Blickhan, an IMLS postdoctoral fellow at the Adler Planetarium in Chicago, home to Zooniverse. “It is the type of cultural artifact anyone can appreciate.”

It was precisely because the fragments are difficult to decipher, and the project is challenging to create, that the Zooniverse team chose Penn’s proposal, she says.

“The opportunity to design something like this in a public space and for a public audience is something we couldn’t pass up,” Blickhan says. “One of the main tenets of crowdsourcing is the idea that anybody can engage in real research and engage with academics and researchers and take part in what they are doing.”

It was Kiron’s idea to find ways of teaching the different script types that make reading and transcribing Hebrew manuscripts “less daunting and more fun.” “Our hope is to develop learning games that engage and train people to read by practicing recognition, repetition, and reinforcement,” he says.

A challenging aspect of this project, Blickhan notes, is that it needs to be trilingual: in English, Hebrew, and Arabic. Hebrew and Arabic are also read from right to left, which complicates the construction of the site.

Eckstein created this chart of scripts from various Geniza fragments, with help from other Geniza scholars, as a tool for recognizing and deciphering Hebrew script types.

A two-stage approach

The first phase, launched on August 8, 2017, was designed to sort a first batch of digitized fragments as either Hebrew or Arabic, or in some cases both, as well as to determine whether the scripts were written in an informal or formal style. It included more than 30,000 fragments from the Jewish Theological Seminary through the Princeton Geniza Project and the 650-plus from the Penn Libraries collection.

That first phase ended May 19, and the more than 30,000 fragments that were successfully sorted are now the foundation for the second phase.

The second phase, which begins later this month, invites the public to participate in deciphering and transcribing the sorted fragments.

“Luckily for us the research community in first classification were incredible,” Blickhan says. “The results of the first phase are impressive.”

While there is some skepticism in the academic community about the public contributing to such a scholarly pursuit, Allen and Eckstein say that the work by the public so far has exceeded their expectations. Volunteers have consistently agreed on whether texts were Arabic or Hebrew, formal or informal.

“Our volunteers can’t do all of the work of experts or scholars, but the fact that they could do any part of the work was a leap of faith,” Allen says.

How good are they? “They are amazing,” Eckstein answers.

In an “Introductions” thread on the project message board, volunteers share some personal details. They include an Arabic professor at Princeton, a pharmacist from Cairo, a geologist from New Zealand, a library science retiree in Florida, and a scientist in Arizona.

Many of the citizen researchers are right here at Penn. Amey Hutchins, a manuscript cataloging librarian, coordinates a lunchtime session open to all library staff to classify fragments. “The Penn staff is just a little part of this,” Hutchins says. “That is how it happens.” 

Heavy-hitting academics serve as content specialists, including Professor Marina Rustow, director of the Geniza Lab at Princeton; Professor Judith Olszowy-Schlanger, president of the Center for Hebrew and Jewish Studies at Oxford; and Professor Moshe Lavee, the co-head of the Interdisciplinary Centre for Genizah Research and Education at Haifa.

“It is a collaborative effort between the crowd and these institutions, but also with the input of the best scholars in the business, and it is really exciting,” Noel says.

On the site’s “Talk Boards,” or community message boards, researchers can discuss the work with each other and these experts. “There aren’t many easy paths for scholars to learn from non-scholars, and vice versa, but this project makes that possible,” Allen says.

In phase two, the citizen researchers have an even more formidable challenge in transcribing the texts. The easier-to-read fragments are the first to be offered for analysis, and the website provides several online tools to help in the transcription.

In one of the most pioneering aspects of the project, Eckstein worked with Zooniverse to create 20 different “keyboards” of the ancient letters in Hebrew for reference, using images of hundreds of the handwritten letters she isolated from the Geniza fragments as examples.

“The thought is that if the crowd can transcribe Geniza fragments, which are in languages many people don’t know and in scripts hardly anybody knows,” Noel says, “then the crowd can do anything.”

There are many options for participation. If people aren’t ready to tackle the transcription task, they can instead look for key identifying words that can give a sense of what a document is about. “We are asking them to start looking for sight words, and perhaps identify the genre, and work at it like scholars would,” Allen says.

“Then you have a puzzle you can work your way into,” Eckstein says.

Anyone can give it a try. “It’s not for one person to get perfection, but for the crowd to get close,” Noel says. “This is what the digital world can do for you. It is the type of engagement with the public that simply otherwise would not be possible.”

Keeping it going

Another batch of fragments will enter phase one for sorting: 5,000 fragments from Cambridge and another 15,000 from the John Rylands Library at the University of Manchester. Other universities and institutions are in the pipeline as well.

“In a dream world, this 30,000 is only the beginning, and we could transcribe them all,” Allen says. “But if we can get some percentage transcribed we will have made an amazing contribution.”

The fragments and all the work from the project are available to the public on the OPenn site, free of charge. “The notion of creating cultural heritage for everyone to enjoy in an open way is something that Penn Libraries is hugely committed to,” Noel says.

“Using medieval manuscripts, which are marks of lives well spent centuries ago as a locus for interaction for people from all sorts of different cultures and walks of life to engage on a common endeavor to understand their past,” he continues. “What is a nobler cause than that in the humanities?”