Penn, Georgia Collaboration Awarded $14.6 Million to Expand Pathogen Database, Expedite Worldwide Research

PHILADELPHIA -- Researchers from the University of Pennsylvania and the University of Georgia have been awarded a five-year, $14.6 million contract from the National Institute of Allergy and Infectious Diseases, part of the U.S. National Institutes of Health, to expand and extend work on the Eukaryotic Pathogen Genome Database Resource, http://EuPathDB.org. This revolutionary open database enables scientists to examine genes, genomes, isolates and other attributes related to a variety of important human pathogens. By helping to identify potential vaccine antigens and drug targets, EuPathDB facilitates the search for effective diagnostics and therapeutics.

This award continues NIH funding for a production database system integrating diverse genomic-scale datasets.

Originally developed for Plasmodium falciparum, a microbe responsible for the most severe form of human malaria, EuPathDB has been expanded several times based on its success in expediting infectious disease research. The latest release supports a total of 27 species, providing bioinformatics tools for researchers targeting biodefense and emerging and re-emerging pathogens.

In addition to malaria parasites, the database covers:

• Pathogens that threaten public water supplies, including Cryptosporidium, Giardia and Toxoplasma, with additional components dedicated to Entamoeba and Microsporidia to follow over the coming months.
• Opportunistic infections associated with AIDS and other immunosuppressed conditions, including Cryptosporidium, Microsporidia and Toxoplasma.
• The congenital pathogen Toxoplasma gondii, a leading cause of neurological birth defects. Toxoplasma and Neospora caninum are also economically important as causes of congenital infection in farm animals.
• Trichomonas, a widespread cause of vaginitis in women.
• The parasites responsible for kala azar (Leishmania), African sleeping sickness (Trypanosoma brucei), and Chagas disease (Trypanosoma cruzi), which have been incorporated into this resource with support from the Bill & Melinda Gates Foundation.

EuPathDB is one of four Pathogen Bioinformatics Resource Centers supported by the NIH and is directed by principal investigator David S. Roos, E. Otis Kendall Professor of Biology in the School of Arts and Sciences at the University of Pennsylvania. Co-investigators include Christian Stoeckert of the School of Medicine at Penn and Jessica Kissinger of the University of Georgia. Roos and Stoeckert are also affiliated with the Penn Center for Bioinformatics and the Penn Genome Frontiers Institute, and Kissinger with the Center for Tropical and Emerging Global Diseases.

Understanding the genes of an organism and how they are expressed is a critical first step in preventing or treating disease. EuPathDB gives researchers a single database that catalogues data on every accessible step of disease pathogenesis. This database and its component websites have been used by more than 42,000 scientists from more than 100 countries over the past six months. Meeting presentations and workshops help to ensure that the scientific community makes effective use of this resource.

Advances in genome technology have dramatically increased both the scale and scope of information now available for human pathogens. For example, the first Plasmodium parasite genome sequence was completed in 2002 after six years of work and at a cost of $35 million. Scientists can now sequence additional strains of the parasite in just a few days for a few thousand dollars, but sequencing a single genome can generate terabytes of raw data, easily overwhelming a personal computer. Additional large-scale datasets supported by EuPathDB include DNA sequence polymorphisms from the wider population, chromosomal modifications, comprehensive studies of RNA transcription and protein expression, and analyses of protein-protein interactions and metabolic pathways.

Consider a researcher working to develop a malaria vaccine. First, this scientist must identify which genes are active when the parasite is living in a human host, rather than when it lives in the mosquito. They must then determine which of those genes encode protein antigens likely to be recognized by the immune system. By taking these factors into account, along with many others, the researcher can narrow the many thousands of genes in the parasite genome down to a few dozen candidates for further testing.
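The stepwise narrowing in this example can be sketched as a chain of filters. This is a minimal illustration only, not EuPathDB's actual search interface: the `GeneRecord` fields, the `vaccine_candidates` function and the placeholder gene IDs are all hypothetical, and "active in the human host rather than the mosquito" is assumed here to mean expressed in one stage but not the other.

```python
from dataclasses import dataclass


@dataclass
class GeneRecord:
    # Hypothetical fields for illustration; not EuPathDB's actual schema.
    gene_id: str
    expressed_in_human_host: bool
    expressed_in_mosquito: bool
    predicted_antigen: bool


def vaccine_candidates(genes):
    """Return IDs of genes active in the human host but not the mosquito
    (assumed interpretation) that are also predicted antigens."""
    # Step 1: keep genes expressed in the human-host stage only.
    host_specific = [
        g for g in genes
        if g.expressed_in_human_host and not g.expressed_in_mosquito
    ]
    # Step 2: keep those predicted to encode immune-recognized antigens.
    return [g.gene_id for g in host_specific if g.predicted_antigen]


# Placeholder records invented for illustration; not real parasite data.
genes = [
    GeneRecord("gene_A", True, False, True),
    GeneRecord("gene_B", True, True, True),
    GeneRecord("gene_C", True, False, False),
    GeneRecord("gene_D", False, True, True),
]
print(vaccine_candidates(genes))  # ['gene_A']
```

Each additional criterion the researcher weighs would be chained on as another filter in the same way, shrinking the candidate list at every step.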

“It has been remarkable to witness the rapid growth of biomedical research in recent years, fueled by the genomic revolution,” says Roos, “and it is particularly gratifying to see the impact of bioinformatics tools such as EuPathDB. By integrating diverse sources of information -- all the genes in the genome, all the proteins in the cell, all patient responses in a population -- these databases offer great promise for improved human health.”

###