New Mathematical Model Explains Variability in Mutation Rates Across the Human Genome
It turns out that the type, frequency, and location of new mutations in the human genome depend on which DNA building blocks are nearby, according to researchers from the Perelman School of Medicine at the University of Pennsylvania in an advance online study published this week in Nature Genetics.
“We developed a mathematical model to estimate the rates of mutation as a function of the nearby sequences of DNA ‘letters’ -- called nucleotides -- in the human genome,” said senior author Benjamin F. Voight, PhD, an assistant professor in the department of Systems Pharmacology and Translational Therapeutics and the department of Genetics. “This new model not only provides clues into the process of mutation, but also helps discover possible genetic risk factors that influence complex human diseases, such as autism spectrum disorder.”
This study focuses on the probability that any given nucleotide in the human genome -- one of the four letters (A, C, G or T for adenine, cytosine, guanine or thymine) of the DNA alphabet -- is changed. Voight focused on the simplest type of mutation, a “point” mutation in which a single letter is changed in a given sequence. Most of these changes -- often called single nucleotide polymorphisms (SNPs), or “snips” -- are not harmful to the functioning of the human body. Nevertheless, Voight examined why some sequences are more prone to mutation than others.
“The crux of the paper examines the dependency of mutation rate on which nucleotides are one, two, or three bases away from either side of a SNP,” Voight said. “We already know about one situation in which this placement matters: DNA sequences in the genome where methyl groups are attached to the cytosine nucleotide, also known as CpG sites, are hotspots for mutation. But are there other types of local sequences that matter beyond these?”
To address this question, Voight and Varun Aggarwala, a doctoral candidate in the Genomics and Computational Biology graduate group, devised a mathematical model applicable to SNP data found in humans. Their approach took advantage of publicly available data from the 1000 Genomes Project, which covers thousands of human subjects sampled from across the globe. These individuals were sequenced as part of an international initiative to characterize the genetic variation that naturally occurs in human populations.
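To make the idea of "nearby sequence" concrete, the short Python sketch below tallies how often the central letter of each local sequence window is a known SNP. It is only an illustration, not the model used in the study: the function name, the toy reference sequence, and the SNP positions are all invented for this example, and a simple fraction per window stands in for a real estimate of mutation rate. With the default of three flanking bases on each side, each window is seven letters long.

```python
from collections import Counter

def context_polymorphism_rates(reference, snp_positions, flank=3):
    """Hypothetical helper: for each sequence window of length 2*flank + 1,
    return the fraction of its occurrences whose central base is a SNP."""
    snps = set(snp_positions)
    seen = Counter()      # how often each window occurs in the reference
    variable = Counter()  # how often its central base is a known SNP
    for center in range(flank, len(reference) - flank):
        context = reference[center - flank:center + flank + 1]
        if set(context) - set("ACGT"):
            continue  # skip windows containing ambiguous bases such as N
        seen[context] += 1
        if center in snps:
            variable[context] += 1
    return {ctx: variable[ctx] / n for ctx, n in seen.items()}

# Toy data (assumed, not from the study): one SNP at 0-based position 3.
reference = "ACGTACGTACGT"
rates = context_polymorphism_rates(reference, [3], flank=1)
for context, rate in sorted(rates.items()):
    print(context, rate)
```

In this toy case the window GTA appears twice and its central T is a SNP once, so its fraction is 0.5, while the other windows have a fraction of 0. Comparing such fractions across windows is the kind of question the study asks, although at genome scale and with a proper statistical model.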