Machine learning (ML) programs computers to learn the way we do—through the continual assessment of data and identification of patterns based on past outcomes. ML can quickly pick out trends in big datasets, operate with little to no human interaction and improve its predictions over time. Due to these abilities, it is rapidly finding its way into medical research.
People with breast cancer may soon be diagnosed through ML faster than through a biopsy. ML may also help paralyzed people regain autonomy using prosthetics controlled by patterns identified in brain scan data. ML research promises these and many other possibilities to help people lead healthier lives. But while the number of ML studies grows, its actual use in doctors’ offices has not kept pace.
The limitations lie in medical research’s small sample sizes and unique datasets. Small data makes it hard for machines to identify meaningful patterns: the more data, the more accurate ML diagnoses and predictions become. Many diagnostic applications would require thousands of subjects, yet most studies enroll only dozens.
But there are ways to coax significant results out of small datasets if you know how to manipulate the numbers. Running statistical tests over and over on different subsets of the data can make what are in reality random outliers look like meaningful signal.
This tactic, known as p-hacking (or, in ML, feature hacking), produces predictive models that are too narrowly fitted to be useful in the real world. What looks good on paper doesn’t translate to a doctor’s ability to diagnose or treat us. These statistical mistakes, often made unknowingly, can lead to dangerous conclusions.
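To see why repeated testing is so treacherous, consider a minimal sketch (not from the study itself, just an illustration): we run a standard two-group significance test many times on pure random noise, where no real effect exists. The `welch_p` helper below is a simplified Welch t-test using a normal approximation to the p-value, written here only to keep the example self-contained.

```python
import random
import statistics
from math import erf, sqrt

def welch_p(a, b):
    """Two-sided Welch t-test p-value via a normal approximation.
    Adequate for illustration; real analyses should use a proper
    t-distribution (e.g. scipy.stats.ttest_ind)."""
    se = sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    # Convert |z| to a two-sided p-value using the normal CDF.
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

random.seed(42)
trials, hits = 200, 0
for _ in range(trials):
    # Each "study": two groups of pure noise, so any detected
    # difference is spurious by construction.
    a = [random.gauss(0, 1) for _ in range(20)]
    b = [random.gauss(0, 1) for _ in range(20)]
    if welch_p(a, b) < 0.05:
        hits += 1

print(f"{hits} of {trials} comparisons on pure noise reached p < 0.05")
```

Roughly 5% of the comparisons come out "significant" even though nothing real is there. A researcher who tests many subsets or features and reports only the comparison that crossed the threshold has, in effect, published one of these false positives.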
To help scientists avoid these mistakes and push ML applications forward, Konrad Kording, a Penn Integrates Knowledge University Professor with appointments in the Department of Neuroscience in the Perelman School of Medicine and in the Departments of Bioengineering and Computer and Information Science in the School of Engineering and Applied Science, is leading an aspect of a large, NIH-funded program known as CENTER – Creating an Educational Nexus for Training in Experimental Rigor. Kording will lead Penn’s cohort by creating the Community for Rigor, which will provide open-access resources on conducting sound science. Members of this inclusive scientific community will be able to engage with ML simulations and discussion-based courses.
“The reason for the lack of ML in real-world scenarios is due to statistical misuse rather than the limitations of the tool itself,” says Kording. “If a study publishes a claim that seems too good to be true, it usually is, and many times we can track that back to their use of statistics.”
To make meaningful advancements in the field of ML in biomedical research, it will be necessary to raise awareness of these issues, help researchers understand how to identify them and limit them, and create a stronger culture around scientific rigor in the research community.
Kording aims to communicate that just because incorporating machine learning into biomedical research can introduce room for bias doesn’t mean scientists should avoid it. They just need to understand how to use it in a meaningful way.
The Community for Rigor plans to address these challenges with a module on machine learning in biomedical research that will guide participants through datasets and statistical tests, pinpointing exactly where bias is commonly introduced.
This story is by Melissa Pappas. Read more at Penn Engineering Today.