The key to fixing AI bias and copyright infringement

Penn Engineering’s Michael Kearns, National Center Professor of Management & Technology, examines whether model disgorgement can solve a number of problems related to AI.

By now, the challenges posed by generative AI are no secret. Models like OpenAI’s ChatGPT, Anthropic’s Claude and Meta’s Llama have been known to “hallucinate,” inventing potentially misleading responses, and to divulge sensitive information or copyrighted material.

One potential solution to some of these issues is “model disgorgement,” a set of techniques that force models to purge themselves of content that leads to copyright infringement or biased responses.

Michael Kearns, National Center Professor of Management & Technology. (Image: Courtesy of Penn Engineering)

In a paper in Proceedings of the National Academy of Sciences (PNAS), Michael Kearns, National Center Professor of Management & Technology in Computer and Information Science (CIS), and three fellow researchers at Amazon share their perspective on the potential for model disgorgement to solve some of the issues facing AI models today.

Kearns explains how model disgorgement relates to efforts to ensure data privacy, like Europe’s General Data Protection Regulation (GDPR). “Laws like the GDPR are less clear about what happens before your data is deleted. Your data was used to train a predictive model, and that predictive model is still out there, operating in the world. That model will still have been trained on your data even after your data is deleted from Facebook’s servers. This can lead to a number of problems,” Kearns says. “It’s not that model disgorgement is different from efforts to ensure data privacy, it’s more that model disgorgement techniques can be used in certain situations where current approaches to privacy like the GDPR fall short.”

Kearns offers some examples of model disgorgement techniques and how they work. “One conceptually straightforward solution is retraining from scratch. This is clearly infeasible given the scale and size of these networks and the compute time and resources it takes to train them. At the same time, retraining is kind of a gold standard—what you would like to achieve in a more efficient, scalable way,” he explains. “Another algorithmic approach is training under the constraint of differential privacy: adding noise to the training process in a way that minimizes the effects of any particular piece of training data, while still letting you use the aggregate properties of the data set.”
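The paper does not prescribe a particular implementation, but the differential-privacy idea Kearns describes is commonly realized as DP-SGD: clip each training example’s gradient so no single record can dominate an update, then add calibrated Gaussian noise before applying it. The sketch below illustrates that idea on a toy logistic-regression problem; the synthetic data, hyperparameters, and function names are illustrative assumptions, not drawn from the paper.

```python
# Minimal sketch of differentially private training (DP-SGD style):
# clip each example's gradient and add Gaussian noise so no single
# training record has an outsized effect on the learned model.
# Toy logistic regression on synthetic data; all choices are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary-classification data (stand-in for real training records).
n, d = 1000, 5
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = (X @ true_w + rng.normal(scale=0.1, size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dp_sgd(X, y, epochs=20, batch_size=50, lr=0.1,
           clip_norm=1.0, noise_multiplier=1.0):
    """Logistic regression trained with per-example clipping + Gaussian noise."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        idx = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            batch = idx[start:start + batch_size]
            # Per-example gradients of the logistic loss.
            preds = sigmoid(X[batch] @ w)
            grads = (preds - y[batch])[:, None] * X[batch]
            # Clip each example's gradient to bound its influence.
            norms = np.linalg.norm(grads, axis=1, keepdims=True)
            grads = grads / np.maximum(1.0, norms / clip_norm)
            # Add noise scaled to the clipping bound, then average and step.
            noise = rng.normal(scale=noise_multiplier * clip_norm, size=w.shape)
            w -= lr * (grads.sum(axis=0) + noise) / len(batch)
    return w

w_private = dp_sgd(X, y)
accuracy = ((sigmoid(X @ w_private) > 0.5) == y).mean()
print(f"training accuracy with DP-SGD: {accuracy:.2f}")
```

In practice the clipping bound and noise multiplier would be chosen against a target privacy budget; the point of the sketch is only that no individual record can shift the weights by more than a bounded, noise-masked amount, which lets the model use the aggregate properties of the dataset while limiting what it retains about any one piece of training data.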

Kearns adds, “The great success story of the internet has come from basically the lack of rules. You pay for the lack of rules, in the areas that we’re discussing here today. Most people who think seriously about privacy and security would probably agree with me that a lot of the biggest problems in those topics come from the lack of rules, the design of the internet, but that’s also what made it so accessible and successful.”

This story is by Ian Scheffler. Read more at Penn Engineering.