This is an excerpt adapted from “The Ethical Algorithm: The Science of Socially Aware Algorithm Design” by Michael Kearns and Aaron Roth of Penn’s School of Engineering and Applied Science, published by Oxford University Press.
In December 2018, the New York Times obtained a commercial dataset containing location information collected from phone apps whose nominal purpose is to provide mundane things like weather reports and restaurant recommendations. Such datasets contain precise locations for hundreds of millions of individuals, each updated hundreds of times a day. Commercial buyers of such data will generally be interested in aggregate information, but the data is recorded by individual phones. It is superficially anonymous, without names attached, but there is only so much anonymity you can promise when recording a person’s every move.
From this data, the New York Times was able to identify a 46-year-old math teacher named Lisa Magrin. She was the only person who made the daily commute from her home in upstate New York to the middle school where she works, 14 miles away. And once someone’s identity is uncovered in this way, it’s possible to learn a lot more about them. The Times followed Lisa’s data trail to Weight Watchers, to a dermatologist’s office, and to her ex-boyfriend’s home. Just a couple of decades ago, this level of intrusive surveillance would have required a private investigator or a government agency. Now, it is simply the by-product of widely available commercial datasets.
It’s not only privacy that has become a concern as data gathering and analysis proliferate: Algorithms aren’t simply analyzing the data that we generate with our every move, they are also being used to actively make decisions that affect our lives. When you apply for a credit card, your application may never be examined by a human being. Instead, an algorithm pulling in data about you from many different sources might automatically approve or deny your request.
In many states, algorithms based on what is called machine learning are also used to inform bail, parole, and criminal sentencing decisions. All this raises questions not only of privacy but also of fairness, along with a variety of other basic social values, including safety, transparency, accountability, and even morality.
If we are going to continue to generate and use huge datasets to automate important decisions, we have to think seriously about some weighty topics. These include limits on the use of data and algorithms and the corresponding laws, regulations, and organizations that would determine and enforce those limits. But we must also think seriously about addressing the concerns scientifically—about what it might mean to encode ethical principles directly into the design of the algorithms that are increasingly woven into our daily lives.
You might be excused for some skepticism about imparting moral character to an algorithm. An algorithm, after all, is just a human artifact or tool, like a hammer, and who would entertain the idea of an ethical hammer? Of course, a hammer might be put to an unethical use—as an instrument of violence, for example—but this can’t be said to be the hammer’s fault. Anything ethical about the use or misuse of a hammer can be attributed to the human being who wields it.
But algorithms—especially those deploying machine learning—are different. They are different both because we allow them a significant amount of agency to make decisions without human intervention and because they are often so complex and opaque that even their designers cannot anticipate how they will behave in many situations.
Unlike a hammer, it is usually not so easy to blame a particular misdeed of an algorithm directly on the person who designed or deployed it. There are many instances in which algorithms leak sensitive personal information or discriminate against one demographic or another. But how exactly do these things happen? Are violations of privacy and fairness the result of incompetent software developers or, worse yet, the work of evil programmers deliberately coding racism and back doors into their programs?
The answer is a resounding no, but the real reasons for algorithmic misbehavior are perhaps even more disturbing than human incompetence or malfeasance, which we are at least more familiar with and have some mechanisms for addressing. Society’s most influential algorithms, from Google search and Facebook’s News Feed to credit scoring and health risk assessment algorithms, are generally developed by highly trained engineers who are carefully applying well-understood design principles. The problems actually lie within those very principles, most specifically those of machine learning.
The standard and most widely used algorithms in machine learning are simple, transparent, and principled. But the models they produce—the outputs of such algorithms—can be complicated and inscrutable, especially when the input data is itself complex and the space of possible models is immense. And this is why the human being deploying the model won’t fully understand it: The opacity of machine learning, and the problems that can arise, are really emergent phenomena that result when straightforward algorithms are allowed to interact with complex data to produce complex predictive models.
For example, it may be that a model trained to predict collegiate success, when used to make admissions decisions, happens to falsely reject qualified black applicants more often than qualified white applicants. Why? Because the designer didn’t anticipate it. He or she didn’t tell the algorithm to try to equalize the false rejection rates between the two groups, so it didn’t.
In its standard form, machine learning won’t give you anything “for free” that you didn’t explicitly ask for, and may in fact often give you the opposite of what you wanted. Put another way, the problem is that rich model spaces such as neural networks may contain many “sharp corners” that provide the opportunity to achieve their objective at the expense of other things we didn’t explicitly think about, such as privacy or fairness.
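To see what this looks like in practice, here is a minimal sketch in Python. The data, features, and numbers are invented for illustration and are not from the book; the point is only that a classifier trained with the standard objective of overall accuracy can end up with very different false rejection rates for two groups, because nothing in its objective asked for anything else.

```python
# Illustrative sketch (invented data, not the authors' code): train a standard
# classifier on synthetic admissions-style data, then audit the false rejection
# rate separately for each group.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data: a group label, two "qualification" scores, and a true
# outcome ("would succeed in college"). The scores are noisier for group 1,
# one common way real data ends up less informative about one group.
group = rng.integers(0, 2, size=n)                       # 0 or 1
ability = rng.normal(0.0, 1.0, size=n)                   # unobserved
noise_scale = np.where(group == 1, 1.5, 0.5)
scores = ability[:, None] + rng.normal(0, 1, (n, 2)) * noise_scale[:, None]
succeeds = (ability + rng.normal(0, 0.5, size=n) > 0).astype(int)

# Standard practice: maximize overall accuracy, with no fairness constraint.
model = LogisticRegression().fit(scores, succeeds)
accepted = model.predict(scores)

# False rejection rate = fraction of truly qualified applicants who are rejected.
for g in (0, 1):
    qualified = (succeeds == 1) & (group == g)
    frr = np.mean(accepted[qualified] == 0)
    print(f"group {g}: false rejection rate = {frr:.2f}")

# Nothing in the training objective mentioned the two groups, so nothing
# forced these two rates to be equal -- and typically they are not.
```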
The result is that the complicated, automated decision-making that can arise from machine learning has a character of its own, distinct from that of its designer. The designer may have had a good understanding of the algorithm that was used to find the decision-making model, but not of the model itself. To make sure that the effects of these models respect the societal norms that we want to maintain, we need to learn how to design these goals directly into our algorithms.
Take privacy, for example: If we are worried that data-driven products will reveal compromising information about someone, we can use a recent, powerful tool known as differential privacy that provably prevents this. The idea is to add “noise” to computations in a way that preserves our broad algorithmic goals but obscures the data of any particular individual.
For instance, rather than computing the exact fraction of Google Chrome users who have navigated to a particular embarrassing website, Google might instead “corrupt” that fraction by adding a small random number. The corruption can be small enough that a statistician can be highly confident about the population-level statistics while still guaranteeing that nobody can be certain about the browsing habits of any specific person. There is a real tension between overall accuracy and the degree of privacy we can promise, but such trade-offs are inevitable.
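One standard way to add noise of this kind is the Laplace mechanism, which calibrates the noise to how much any single person’s record could possibly change the answer. The sketch below is illustrative only, with an invented dataset and parameter settings, but it captures the basic arithmetic: with a million users, the noise needed to hide any one individual is far too small to mislead a statistician about the population.

```python
# Illustrative sketch of the noise-addition idea (the Laplace mechanism, one
# standard route to differential privacy). Dataset and parameters are made up.
import numpy as np

rng = np.random.default_rng()

def private_fraction(visited, epsilon):
    """Release the fraction of users (a 0/1 array) who visited the site,
    plus Laplace noise calibrated for epsilon-differential privacy.

    Changing one person's record changes the fraction by at most 1/n,
    so noise with scale (1/n) / epsilon suffices.
    """
    n = len(visited)
    true_fraction = visited.mean()
    noise = rng.laplace(loc=0.0, scale=1.0 / (n * epsilon))
    return true_fraction + noise

# Example: 1,000,000 hypothetical users, about 3% of whom visited the site.
visited = (np.random.default_rng(1).random(1_000_000) < 0.03).astype(int)
print(private_fraction(visited, epsilon=0.1))  # close to 0.03, yet no single
                                               # user's record is exposed
```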
The Google example is more than hypothetical: Google, Apple, and a number of other tech companies now collect certain usage statistics subject to the protections of differential privacy. The difficult (but unavoidable) trade-off between privacy and accuracy has predictably made the deployment of this technology controversial amongst product teams that would prefer to have unfettered access to the data.
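In deployments like these, the randomization typically happens on the device itself, so the company never sees an individual’s true answer at all; Google’s system in Chrome, for example, builds on the old survey technique called randomized response. The sketch below shows only that basic building block, with invented numbers, not any company’s actual code.

```python
# Illustrative sketch of randomized response, the classic building block behind
# "local" differential privacy deployments. Numbers are invented.
import numpy as np

rng = np.random.default_rng(7)

def randomized_response(truth, p_truth=0.75):
    """Each user reports the truth with probability p_truth, and lies otherwise."""
    lie = rng.random(len(truth)) >= p_truth
    return np.where(lie, 1 - truth, truth)

def debiased_estimate(reports, p_truth=0.75):
    """Recover an unbiased population-level estimate from the noisy reports."""
    # E[report] = p_truth * true_rate + (1 - p_truth) * (1 - true_rate)
    return (reports.mean() - (1 - p_truth)) / (2 * p_truth - 1)

# 2,000,000 hypothetical users, about 10% of whom have the sensitive attribute.
truth = (rng.random(2_000_000) < 0.10).astype(int)
reports = randomized_response(truth)
print(debiased_estimate(reports))  # close to 0.10, yet any individual report
                                   # is plausibly a lie

# Dialing p_truth toward 0.5 gives each user stronger deniability but makes the
# population estimate noisier -- exactly the accuracy/privacy trade-off that
# product teams end up negotiating over.
```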
This is just an early example of the kind of negotiation we will have to engage in more broadly as a society, and very soon, because trade-offs like this are a hard, fundamental truth about embedding ethical norms into algorithms. Just as with privacy, if we want an algorithm to respect a particular quantitative notion of fairness (for example, equalizing the false rejection rates between white and black college applicants), that will inevitably come at a cost, often to raw accuracy. We are beginning to understand the science of how to map out and navigate these trade-offs, but that’s just the necessary first step before society can grapple with how it wants to manage them.
Adapted from “The Ethical Algorithm: The Science of Socially Aware Algorithm Design” by Michael Kearns and Aaron Roth. Copyright © 2020 by Michael Kearns and Aaron Roth and published by Oxford University Press. All rights reserved.
To learn more about AI, read Penn Today’s three-part series about how algorithms and automation are shaping the modern world: “Bots, biases, and binge watching.” Kearns and Roth were also featured in a recent podcast conversation about AI.
Michael Kearns is the National Center Professor of Management & Technology in the Department of Computer and Information Science in the School of Engineering and Applied Science at the University of Pennsylvania and the founding director of the Warren Center for Network and Data Sciences.
Aaron Roth is the Class of 1940 Bicentennial Term Associate Professor of Computer and Information Science in the School of Engineering and Applied Science at the University of Pennsylvania.