Working at the intersection of data science and public policy

Ken Steif’s new book, ‘Public Policy Analytics: Code & Context for Data Science in Government,’ available online and in print, provides guidance for how governments and policymakers can use data and algorithms to solve complex service-delivery problems.

Ken Steif discusses his new book “Public Policy Analytics: Code & Context for Data Science in Government,” the challenges of incorporating data science into government decision-making, and how the pandemic has impacted the field of public policy analytics.

As the world marks one year since COVID-19 upended everyday life, governments around the world have often touted that they are “guided by data” or are “following the science” as they close or open different sectors of the economy. In practice, however, many of these decisions, from how schools have reopened to travel quarantine policies, have looked very different, highlighting the challenges of using and interpreting data to make decisions that impact entire cities or regions.

In his new book, “Public Policy Analytics: Code & Context for Data Science in Government,” Ken Steif, associate professor of practice in the Department of City and Regional Planning and the program director of Penn’s Master of Urban Spatial Analytics (MUSA) program, sheds light on this complex topic. Designed for policymakers without a technical background as well as for budding data scientists hoping to learn how to build data-driven tools, the book is a first-of-its-kind guide for technical topics and highlights the challenges of using data to address broader issues of equity and bureaucracy.

Penn Today spoke with Steif about the book, the challenges of incorporating data science into government decision-making, and how the pandemic has impacted the field of public policy analytics.

What was the impetus for writing this book?

For 10 years, I have been teaching students in Penn’s Master of Urban Spatial Analytics how to develop data-driven decision-making tools for government. Before coming to MUSA, many of our students were discouraged by their high school teachers and college professors from taking STEM classes. Told to stick to the humanities, these students develop anxiety around coding and statistics.

The impetus for this book and for my teaching is to dispel these anxieties, empowering students to write code and learn analytics by leveraging their interest in solving complex public policy problems. The book helps by providing a set of code examples and use cases they can use to copy and paste their way to meaningful solutions in domains including housing, criminal justice, health and human services, environment, transportation, and more. I worked hard to provide my students with this resource; it only seems reasonable to share it more broadly.

Are there challenges specific to the use of data science in government and policymaking compared to other fields, such as scientific research or business analytics?

Data science is about data-driven decision-making. It is about service delivery. In business, if a new algorithm increases revenue, that becomes the new standard for decision-making. In government, there are economic interests, but most bottom lines are far more nuanced, like fairness, equity, politics, and bureaucracy.

An engineer can optimize for revenue, but it takes a social scientist to optimize for these other bottom lines. In this way, I argue that data science is akin to planning. For an algorithm to effectively deliver a government resource, it must be developed with great empathy. How has a government service traditionally been delivered? Was that strategy effective? Were some groups given more access than others? Have we been transparent about these issues?

This book also includes open access datasets and code that accompany each chapter. What is the importance of having both a conceptual framework and practical examples and exercises?

Over the last 15 years, one of the most transformational movements in government has been open data—the release of free and open-source government administrative data in a machine-readable format.

When leveraged by committed civic technologists, open data has helped scale government innovation at an unprecedented pace. I hope this book fills one related shortcoming, what I call ‘open analytics.’ All governments collect the same administrative data and share the same service delivery use cases, like homelessness prevention, drug treatment, child welfare, public health, etc. This common set of use cases means that one agency can develop analytics and supporting materials, share the code, and enable other governments to replicate the solution in their own community. I hope providing code and context in this book helps jump-start the open analytics movement.

Steif’s new book is a first-of-its-kind guide for technical topics in analytics and also highlights the challenges of using data to address broader issues of equity and bureaucracy in government decision-making.

One of the ideas you discuss in the book is algorithmic fairness. Could you explain this concept and its importance in the context of public policy analytics?

Structural inequality and racism are the foundation of American governance and planning. Race and class dictate who gets access to resources; they define where one lives, where children go to school, access to health care, upward mobility, and beyond.

If resource allocation has historically been driven by inequality, why should we assume that a fancy new algorithm will be any different? This theme is present throughout the book. Those reading for context get several in-depth anecdotes about how inequality is baked into government data. Those reading to learn the code get new methods for opening the algorithmic black box and testing whether a solution further exacerbates disparate impact across race and class.

In the end, I develop a framework called ‘algorithmic governance,’ helping policymakers and community stakeholders understand how to trade off algorithmic utility against fairness.
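
To illustrate what testing for disparate impact can look like in practice, here is a minimal sketch. It is not taken from the book, whose examples are written in R. It assumes a hypothetical list of (group, actual outcome, predicted outcome) records and compares false positive and false negative rates across groups. The function name and the toy data are invented for illustration only.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Compute false positive and false negative rates per group.

    `records` is an iterable of (group, actual, predicted) triples,
    where actual and predicted are 0 or 1. This layout is a
    hypothetical one chosen for the example.
    """
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for group, actual, predicted in records:
        c = counts[group]
        if actual:
            c["pos"] += 1
            if not predicted:
                c["fn"] += 1  # real case the model missed
        else:
            c["neg"] += 1
            if predicted:
                c["fp"] += 1  # case the model flagged in error

    rates = {}
    for group, c in counts.items():
        rates[group] = {
            # None marks a rate that cannot be computed for this group
            "fpr": c["fp"] / c["neg"] if c["neg"] else None,
            "fnr": c["fn"] / c["pos"] if c["pos"] else None,
        }
    return rates


# Invented toy data: two groups, four records each.
records = [
    ("A", 1, 1), ("A", 0, 0), ("A", 0, 1), ("A", 1, 1),
    ("B", 1, 0), ("B", 0, 0), ("B", 0, 0), ("B", 1, 1),
]
rates = error_rates_by_group(records)
for group in sorted(rates):
    print(group, rates[group])
```

In this toy data the two groups see different kinds of errors: group A is wrongly flagged more often, while group B is wrongly missed more often. A gap like that is the sort of disparity such a check is meant to surface, though which error matters more depends on the program being delivered.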

From your perspective, what are the biggest challenges in integrating tools from data science with traditional planning practices?

Planning students learn a lot about policy but very little about program design and service delivery. Once a legislature passes a $50 million line item to further a policy, it is up to a government agency to develop a program that can intervene with the affected population, allocating that $50 million in $500, $1,000 or $5,000 increments.

As I show in the book, data science combined with government’s vast administrative data is good at identifying at-risk populations. But doing so is meaningless unless a well-designed program is in place to deliver services. Thus, the biggest challenge is not teaching planners how to code but teaching them how to consider algorithms more broadly in the context of service delivery. The book provides a framework for this by comparing an algorithmic approach to service delivery with the ‘business-as-usual’ approach.

Has COVID-19 changed the way that governments think about data science? If so, how?

Absolutely. Speaking of ‘service delivery,’ data science can help governments allocate limited resources. The COVID-19 pandemic is marked entirely by limited resources: from testing, PPE, and vaccines to toilet paper, home exercise equipment, and blow-up pools (the last was a serious issue for my 7-year-old this past summer).

Government failed at planning for the allocation of testing, PPE, and vaccines. We learned that it is not enough for government to invest in a vaccine; it must also plan for how to allocate vaccines equitably to populations at greatest risk. This is exactly what we teach in Penn’s MUSA Program, and I was disappointed at how governments at all levels failed to ensure that the limited supply of vaccine aligned with demand.

We see this supply/demand mismatch show up time and again in government, from disaster response to the provision of health and human services. I truly believe that data can unlock new value here, but, again, if government is uninterested in thinking critically about service delivery and logistics, then the data is merely a sideshow.

What do you hope people gain by reading this book?

There is no equivalent book currently on the market. If you are an aspiring social data scientist, this book will teach you how to code spatial analysis, data visualization, and machine learning in R, a statistical programming language. It will help you build solutions to address some of today’s most complex problems.

If you are a policymaker looking to adopt data and algorithms into government, this book provides a framework for developing powerful algorithmic planning tools, while also ensuring that they will not disenfranchise certain protected classes and neighborhoods.

“Public Policy Analytics: Code & Context for Data Science in Government” is currently available for preorder from CRC Press and is also available online. Open access datasets and code can also be found on GitHub.