Working at the intersection of data science and public policy

Ken Steif’s new book, ‘Public Policy Analytics: Code & Context for Data Science in Government,’ available online and in print, provides guidance for how governments and policymakers can use data and algorithms to solve complex service-delivery problems.

As the world marks one year since COVID-19 upended everyday life, governments have often claimed to be “guided by data” or “following the science” as they close or open different sectors of the economy. In practice, however, many of these decisions, from how schools have reopened to travel quarantine policies, have looked very different, highlighting the challenges of using and interpreting data to make decisions that impact entire cities or regions.

In his new book, “Public Policy Analytics: Code & Context for Data Science in Government,” Ken Steif, associate professor of practice in the Department of City and Regional Planning and the program director of Penn’s Master of Urban Spatial Analytics (MUSA) program, sheds light on this complex topic. Designed for policymakers without a technical background as well as for budding data scientists hoping to learn how to build data-driven tools, the book is a first-of-its-kind guide for technical topics and highlights the challenges of using data to address broader issues of equity and bureaucracy.

Penn Today spoke with Steif about the book, the challenges of incorporating data science into government decision-making, and how the pandemic has impacted the field of public policy analytics.

What was the impetus for writing this book?

For 10 years, I have been teaching students in Penn’s Master of Urban Spatial Analytics how to develop data-driven decision-making tools for government. Before coming to MUSA, many of our students were discouraged by their high school teachers and college professors from taking STEM classes. Told to stick to the humanities, these students develop anxiety around coding and statistics.

The impetus for this book and for my teaching is to dispel these anxieties, empowering students to write code and learn analytics by leveraging their interest in solving complex public policy problems. The book helps by providing a set of code examples and use cases they can copy and paste their way through to meaningful solutions in domains including housing, criminal justice, health and human services, environment, transportation, and more. I worked hard to provide my students with this resource; it seems only reasonable to share it more broadly.

Are there challenges specific to the use of data science in government and policymaking compared to other fields, such as scientific research or business analytics?

Data science is about data-driven decision-making. It is about service delivery. In business, if a new algorithm increases revenue, that becomes the new standard for decision-making. In government, there are economic interests, but most bottom lines are far more nuanced, like fairness, equity, politics, and bureaucracy.

An engineer can optimize for revenue, but it takes a social scientist to optimize for these other bottom lines. In this way, I argue that data science is akin to planning. For an algorithm to effectively deliver a government resource, it must be developed with great empathy. How has a government service traditionally been delivered? Was that strategy effective? Were some groups given more access than others? Have we been transparent about these issues?

This book also includes open access datasets and code that accompany each chapter. What is the importance of having both a conceptual framework and practical examples and exercises?

Over the last 15 years, one of the most transformational movements in government has been open data—the release of free and open-source government administrative data in a machine-readable format.

When leveraged by committed civic technologists, open data has helped scale government innovation at an unprecedented pace. I hope this book fills one related shortcoming, what I call ‘open analytics.’ All governments collect the same administrative data and share the same service delivery use cases, like homelessness prevention, drug treatment, child welfare, public health, etc. This common set of use cases means that one agency can develop analytics and supporting materials, share the code, and enable other governments to replicate the solution in their own community. I hope providing code and context in this book helps jump-start the open analytics movement.

One of the ideas you discuss in the book is algorithmic fairness. Could you explain this concept and its importance in the context of public policy analytics?

Structural inequality and racism are the foundation of American governance and planning. Race and class dictate who gets access to resources; they define where one lives, where children go to school, access to health care, upward mobility, and beyond.

If resource allocation has historically been driven by inequality, why should we assume that a fancy new algorithm will be any different? This theme is present throughout the book. Those reading for context get several in-depth anecdotes about how inequality is baked into government data. Those reading to learn the code get new methods for opening the algorithmic black box, testing whether a solution further exacerbates disparate impact across race and class.

In the end, I develop a framework called ‘algorithmic governance,’ helping policymakers and community stakeholders understand how to trade off algorithmic utility against fairness.
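As an editorial illustration of the kind of test described above, and not code from the book (which teaches R), the following minimal Python sketch compares a model's error rates across groups. The function name, the record layout, and the toy data are all hypothetical assumptions; a large gap in false positive or false negative rates between groups is one common sign that an algorithm may have a disparate impact.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Compute false positive and false negative rates for each group.

    `records` is an iterable of (group, predicted, observed) tuples,
    where predicted and observed are 0 or 1. This layout is an
    assumption made for illustration only.
    """
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for group, predicted, observed in records:
        c = counts[group]
        if observed == 1:
            c["pos"] += 1
            if predicted == 0:
                c["fn"] += 1  # missed a case that did occur
        else:
            c["neg"] += 1
            if predicted == 1:
                c["fp"] += 1  # flagged a case that did not occur

    rates = {}
    for group, c in counts.items():
        rates[group] = {
            # None signals that the rate is undefined for this group
            "false_positive_rate": c["fp"] / c["neg"] if c["neg"] else None,
            "false_negative_rate": c["fn"] / c["pos"] if c["pos"] else None,
        }
    return rates


# Hypothetical toy data: (group, predicted, observed)
records = [
    ("A", 1, 1), ("A", 0, 0), ("A", 0, 0), ("A", 1, 0),
    ("B", 1, 1), ("B", 1, 0), ("B", 1, 0), ("B", 0, 1),
]

rates = error_rates_by_group(records)
for group, r in sorted(rates.items()):
    print(group, r)
```

In this toy data, group B is wrongly flagged far more often than group A, which is the sort of gap a fairness review would need to weigh against the model's overall usefulness.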

From your perspective, what are the biggest challenges in integrating tools from data science with traditional planning practices?

Planning students learn a lot about policy but very little about program design and service delivery. Once a legislature passes a $50 million line item to further a policy, it is up to a government agency to develop a program that can intervene with the affected population, allocating that $50 million in $500, $1,000, or $5,000 increments.

As I show in the book, data science combined with government’s vast administrative data is good at identifying at-risk populations. But doing so is meaningless unless a well-designed program is in place to deliver services. Thus, the biggest challenge is not teaching planners how to code but teaching them how to consider algorithms more broadly in the context of service delivery. The book provides a framework for this by comparing an algorithmic approach to service delivery with the ‘business-as-usual’ approach.
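To make the comparison above concrete, here is a minimal Python sketch that is an editorial illustration rather than the book's framework. It first works out how many households a $50 million line item reaches at each grant size, then compares a hypothetical ‘business-as-usual’ rule (serve households in the order they appear) with ranking by a predicted risk score. The household records, the risk scores, and the function names are all invented for illustration.

```python
def households_reached(budget, grant_size):
    """Number of whole grants a budget can fund."""
    return budget // grant_size


def served_in_need(households, n_slots, key=None):
    """Count how many of the first n_slots served households are in need.

    With key=None, households are served in their listed order
    (a stand-in for business as usual). With a key, they are served
    in descending order of that key (a stand-in for targeting).
    """
    if key is None:
        order = list(households)
    else:
        order = sorted(households, key=key, reverse=True)
    return sum(1 for h in order[:n_slots] if h["in_need"])


# The arithmetic behind the $50 million example
budget = 50_000_000
reach = {size: households_reached(budget, size) for size in (500, 1_000, 5_000)}
print(reach)

# Hypothetical households with a predicted risk score and true need
households = [
    {"id": 1, "risk": 0.2, "in_need": False},
    {"id": 2, "risk": 0.9, "in_need": True},
    {"id": 3, "risk": 0.4, "in_need": False},
    {"id": 4, "risk": 0.8, "in_need": True},
    {"id": 5, "risk": 0.7, "in_need": False},
    {"id": 6, "risk": 0.6, "in_need": True},
]

n_slots = 3  # only three grants available in this toy example
usual = served_in_need(households, n_slots)
targeted = served_in_need(households, n_slots, key=lambda h: h["risk"])
print("business as usual:", usual, "targeted:", targeted)
```

In this toy example the risk-ranked approach reaches more households in need than the listed-order approach, but it still misses one, a reminder that a predictive score only adds value when the program around it is designed to act on it.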

Has COVID-19 changed the way that governments think about data science? If so, how?

Absolutely. Speaking of ‘service delivery,’ data science can help governments allocate limited resources. The COVID-19 pandemic is marked entirely by limited resources, from testing, PPE, and vaccines to toilet paper, home exercise equipment, and blow-up pools (the last was a serious issue for my 7-year-old this past summer).

Government failed at planning for the allocation of testing, PPE, and vaccines. We learned that it is not enough for government to invest in a vaccine; it must also plan for how to allocate vaccines equitably to populations at greatest risk. This is exactly what we teach in Penn’s MUSA Program, and I was disappointed at how governments at all levels failed to ensure that the limited supply of vaccine aligned with demand.

We see this supply/demand mismatch show up time and again in government, from disaster response to the provision of health and human services. I truly believe that data can unlock new value here, but, again, if government is uninterested in thinking critically about service delivery and logistics, then the data is merely a sideshow.

What do you hope people gain by reading this book?

There is no equivalent book currently on the market. If you are an aspiring social data scientist, this book will teach you how to code spatial analysis, data visualization, and machine learning in R, a statistical programming language. It will help you build solutions to address some of today’s most complex problems.

If you are a policymaker looking to adopt data and algorithms into government, this book provides a framework for developing powerful algorithmic planning tools, while also ensuring that they will not disenfranchise certain protected classes and neighborhoods.

“Public Policy Analytics: Code & Context for Data Science in Government” is currently available for preorder from CRC Press and is also available online. Open access datasets and code can also be found on GitHub.

Credits

Writer

Erica K. Brockmeier

