Q&A: Integrated Data Systems Needed for More Actionable Policy Proposals

The COVID-19 pandemic has helped economists and epidemiologists understand they need each other’s expertise to improve their models and the policy proposals they generate. Data from new technologies and disparate sources could result in more actionable policy recommendations that address the cross-divisional causes of issues such as health disparities.

Author:

Joshua E. Porterfield, PhD

April 20, 2022

The interplay between epidemiology and economics has been a key topic throughout the Pandemic Data Initiative’s Q&A series, which has featured many Johns Hopkins University faculty (Drs. Lauren Gardner, Nicholas Papageorge, and Daniel Polsky) on the National Bureau of Economic Research working paper: Modeling to Inform Economy-Wide Pandemic Policy: Bringing Epidemiologists and Economists Together.

Now, Dr. Michael Darden, associate professor of economics at the Johns Hopkins Carey Business School, joins the ranks of his co-authors to provide perspective on applying the data lessons that epidemiologists and economists have learned together to develop actionable policy proposals in a post-pandemic future marked by cross-divisional collaboration and integrated data systems.

How should epidemiological data and economic data affect policy decisions?

I wouldn’t distinguish between epidemiological data and economic data; each field uses a wide variety of data. Both disciplines make policy recommendations by using data to inform their respective models; it’s just that our models have historically been developed to do very different things. To simplify, epidemiology models focus on infectious disease transmission and outcomes; economic models focus on overall individual welfare, with an emphasis on individual behavior. In the initial stages of the COVID-19 pandemic, communication between epidemiologists and economists was quite poor, but I think it has improved significantly. My perspective is that model integration, or at least cross-discipline communication, is really important in preparing for the next pandemic. This communication will pinpoint data collection that is meaningful for policy.

‘Data need to be integrated.’

It's a shame that we could not have been on that same page in March 2020, and from an economist's perspective, I was frustrated that individual behavior research was prioritized less. Indeed, the former National Institutes of Health director, Dr. Francis Collins, said that the biggest failure of the NIH was that they were not funding more behavioral research. We have seen many ways in which behavior is important to COVID-19 outcomes and health, from vaccine hesitancy to masking to social distancing. All of these things are ultimately choices that people make, so understanding the choice process is crucial. To really understand the choice process, data have to be integrated across multiple dimensions, from the labor market to the legal system to political polarization and social cohesion.

What is the importance of consensus between epidemiological and economic models?

Total consensus is not the goal. Rather, I think the goal of economists and epidemiologists should be to illuminate the trade-offs associated with different policy proposals. Epidemiology models will talk about how infectious disease spreads through the population. Economic models will talk about the larger welfare implications of things like lockdowns. As long as policymakers are presented with a menu of choices, and the trade-offs between items on that menu are made clear, we've done our job.

Are there any data streams that would improve policy proposals from these models?

Historically, we have only observed individuals when they interact with our systems and institutions. This is a constant problem in much of health economics: we, as researchers, often only learn about people when they actually go to the doctor or get tested in a way that registers in official statistics. That makes it hard to understand things like population risk or total disease burden. For individuals of low socioeconomic status, who lack access to care in the first place, this selection problem makes it really difficult to precisely calculate the magnitude of disparities. Historically, our data systems haven’t observed people when they stay at home.

The selection problem is improving because of technology. A perfect example is the plethora of recent papers that employ anonymous cell phone data. When you install an app on your phone and rush through all of the agreements, you're allowing the cell companies to track your location and usage data. Then, third-party companies collect all of that information and compile datasets on mobility and individual patterns of use. Privacy is a big concern with this, and the legal framework around these data is evolving, but this is hugely helpful for research. We can now see how people are actually responding to risk in real time.

‘In the future I want more data that is technology-driven and outside of formal institutions.’

However, these data are not open, and transparency is a real issue. You have to purchase access to them from third-party intermediaries, which brings it back to the NIH: we need funding for behavioral research that pushes these boundaries. Researchers need federally funded opportunities to use innovative new data. Wider access to these types of data will bring transparency and replicability.

How can technological advancements in data improve policy prescriptions from research?

Data-driven policy prescriptions will only be as good as the underlying data. And while technology has improved the data available to researchers, traditional sources of data are becoming worse. Survey non-response has become a major impediment to the traditional research methods used to understand the world. People are simply, by and large, unwilling to fill out surveys anymore. The Census Bureau is increasingly relying on imputation methods to fill in gaps in data, which is generally fine in aggregate. But, if you're interested in, for example, questions about disparities that require conditional averages from certain subpopulations, the data can get really messy. You’re zooming into the data so much that the conditional averages become untrustworthy.

Additionally, technology helps us work towards integrated data, which have historically been difficult to build because political and institutional interests have prevented their construction. But integrated data are key to understanding social phenomena. For example, health disparities may be a function of environmental factors, systemic racism, historical housing policies, labor market disparities, and more. Causal evidence on only one of these dimensions can be helpful, but integrated data allow us to evaluate the relative importance of different contributors to disparate outcomes.

‘When we talk about actionable policy, we need data that crosses many fields.’

Scandinavian countries, like Denmark and Sweden, have offered us a glimpse into how integrated data can improve policy. They have systems that are integrated across healthcare, labor market activity, and different social indicators. You get a much richer picture of a person's whole life course and what choices they make. The trade-off that we've made in the U.S., and possibly this is what hinders implementation of policy proposals, is prioritizing privacy. We can keep learning about four million Danes, which of course is very helpful, or we could actually do something similar here. My hope is that we'll see more and more data integration as the legal framework evolves to protect privacy.

Joshua E. Porterfield, PhD

Dr. Joshua E. Porterfield, Pandemic Data Initiative content lead, is a writer with the Centers for Civic Impact. He is using his PhD in Chemical and Biomolecular Engineering to give an informed perspective on public health data issues.