A key question for many interventions’ impact is how long the intervention changes some outcome counterfactually, or how quickly its effect washes out. This is often the case for work to change policy: the cost-effectiveness of efforts to pass animal welfare ballot initiatives, nuclear non-proliferation policy, climate policy, and voting reform, for example, will depend on (a) whether those policies get repealed and (b) whether they would pass anyway. Often there is an explicit assumption, e.g., that passing a policy is equivalent to speeding up when it would have gone into place anyway by X years.[1] [2] As people routinely note when making these assumptions, it is very unclear what assumption would be appropriate.
In a new paper (my economics “job market paper”), I address this question, focusing on U.S. referendums but with some data on other policymaking processes:
Policy choices sometimes appear stubbornly persistent, even when they become politically unpopular or economically damaging. This paper offers the first systematic empirical evidence of how persistent policy choices are, defined as whether an electorate’s or legislature’s decisions affect whether a policy is in place decades later. I create a new dataset that tracks the historical record of more than 800 state policies that were the subjects of close referendums in U.S. states since 1900. In a regression discontinuity design, I estimate that passing a referendum increases the chance a policy is operative 20, 40, or even 100 years later by over 40 percentage points. I collect additional data on U.S. Congressional legislation and international referendums and use existing data on state legislation to document similar policy persistence for a range of institutional environments, cultures, and topics. I develop a theoretical model to distinguish between possible causes of persistence and present evidence that persistence arises because policies’ salience declines in the aftermath of referendums. The results indicate that many policies are persistently in place—or not—for reasons unrelated to the electorate’s current preferences.
Below I’ll pull out some key takeaways that I think are relevant to the EA community and in some cases did not make it into the paper.
Overview of Results and Methods
My strategy in the paper involves comparing how many policies that barely passed or barely failed in U.S. state-level referendums are in place over time. I collect data on all referendums whose vote outcome is within 2.5 percentage points of the threshold for passage (typically 50%) since 1900 in a subset of U.S. states. I then do what’s called a regression discontinuity design, which allows me to estimate the effect of passing a referendum on whether it is in place later on.
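To make the design concrete, here is a minimal sketch of a regression discontinuity estimate on synthetic data. It is not the paper's code or data: the local linear specification, the function name `rd_estimate`, the sample size, and the simulated 80% and 40% probabilities are all assumptions chosen for illustration. Only the 2.5 percentage point window comes from the text above.

```python
import numpy as np

def rd_estimate(margin, outcome, bandwidth=2.5):
    """Local linear RD: fit a separate line on each side of the cutoff
    (margin = 0) and return the jump in the fitted outcome at the cutoff."""
    margin = np.asarray(margin, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    keep = np.abs(margin) <= bandwidth          # only close votes
    m, y = margin[keep], outcome[keep]
    passed = (m >= 0).astype(float)
    # Columns: intercept, jump at cutoff, slope below, change in slope above
    X = np.column_stack([np.ones_like(m), passed, m, m * passed])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

# Synthetic data only (hypothetical numbers, not the paper's dataset).
rng = np.random.default_rng(0)
n = 20000
margin = rng.uniform(-2.5, 2.5, n)              # vote share minus threshold, in pp
p_in_place = np.where(margin >= 0, 0.8, 0.4)    # assumed true probabilities
in_place = rng.binomial(1, p_in_place)          # 1 if policy is in place later

effect = rd_estimate(margin, in_place)
print(round(effect, 2))                         # should land close to 0.4
```

The coefficient on the `passed` indicator is the estimated effect of barely passing rather than barely failing. In a window this narrow, a simple difference in means between the two sides would give a similar answer.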
The headline result from the paper is below. Many referendums that barely fail eventually pass in the first few years or decades afterward, and then this levels off. At 100 years later, just under 80% of the barely passed ones are in place compared to just under 40% of the barely failed ones. Importantly, the hazard rate—the rate at which this effect declines over time—is much lower in the later years, meaning that if you were to extrapolate this out beyond 100 years, the effect at 200 years would be expected to be significantly more than 40% * 40%.
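The "40% * 40%" benchmark is what a constant hazard rate would imply. A minimal sketch of that arithmetic follows, assuming the effect starts at 100 percentage points at the time of the vote (an assumption, since barely passed policies start in place and barely failed ones do not) and falls to roughly 40 points at 100 years. The function name is hypothetical.

```python
import math

def constant_hazard_effect(effect_at_t, t, horizon, initial_effect=1.0):
    """Extrapolate an effect to `horizon` years, assuming it decays at the
    constant hazard rate implied by the decline from initial_effect to
    effect_at_t over the first t years."""
    hazard = -math.log(effect_at_t / initial_effect) / t
    return initial_effect * math.exp(-hazard * horizon)

# Effect of about 0.40 at 100 years, extrapolated to 200 years.
effect_200 = constant_hazard_effect(0.40, 100, 200)
print(round(effect_200, 2))  # 0.16, i.e. 40% * 40%
```

Because the observed hazard rate is lower in the later decades than in the early ones, this constant-hazard figure is a lower benchmark rather than a forecast.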
Something relevant to EAs that I don’t focus on in the paper is how to think about the effect of campaigning for a policy, given that I focus on the effect of passing one conditional on its being proposed. It turns out there’s a method (Cellini et al. 2010) for backing this out if we assume that the effect of passing a referendum on whether the policy is in place later is the same on your first try as on your Nth try. Using this method yields an estimate of the effect of running a successful campaign on later policy of around 60% (Appendix Figure D20).
The assumption required for these point estimates to be unbiased is fairly strong, but what should be less controversial is that the effect of campaigning to pass a referendum given that nobody else is pursuing it is much larger than the effect of passing one that has already been proposed. Concretely, my mainline estimate in the above graph tells us what the effect is of pushing an existing referendum over the edge. If the proponents of a referendum plan to try again and again, this lowers the effect. Instead, we might be interested in the effect of Open Philanthropy funding a ballot measure campaign, assuming they never again attempt it. The latter is likely much larger than the effects previously presented.
One thing you might wonder about is whether this happens because some policies are unimportant, so nobody does anything about them. I look at this in a few ways, including subjective judgments of a policy’s impact and projections of a policy’s fiscal impact. I don’t see any differences here. I also look at what happens if we drop policies that arguably became obsolete, and this only makes the effect larger (because such policies are often struck from the books).
When we look across policy topics and policies’ political orientation, things look strikingly similar:
You might also wonder whether there are slippery slope (or backlash) effects where a policy leads to some sort of reaction. I discuss this in the paper (Figure 5 and Appendix Figure D13). There are suggestive signs of this happening to a small degree, though with the important caveat that I only look at effects on the same policy (e.g., does banning battery cages lead to bans of enriched cages) rather than the broader universe of policies (e.g., does banning battery cages lead to other animal welfare policies).
I won’t go into depth here on why this is happening, but the story that best fits the data is based on a decline in policies’ salience over time. Political actors’ interest in a policy goes up and down, and after the period of a referendum passes, people stop thinking about it because of a regression-to-the-mean effect. I don’t find evidence that people change their minds in response to a policy or that policies create their own support over time.
As a final piece, I look at legislation and non-U.S. referendums. The samples are much smaller and end earlier, but the pattern is similar to that for U.S. state-level referendums:
Notes Particular to the EA Community
Policy Changes Seem to Matter (Much) Longer than EAs Have Assumed
As noted in my introduction, EA organizations often have to make assumptions about how long a policy intervention matters in calculating cost-effectiveness. Typically people assume that passing a policy is equivalent to having it in place for around five years more or moving the start date of the policy forward by around five years. These results suggest that this is off by more than an order of magnitude. If you look at my estimates above, the half-life of a policy is about 50 years, but even this is probably not the right statistic to use. Since the hazard rate is much lower after the first few decades, the average number of extra years a policy is in place by virtue of passing is probably at least 100 years. See the implications section of the paper for some discussion of this.
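One way to see why the half-life understates things is to treat the expected number of extra policy-years as the area under the effect curve. The sketch below uses three illustrative points, not the paper's estimates. The effect of 1.0 at year 0 is an assumption. The 0.5 at 50 years and 0.4 at 100 years are round numbers taken from the half-life and headline figures in the text. The tail calculation further assumes the hazard rate seen between years 50 and 100 continues indefinitely.

```python
import math

# Illustrative points only: (years since vote, effect on P(policy in place)).
points = [(0, 1.0), (50, 0.5), (100, 0.4)]

def area_trapezoid(pts):
    """Area under the piecewise-linear curve through pts (trapezoid rule)."""
    return sum((t1 - t0) * (e0 + e1) / 2
               for (t0, e0), (t1, e1) in zip(pts, pts[1:]))

def tail_area(pts):
    """Area beyond the last point, assuming the hazard rate implied by the
    last two points continues forever (exponential decay)."""
    (t0, e0), (t1, e1) = pts[-2], pts[-1]
    hazard = math.log(e0 / e1) / (t1 - t0)
    return e1 / hazard

# If the 50-year half-life held forever, total extra years would be 50 / ln 2.
constant_hazard_total = 50 / math.log(2)

first_century = area_trapezoid(points)
total = first_century + tail_area(points)

print(round(constant_hazard_total))  # about 72 years
print(round(first_century))          # about 60 years within the first century
print(round(total))                  # about 150 years including the tail
```

Under these assumptions the slower decay after the first few decades roughly doubles the total relative to a constant 50-year half-life. Any discounting, or a shorter period over which the policy still matters, would pull the figure down.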
Policies are so persistent that the impacts of policy interventions might depend more on technological reasons why they could become obsolete than political ones (e.g., a policy to solve an x-risk might stop mattering because the risk is resolved otherwise).
Neglectedness Matters
One interesting result of the paper is that neglectedness seems key to whether a policy change matters for a long time. For policies that can be expected to attract more interest after the referendum passes, I see less persistence. It is not a hugely dramatic effect, but it could make a difference on the margin or in extreme cases. This seems to lend some support to the EA practice of paying attention to neglectedness.
Comparing Persistence: Can We Compare Policy to Other Social Changes?
One of the questions I can imagine people having in this community is whether this favors policy work relative to other work. I think this is pretty unclear because we don’t have comparable estimates for, e.g., the longevity of corporate policies or cultural changes. It would be reasonable to take this as an update in favor of policy work to the extent policy is more persistent than expected. My best guess would be that culture is less persistent than policy (see, e.g., Table 3 of Giuliano and Nunn 2021), and I’d guess similarly for corporate policy (see, e.g., Table 8 of Flammer 2015 and Table VIII of Cuñat et al. 2012) but have a lot of uncertainty.
[1] “Throughout this report, I assume that each ballot initiative has a speed-up time of four years.” https://docs.google.com/document/d/1p7xqop2FlIF8Kw45za0NnJPwvUA70Mb1UzjijMRKRr8/edit
[2] “Our highly uncertain realistic estimate is that through their work, CATF brought regulation on US coal plants forward by 18 months… In summary, we believe that by proposing RED when they did, CfRN at least brought RED forward by a year, and most likely brought it forward by 2 years, though this estimate may be conservative.” https://www.founderspledge.com/downloads/fp-climate-change
This is great. I have been eagerly awaiting a final version of this. One thing I found when evaluating policy change was that even with very conservative assumptions about how long policy persists (some of which you cite), the cost-effectiveness can look really good. It’s very nice to have some more realistic data. It seems to significantly strengthen the case for policy over more direct stuff. Though of course there is also the downside that bad policy would persist for quite a long time too.
(Nothing useful to contribute but wanted to say this seems very nice!)
Agreed, this looks fantastic, @zdgroff!
Thanks a lot! And good luck on the job market to you—let’s connect when we’re through with this (or if we have time before then).
It’s fun to see job market candidates posting summaries here! (@basil.halperin I just saw your paper on MR.) It’s a great venue for a high-level summary. Good luck to you both!
Good luck, both! Are there any other economist EAs on the job market this year?
Yep, at the risk of omitting others, Lukas Freund as well.
I would also like to hear this, as a well wisher :)
I’m curating this post. I love the way you have done this link-post, pulling out sections of interest to the EA community. Always helpful to see order of magnitude updates to EA BOTECs.
Neat work. I wouldn’t be surprised if this ends up positively updating my view on the cost-effectiveness of advocacy work.
What’s your take on the possibility that someone could empirically tackle a related issue we also tend to do a lot of guessing at: the likelihood of $X million spent on advocacy in a certain domain leading to reform?
I think there are probably ways to tackle that but don’t have anything shovel-ready. I’d want to look at the general evidence on campaign spending and what methods have been used there, then see if any of those would apply (with some adaptations) to this case.
Thanks!
So, directionally if not literally, are you suggesting that policy BOTECs should stop assuming a policy will happen eventually and have indefinite impacts, so that we only need to add in how many extra years of impact came from the intervention succeeding now rather than later? Instead, should we be including a metric for “how many years will this have impact for” and assigning ~100 years, and then taking your data suggesting that 80% of policies that barely passed were still in place 100 years later, compared with 40% of those that barely failed? So should we be doing something like: (100-year value × probability of passing × 80%) − (100-year value × probability of failing × 40%)?
Yes, basically (if I understand correctly). If you think a policy has impact X for each year it’s in place, and you don’t discount, then the impact of causing it to pass rather than fail is something on the order of 100 * X. The impact of funding a campaign to pass it is bigger, though, because you presumably don’t want to count the possibility that you fund it later as part of the counterfactual (see my note above about Appendix Figure D20).
Some things to keep in mind:
Impacts might change over time (e.g., a policy stops mattering in 50 years even if it’s in place). If you think, e.g., transformative AI will upend everything, that might be what you need to think about here in terms of how long this policy change matters.
I’m looking at whether this policy or a version of it will be in place. It’s possible policies will be substituted for in some way in ways that make things wash out somewhat. (For instance, we don’t pass one animal welfare policy, but we pass some policy to shrink the farming sector.) I think this effect is small given the lack of differences by policy topic—this should be much more of an issue for some topics than others—but see the next point.
There are some hints of less persistence for policies where there’s more room for negotiation/more ways to dial it up and down. See my reply to Erich Grunewald lower down—for taxes and Congressional legislation, it seems like the effect on whether some possibly weaker version of the policy eventually passes might wash out.
Great post!
I think this overstates the effects of policy interventions. In cost-effectiveness analyses of policy interventions, the five years you mention usually describe a period over which a certain amount of annual counterfactual benefits apply. In this type of model, the choice of the length of the period and the annual counterfactual benefits are not independent[1]. The longer the period, the lower the annual counterfactual benefits. I suspect EA organisations may often be choosing a shorter period to account for the caveats you mentioned:
Ideally, one would model the effect decaying over time, instead of being constant for a certain period, and then dropping to 0.
Yes, it’s a good point that benefits and length of the period are not independent, and I agree with the footnote too.
I would note that the factors I mentioned there don’t seem like they should change things that much for most issues. I could see using 50-100 years rather than, e.g., 150 years as my results would seem to suggest, but I do think 5-10 years is an order of magnitude off.
Could you elaborate on why you think multiplying your results by a factor of 0.5 would be enough? Do you think it would be possible to study the question with empirical data, by looking not only into how much time the policy changes persisted counterfactually, but also into the target outcomes (e.g. number of caged hens for policy around animal welfare standards)? I am guessing this would be much harder, but that there are some questions in this vicinity one could try to answer more empirically to get a sense of how much the persistence estimates you got have to be adjusted downwards.
Just to play devil’s advocate (without harmful intentions :-), what are the largest limitations or disclaimers that we should keep in mind regarding your results or methods?
See my reply to Neil Dullaghan—I think that gives somewhat of a sense here. Some other things:
I don’t have a ton of observations on any one specific policy, so I can’t say much about whether some special policy area (e.g., pollution regulation) exhibits a different pattern.
I look at whether this policy, or a version of it, is in place. This should capture anything that would be a direct and obvious substitute, but there might be looser substitutes that end up passing if you fail to pass an initial policy. The evidence I do have on this suggests it’s small, but I still wonder about it.
My method is about close votes. I try to think about what it means for things that are less close, and I think it basically generalizes, but it gets tricky to think about the impact of, e.g., funding a campaign to move a policy from being unpopular and neglected to popular and on the ballot.
I am really really surprised 5 years is the typical assumption. My conservative guess would have been ~30 years persistence on average for a “referendum-sized” policy change.
Related, I’m surprised this paper is a big update for some people. I suppose that attests to the power of empirical work, however uncertain, for illuminating the discussion on big picture questions.
P.S. I imagine you’re too busy to respond, but I’d be curious to hear if these findings surprised you / what updates you made as a result
I didn’t write down a prior. I think if I had, it would have been less persistence. I think I would have guessed five years was an underestimate. (I think probably many people making that assumption would also have guessed it was an underestimate but were erring on the side of conservatism.)
Looks great! Good luck on the market Zach!
Nice work. Do you have any intuitions about whether the same patterns also apply to federal regulations in the US?
A few things:
I do find these patterns when I look at a few different types of policies (referendums, legislation, state vs. Congress, U.S. vs. international), so there’s some reason to think it’s not just state referendums.
There’s a paper on the repeals of executive orders that finds an even lower rate of repeals there, but that doesn’t tell us the counterfactual (i.e., would someone else have done this if the president in question did not).
There’s suggestive evidence that when policies are more negotiable, there’s less persistence. In my narrative/case study look at Congress, I find that failed policies often pass in a weaker form later on. There’s a similar result for taxes.
So my guess would be there is a somewhat qualitatively similar pattern here, but there’s probably a somewhat greater chance of winding up with a weaker form of the regulation you want later on than there is for failed referendum policies.
Sorry if I missed this in your post, but how many policies did you analyse that were passed via referendum vs. by legislation? How many at the state level vs. federal US vs. international?
Very interesting.
1. Did you notice an effect of how large/ambitious the ballot initiative was? I remember previous research suggesting consecutive piecemeal initiatives were more successful at creating larger change than singular large ballot initiatives.
2. Do you know how much the results vary by state?
3. How different do ballot initiatives need to be for the huge first advocacy effect to take place? Does this work as long as the policies are not identical or is it more of a cause specific function or something in between? Does it have a smooth gradient or is it discontinuous after some tipping point?
I look at some things you might find relevant here. I try to measure the scale of the impact of a referendum. I do this two ways. I have just a subjective judgment on a five-point scale, and then I also look at predictions of the referendum’s fiscal impact from the secretary of state. Neither one is predictive. I also look at how many people would be directly affected by a referendum and how much news coverage there was before the election cycle. These predict less persistence.
This is something I plan to look at more, but the results can’t vary that much because when I look at variables that vary across states (e.g., requirements to get on the ballot), I don’t see much of a difference.
I’m not totally sure what your question is, but I think you might be interpreting my results as saying that close referendums are especially persistent. I’m only focusing on close referendums because it’s a natural experiment—I’m not saying there’s something special about them otherwise. I’m just estimating the effect of passing a marginal referendum on whether the policy is in place later on. I can try to think about whether this holds for things that are not close by looking at states with supermajority requirements or by looking at legislation, and it looks like things are similar when they’re not as close.
Interesting. Are there any examples of what we might consider relatively small policy changes that received huge amounts of coverage? Like for something people normally wouldn’t care about. Maybe these would be informative to look at compared to more hot button issues like abortion that tend to get a lot of coverage. I’m also curious if any big issues somehow got less attention than expected and how this looks for pass/fail margins compared to other states where they got more attention. There are probably some ways to estimate this that are better than others.
I see.
I was interpreting it as “a referendum increases the likelihood of the policy existing later.” My question is about the assumptions that lead to this view and the idea that it might be more effective to run a campaign for a policy ballot initiative once and never again. Is this estimate of the referendum effect only for the exact same policy (maybe an education tax but the percent is slightly higher or lower) or similar policies (a fee or a subsidy or voucher or something even more different)? How similar do they have to be? What is the most different policy that existed later that you think would still count?
Does this study tell us much about the counterfactual advancement of policies that pass the threshold by more significant margins, like a few percentage points or even double digit percentage points? Presumably those are more popular, so more likely to be passed eventually anyway. Some might still be popular but neglected because they aren’t high priorities in politics, though, e.g. animal welfare.
[Edited to add the second sentence of the paragraph beginning, “Putting these together.”]
The primary result doesn’t speak to this, but secondary results can shed some light on it. Overall, I’d guess persistence is a touch less for policies with much more support, but note that the effect of proposing a policy on later policy is likely much larger than the effect of passing a policy conditional on its having been proposed.
The first thing to note is that there are really two questions here we might want to ask:
What is the effect of passing a policy change, conditional on its having been proposed, when its support is not marginal?
What is the effect of proposing a policy change when its support is not marginal?
I’ll speak to (1) first.
The main thing we can do is look at states that require a supermajority to pass a referendum in Appendix Figure E15. This does not directly answer (1) because, while it allows us to look at referendums whose support is well above 50%, it is looking at cases where you need more than 50% to revisit the referendum. Nevertheless, it gives us some information. First, things look similar for the most part. Second, it looks like maybe there’s a higher chance that supermajority referendums pass later on, especially in the first decade, though it’s very noisy statistically. Third, repeal is slightly less common, though this is again noisy and also confounded with the higher difficulty of repealing one of these.
In the latest version of the paper, I include a simulation (Section 5.1) that allows me to simulate some relevant experiments, though these are currently not in the paper. In my simulation, I can simulate the effect of passing an initiative (a referendum by petition) with varying levels of support. It is generally the case that, for policies that get proposed and have support above 55%, persistence is about 25% smaller at 100 years for the reason you give: these policies are more likely to pass eventually. (I do this by simulating a world where, holding voter support constant, I randomly assign policies to be passed or not.)
Putting these together, I think it would be reasonable to think either that the effect of passage is similar for policies with widespread support, or that it is somewhat smaller. You can also look at the discussion of state legislation in section 6, which does not rely on close votes (though plausibly selects for things being marginal by focusing on adoptions of policies by states where similar states lack those policies).
Turning to (2), we should expect the effect of proposing a policy on whether that policy is in effect later on to be much larger than the effect of passing a policy conditional on its being proposed. Appendix Figure E20 (formerly D20) is one attempt to get at this and suggests the effect of successfully proposing a policy is ~50% larger than the effect of passing a proposed one. One could also imagine simulating this—but that exercise requires some unclear assumptions, so I’m inclined to go with Appendix Figure E20 here.*
One underlying theme in all of this is that the people who propose policies are very much in the driver’s seat. Persistence largely appears to be a result of the fact that small numbers of people can set policies based on whether they decide to pursue policy changes or not.
*One could also imagine simulating this, but the problem there is that the vast majority of policies one could conceive of probably have approximately nobody who cares about them (e.g., minor tweaks to the language of some law, declaring that pistachio is the best ice cream flavor and offering an infinitesimal subsidy for it). My calibration has the policies that get proposed as being in the tail of the distribution in terms of how much people care about them. As a result, if we look at policies that don’t get proposed, basically nobody would ever bother trying to repeal or revisit them.
Nice paper!
This is probably me being stupid, but I’m not sure I understand this. Are you saying more neglected areas are those that would typically see less interest after a referendum and, according to your findings, are therefore those that have more persistent effects? So the takeaway is to focus on more neglected policy areas?
If that’s the correct interpretation, can we reasonably be sure that more neglectedness = less interest after a referendum? Presumably if there is a referendum, at that point the policy area may no longer be that neglected?
I interpreted this in the same way as you. Presumably the referendum will lead to a short-term uptick in popular interest, but maybe our model could be that there is some baseline ‘interestingness’ of an issue that public interest reverts to soon after the referendum? It perhaps depends on what the process for getting something on a referendum is. If it is hard and requires very many people to already endorse the proposed policy, then almost by definition referendums aren’t very neglected. But if a few actors can get a referendum, then the topic may still be quite neglected.
Yeah, Jack, I think you’re capturing my thinking here (which is an informal point for this audience rather than something formal in the paper). I look at measures of how much people were interested in a policy well before the referendum or how much we should expect them to be interested after the referendum. It looks like both of these predict less persistence. So the thought is that things that generally are less salient when not on the ballot are more persistent.
This was fascinating and I can’t help but think “surely someone must have done this before?”. But it seems not! Not sure if you have time to comment and it is not important for me, I am just curious: Would this not be easy to publish? I am asking as I do not see evidence that this was accepted into a journal. I would imagine this paper being cited frequently due to its novelty and usefulness (not to speak of the credentials of the institution you belong to!). Great work!
Easy Q to answer so doesn’t take much time! In economics, the norm is not to publish your job market paper until after the market for various reasons. (That way, you don’t take time away from improving the paper, and the department that hires you gets credit.)
We will see before long how it publishes!
Interesting post, thank you!
Your analysis on Congress/National level legislation is particularly telling as it seems to show that if you can survive the next political cycle/election, then the probability of the legislation remaining in place becomes almost static between 40-60% (apart from a drop at the 20-35 year mark).
This fits anecdotally with my experience and aligns with the reality that actually changing legislation can be incredibly tricky, especially on policies that are controversial within a political party. See the U.K.’s 0.7% international aid legislation as an example.
Does your research look at any potential practical predictors (beyond neglectedness) of a policy sticking for longer (such as complexity of language, cross-party support, integration in larger bill)? I’m out so haven’t read the full paper, but do just point me to that if it’s in there and I’ll look later!
I do look at predictors a bit—though note that it’s not about what makes it harder to repeal but rather about what makes a policy change/choice influential decades later.
The main takeaway is there aren’t many predictors—the effect is remarkably uniform. I can’t look at things around the structure of the law (e.g., integration in a larger bill), but I’d be surprised if something like complexity of language or cross-party support made a difference in what I’m looking at.