A Critical Review of Open Philanthropy’s Bet On Criminal Justice Reform
Epistemic status: Dwelling on the negatives.
Hello AstralCodexTen readers! You might also enjoy this series on estimating value, or my forecasting newsletter. And for the estimation language used throughout this post, see Squiggle.—Nuño.
Summary
From 2013 to 2021, Open Philanthropy donated $200M to criminal justice reform. My best guess is that, from a utilitarian perspective, this was likely suboptimal. In particular, I am fairly sure that it was possible to realize sooner that the area was unpromising, and to act on it sooner.
In this post, I first present the background for Open Philanthropy’s grants on criminal justice reform, and the abstract case for considering it a priority. I then estimate that criminal justice grants were distinctly worse than other grants in the global health and development portfolio, such as those to GiveDirectly or AMF.
I speculate about why Open Philanthropy donated to criminal justice in the first place, and why it continued donating. I end up uncertain about the extent to which this was a sincere play based on considerations around the value of information and learning, and the extent to which it was determined by other factors, such as the idiosyncratic preferences of Open Philanthropy’s funders, human fallibility and slowness, paying too much to avoid social awkwardness, “worldview diversification” being an imperfect framework imperfectly applied, or the difficulty of balancing conventional morality against expected utility maximization. In short, I started out skeptical that a utilitarian, left alone, would spontaneously start exploring US criminal justice reform as a cause area, and after further investigation I still think so to some degree, though with significant uncertainty.
I then outline my updates about Open Philanthropy. Personally, I updated downwards on Open Philanthropy’s decision speed, rationality and degree of openness, from an initially very high starting point. I also provide a shallow analysis of Open Philanthropy’s worldview diversification strategy and suggest that they move to a model where regular rebalancing roughly equalizes the marginal expected values of the grants in each cause area. Open Philanthropy already does this within its global health and development portfolio.
Lastly, I brainstorm some mechanisms which could have accelerated and improved Open Philanthropy’s decision-making and suggest red teams and monetary bets or prediction markets as potential avenues of investigation.
Throughout this piece, my aim is to think and express myself clearly. I understand that this might come across as impolite or unduly harsh. However, I think that providing uncertain and perhaps flawed criticism is still worth it, in expectation. I would like to note that I still respect Open Philanthropy and think that it’s one of the best philanthropic organizations around.
Open Philanthropy staff reviewed this post prior to publication.
Index
Background information
What is the case for Criminal Justice Reform?
What is the cost-effectiveness of criminal justice grants?
Why did Open Philanthropy donate to criminal justice in the first place?
Why did Open Philanthropy keep donating to criminal justice?
What conclusions can we reach from this?
Systems that could improve Open Philanthropy’s decision-making
Conclusion
Background information
From 2013 to 2021, Open Philanthropy distributed $199,574,123 to criminal justice reform [0]. In 2015, they hired Chloe Cockburn as a program officer, following a “stretch goal” for the year. They elaborated on their method and reasoning in The Process of Hiring our First Cause-Specific Program Officer.
In that blog post, they described their expansion into the criminal justice reform space as substantially a “bet on Chloe”. Overall, the post was very positive about Chloe (more on red teams below). But the post expressed some reservations because “Chloe has a generally different profile from the sorts of people GiveWell has hired in the past. In particular, she is probably less quantitatively inclined than most employees at GiveWell. This isn’t surprising or concerning—most GiveWell employees are Research Analysts, and we see the Program Officer role as calling for a different set of abilities. That said, it’s possible that different reasoning styles will lead to disagreement at times. We think of this as only a minor concern.” In hindsight, it seems plausible to me that this relative lack of quantitative inclination played a role in Open Philanthropy making comparatively suboptimal grants in the criminal justice space [1].
In mid-2019, Open Philanthropy published a blog post titled GiveWell’s Top Charities Are (Increasingly) Hard to Beat. It explained that, as GiveWell expanded into researching more areas, Open Philanthropy expected there to be enough room for more funding at charities as good as GiveWell’s top charities. Thus, causes like Criminal Justice Reform looked less promising.
In the months following that blog post, Open Philanthropy donations to Criminal Justice reform spiked, with multi-million, multi-year grants going to Impact Justice ($4M), Alliance for Safety and Justice ($10M), National Council for Incarcerated and Formerly Incarcerated Women and Girls ($2.25M), Essie Justice Group ($3M), Texas Organizing Project ($4.2M), Color Of Change Education Fund ($2.5M) and The Justice Collaborative ($7.8M).
Initially, I thought that might be because of an expectation of winding down. However, other Open Philanthropy cause areas also show a similar pattern of going up in 2019, perhaps at the expense of spending on Global Health and Development for that year:
In 2021, Open Philanthropy spun out its Criminal Justice Reform department as a new organization: Just Impact. Open Philanthropy seeded Just Impact with $50M. Their parting blog post explains their thinking: that Global Health and Development interventions have significantly better cost-effectiveness.
What is the case for Criminal Justice Reform?
Note: This section briefly reviews my own understanding of this area. For a more canonical source, see Open Philanthropy’s strategy document on criminal justice reform.
There are around 2M people in US prisons and jails. Some are highly dangerous, but a glance at a map of prison population rates per 100k people suggests that the US incarcerates a significantly larger share of its population than most other countries:
Outlining a positive vision for reform is still an area of active work. Still, a first approximation might be as follows:
Criminals should be punished in proportion to an estimate of the harm they have caused, multiplied by a factor that accounts for a less than 100% chance of getting caught, so that crimes are not worth it in expectation. This contrasts with the disproportionate sentences that result from pressure on politicians to appear tough on crime. In addition, criminals would then work to provide restitution to the victim, if the victim so desires, per some restorative justice framework [2].
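The proportionality rule above reduces to a one-line formula: if a crime causes harm H and the offender is caught with probability p, a punishment of H / p makes the expected punishment p × (H / p) = H, so the crime does not pay in expectation. A minimal sketch, assuming harm and punishment are measured in the same (hypothetical) units and ignoring restitution:

```python
def proportional_punishment(harm: float, p_caught: float) -> float:
    """Punishment that makes a crime break even in expectation.

    Scales the estimated harm by 1 / p_caught, so that the expected
    punishment, p_caught * punishment, equals the harm caused.
    """
    if not 0 < p_caught <= 1:
        raise ValueError("p_caught must be in (0, 1]")
    return harm / p_caught


# Hypothetical example: a harm worth 10 units, with a 25% chance of being caught.
punishment = proportional_punishment(10, 0.25)
print(punishment)         # 40.0
print(0.25 * punishment)  # 10.0: the expected punishment equals the harm
```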
In a best-case scenario, criminal justice reform could achieve somewhere between a 25% reduction in incarceration in the short term and a 75% reduction in the longer term, bringing the incarceration rate down to only twice that of Spain [4], while holding the crime rate constant. Say that $2B to $20B, or 10x to 100x the amount that Open Philanthropy has already spent, would have a 1% to 10% chance of achieving that goal [5].
What is the cost-effectiveness of criminal justice grants?
Estimation strategy
In this section, I come up with some estimates of the impact of criminal justice reform, and compare them with some estimates of the impact of GiveWell-style global health and development interventions.
Throughout, I am making the following modelling choices:
I am primarily looking at the impact of systemic change
I am looking at the first-order impacts
I am using subjective estimates
I am primarily looking at the impact of systemic change because many of the largest Open Philanthropy donations were aiming for systemic change, and their individual cost-effectiveness was extremely hard to estimate. For completeness, I do estimate the impacts of a standout intervention as well.
I am looking at the first-order impacts on prisoners and GiveWell recipients, rather than at the effects on their communities. My strong guess is that the story told by the second-order impacts (e.g., harms to the community from death or reduced earnings in the case of malaria, and harms from absence and reduced earnings in the case of imprisonment) wouldn’t change the relative values of the two cause areas.
After presenting my estimates, I discuss their limitations.
Simple model for systemic change
Using what I consider to be optimistic assumptions about first-order effects, I come up with the following Squiggle model:
```
initialPrisonPopulation = 1.5M to 2.5M // Data for the 2022 prison population has not yet been published, though this estimate is perhaps too wide.
reductionInPrisonPopulation = 0.25 to 0.75
badnessOfPrisonInQALYs = 0.2 to 6 // 80% as good as being alive to 5 times worse than living is good
counterfactualAccelerationInYears = 5 to 50
probabilityOfSuccess = 0.01 to 0.1 // 1% to 10%
counterfactualImpactOfGrant = 0.5 to 1 // other funders, labor cost of activism
estimateQALYs = initialPrisonPopulation * reductionInPrisonPopulation * badnessOfPrisonInQALYs * counterfactualAccelerationInYears * probabilityOfSuccess * counterfactualImpactOfGrant
cost = 2B to 20B
costPerQALY = cost / estimateQALYs
costPerQALY
```
That model produces the following distribution:
This model estimates that criminal justice reform buys one QALY (quality-adjusted life year) [6] for $76k, on average. But the model is very uncertain: its 90% confidence interval is $1.3k to ~$290k per QALY, and it assigns a 50% chance to the cost being less than ~$19k. For a calculation that instead looks at more marginal impact, see here.
EDIT 22/06/2022: Commenters pointed out that the mean of cost / estimateQALYs in the chart above isn’t the right quantity to look at. mean(cost)/mean(estimateQALYs) is probably a better representation of “expected cost per QALY”. That quantity is $8160/QALY for the above model. If one looks at 1/mean(estimateQALYs/cost), this is $5k per QALY. Overall, I would recommend looking at the 90% confidence intervals rather than at the means. See this comment thread for discussion. I’ve added notes below each model.
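Because every input in the systemic-change model is written as `low to high` with positive bounds, which Squiggle treats as a lognormal whose 90% confidence interval is [low, high], and because the model multiplies and divides them as if they were independent, the output is itself lognormal. So the median, interval and the three "means" discussed in the edit note can be computed in closed form, without sampling. The Python sketch below is an independent reimplementation for checking purposes, not Squiggle itself; the helper names (`to`, `mul`, `div`) are made up here to mirror the Squiggle operators, and independence between inputs is assumed.

```python
import math
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.95)  # about 1.645: half-width of a 90% CI, in standard deviations


def to(low, high):
    """Mimic Squiggle's `low to high` (low > 0): a lognormal with 90% CI [low, high].

    Returns (mu, variance) of log(X)."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z90)
    return mu, sigma ** 2


def mul(*dists):
    """Product of independent lognormals: log-means and log-variances add."""
    return sum(d[0] for d in dists), sum(d[1] for d in dists)


def div(a, b):
    """Ratio a / b of independent lognormals: log-means subtract, log-variances add."""
    return a[0] - b[0], a[1] + b[1]


def mean(d):
    return math.exp(d[0] + d[1] / 2)


def ci90(d):
    half_width = Z90 * math.sqrt(d[1])
    return math.exp(d[0] - half_width), math.exp(d[0] + half_width)


estimate_qalys = mul(
    to(1.5e6, 2.5e6),  # initialPrisonPopulation
    to(0.25, 0.75),    # reductionInPrisonPopulation
    to(0.2, 6),        # badnessOfPrisonInQALYs
    to(5, 50),         # counterfactualAccelerationInYears
    to(0.01, 0.1),     # probabilityOfSuccess
    to(0.5, 1),        # counterfactualImpactOfGrant
)
cost = to(2e9, 20e9)
cost_per_qaly = div(cost, estimate_qalys)

median = math.exp(cost_per_qaly[0])
low, high = ci90(cost_per_qaly)
mean_of_ratio = mean(cost_per_qaly)                 # mean(cost / estimateQALYs)
ratio_of_means = mean(cost) / mean(estimate_qalys)  # mean(cost) / mean(estimateQALYs)
inverse_mean = 1 / mean(div(estimate_qalys, cost))  # 1 / mean(estimateQALYs / cost)

print(f"median ${median:,.0f}; 90% CI ${low:,.0f} to ${high:,.0f}")
print(f"mean(cost/QALYs) ${mean_of_ratio:,.0f}; mean(cost)/mean(QALYs) ${ratio_of_means:,.0f}; "
      f"1/mean(QALYs/cost) ${inverse_mean:,.0f}")
```

Run as written, this should land close to the figures above: a median near $19k, a 90% interval of roughly $1.3k to $290k, and means of about $76k, $8.2k and $5k respectively. Any small gap with Squiggle's output would come from its Monte Carlo sampling. The three "means" differ so much because the log-variance of the output is large, which is one reason to prefer the confidence intervals.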
Simple model for a standout criminal justice reform intervention
Some grants in criminal justice reform might beat systemic reform. I think this might be the case for closing Rikers, bail reform, and prosecutorial accountability:
Rikers is a large and particularly bad prison.
Bail reform seems like a well-defined objective that could affect many people at once.
Prosecutorial accountability could get a large multiplier over systemic reform by focusing on the prosecutors in districts that hold very large prison populations.
For instance, for the case of Rikers, I can estimate:
```
initialPrisonPopulation = 5000 to 10000
reductionInPrisonPopulation = 0.25 to 0.75
badnessOfPrisonInQALYs = 0.2 to 6 // 80% as good as being alive to 5 times worse than living is good
counterfactualAccelerationInYears = 5 to 20
probabilityOfSuccess = 0.07 to 0.5
counterfactualImpactOfGrant = 0.5 to 1 // other funders, labor cost of activism
estimatedImpactInQALYs = initialPrisonPopulation * reductionInPrisonPopulation * badnessOfPrisonInQALYs * counterfactualAccelerationInYears * probabilityOfSuccess * counterfactualImpactOfGrant
cost = 5000000 to 15000000
costPerQALY = cost / estimatedImpactInQALYs
costPerQALY
```
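The Rikers model can be checked the same way. Assuming, as before, that each `low to high` input is an independent lognormal with that 90% interval (Squiggle's interpretation of positive ranges), the cost per QALY is lognormal and its summary statistics follow in closed form. A Python sketch, with hypothetical helper names chosen to mirror the Squiggle operators:

```python
import math
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.95)  # about 1.645


def to(low, high):
    # Squiggle's `low to high` (low > 0): lognormal with 90% CI [low, high],
    # returned as (mu, variance) of log(X)
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z90)
    return mu, sigma ** 2


def mul(*dists):  # product of independent lognormals: log-means and log-variances add
    return sum(d[0] for d in dists), sum(d[1] for d in dists)


def div(a, b):  # ratio of independent lognormals: log-means subtract, log-variances add
    return a[0] - b[0], a[1] + b[1]


def mean(d):
    return math.exp(d[0] + d[1] / 2)


def ci90(d):
    half_width = Z90 * math.sqrt(d[1])
    return math.exp(d[0] - half_width), math.exp(d[0] + half_width)


qalys = mul(
    to(5000, 10000),  # initialPrisonPopulation
    to(0.25, 0.75),   # reductionInPrisonPopulation
    to(0.2, 6),       # badnessOfPrisonInQALYs
    to(5, 20),        # counterfactualAccelerationInYears
    to(0.07, 0.5),    # probabilityOfSuccess
    to(0.5, 1),       # counterfactualImpactOfGrant
)
cost = to(5e6, 15e6)
cost_per_qaly = div(cost, qalys)

low, high = ci90(cost_per_qaly)
mean_of_ratio = mean(cost_per_qaly)         # mean(cost / estimatedImpactInQALYs)
ratio_of_means = mean(cost) / mean(qalys)   # mean(cost) / mean(estimatedImpactInQALYs)
print(f"Rikers: 90% CI ${low:,.0f} to ${high:,.0f} per QALY")
print(f"mean(cost/QALYs) ${mean_of_ratio:,.0f}; mean(cost)/mean(QALYs) ${ratio_of_means:,.0f}")
```

Under these assumptions the interval comes out at roughly $200 to $19k per QALY, with the two means near $5k and $837. These are the figures this post quotes for the standout intervention in its discussion.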
Simple model for GiveWell charities
Against Malaria Foundation
Using a similar estimation for the Against Malaria Foundation:
```
costPerLife = 3k to 10k
lifeDuration = 30 to 70
qalysPerYear = 0.2 to 1 // feeling unsure about this.
valueOfSavedLife = lifeDuration * qalysPerYear
costEffectiveness = costPerLife / valueOfSavedLife
costEffectiveness
```
Its 90% confidence interval is $90 to ~$800 per QALY, and I likewise validated this with Simple Squiggle. Notice that this interval is disjoint from the estimate for criminal justice reform of $1.3k to $290k.
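The same closed-form check works for the AMF model. A minimal Python sketch, assuming each `low to high` input is an independent lognormal with that 90% interval (Squiggle's reading of positive ranges); the helper names are hypothetical and only mirror the Squiggle operators:

```python
import math
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.95)  # about 1.645


def to(low, high):
    # Squiggle's `low to high` (low > 0): lognormal with 90% CI [low, high],
    # returned as (mu, variance) of log(X)
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z90)
    return mu, sigma ** 2


def mul(a, b):  # product of independent lognormals: log-means and log-variances add
    return a[0] + b[0], a[1] + b[1]


def div(a, b):  # ratio of independent lognormals: log-means subtract, log-variances add
    return a[0] - b[0], a[1] + b[1]


def ci90(d):
    half_width = Z90 * math.sqrt(d[1])
    return math.exp(d[0] - half_width), math.exp(d[0] + half_width)


cost_per_life = to(3_000, 10_000)  # costPerLife, dollars
life_duration = to(30, 70)         # lifeDuration, years
qalys_per_year = to(0.2, 1)        # qalysPerYear
value_of_saved_life = mul(life_duration, qalys_per_year)
cost_per_qaly = div(cost_per_life, value_of_saved_life)

low, high = ci90(cost_per_qaly)
print(f"AMF: 90% CI ${low:,.0f} to ${high:,.0f} per QALY")
```

This should reproduce the $90 to ~$800 interval quoted above.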
GiveDirectly
One might argue that AMF is too strict a comparison and that one should instead compare criminal justice reform to the marginal global health and development grant. Recently, my colleague Sam Nolan quantified the uncertainty in GiveDirectly’s estimate of impact. He arrived at a final estimate of ~$120 to ~$960 per doubling of consumption for one year.
The conversion between a doubling of consumption and a QALY is open to some uncertainty. For instance:
GiveWell estimates the two to be roughly equal, based on the different weights given to saving people of different ages: a factor of ~0.8 to 1.3, based on some eyeballing of this spreadsheet.
GiveWell recently updated their weights to value a DALY (similar to a QALY) at roughly 2 doublings of income.
Commenters pointed out that few people would trade half their life to double their income, and that for such people a conversion factor of around 0.2 might be more appropriate. But those commenters are much wealthier than the average GiveDirectly recipient.
Using a final adjustment of 0.2 to 1.3 QALYs per doubling of consumption (which has a mean of 0.6 QALYs/doubling), I come up with the following model and estimate:
```
costPerDoublingOfConsumption = 118.4 to 963.15
qalysPerDoublingOfConsumption = 0.2 to 1.3
costEffectiveness = costPerDoublingOfConsumption / qalysPerDoublingOfConsumption
costEffectiveness
```
This has a 90% confidence interval between $160 and $2700 per QALY.
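As with the other models, the GiveDirectly estimate can be reproduced in closed form, assuming both inputs are independent lognormals with the stated 90% intervals (Squiggle's reading of positive `low to high` ranges). The sketch below also checks the 0.6 QALYs/doubling mean and the $690/QALY ratio of means mentioned in this post. Helper names are hypothetical:

```python
import math
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.95)  # about 1.645


def to(low, high):
    # Squiggle's `low to high` (low > 0): lognormal with 90% CI [low, high],
    # returned as (mu, variance) of log(X)
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z90)
    return mu, sigma ** 2


def div(a, b):  # ratio of independent lognormals: log-means subtract, log-variances add
    return a[0] - b[0], a[1] + b[1]


def mean(d):
    return math.exp(d[0] + d[1] / 2)


def ci90(d):
    half_width = Z90 * math.sqrt(d[1])
    return math.exp(d[0] - half_width), math.exp(d[0] + half_width)


cost_per_doubling = to(118.4, 963.15)  # costPerDoublingOfConsumption, dollars
qalys_per_doubling = to(0.2, 1.3)      # qalysPerDoublingOfConsumption
cost_per_qaly = div(cost_per_doubling, qalys_per_doubling)

low, high = ci90(cost_per_qaly)
ratio_of_means = mean(cost_per_doubling) / mean(qalys_per_doubling)
print(f"mean QALYs per doubling: {mean(qalys_per_doubling):.2f}")
print(f"GiveDirectly: 90% CI ${low:,.0f} to ${high:,.0f} per QALY")
print(f"mean(cost)/mean(QALYs): ${ratio_of_means:,.0f}")
```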
Discussion
My estimate for the impact of AMF ($90 to $800 per QALY) does not overlap with my estimate for systemic criminal justice reform ($1.3k to $290k per QALY). I think this is informative, and good news for uncertainty quantification: even though both estimates are very uncertain (the first spans about one order of magnitude and the second more than two), we can still tell which one is better.
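One way to make "we can still tell which one is better" quantitative is to ask how often a draw from one estimate beats a draw from the other. The sketch below fits a lognormal to each quoted 90% interval and computes the probability that AMF's cost per QALY is lower than that of systemic criminal justice reform. It assumes the two estimates are independent lognormals, which is a simplification rather than something the models above establish. For lognormals, the difference of the logs is normal, so a single normal CDF gives the answer:

```python
import math
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.95)  # about 1.645


def to(low, high):
    """(mu, sigma) of log(X) for a lognormal X with 90% CI [low, high]."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z90)
    return mu, sigma


def prob_first_is_cheaper(a, b):
    """P(A < B) for independent lognormals A, B given as (mu, sigma) of their logs.

    log(A) - log(B) is normal with mean mu_a - mu_b and sd sqrt(s_a**2 + s_b**2)."""
    (mu_a, s_a), (mu_b, s_b) = a, b
    return NormalDist().cdf((mu_b - mu_a) / math.hypot(s_a, s_b))


amf = to(90, 800)         # AMF, dollars per QALY
cjr = to(1_300, 290_000)  # systemic criminal justice reform, dollars per QALY
p = prob_first_is_cheaper(amf, cjr)
print(f"P(AMF is cheaper per QALY than criminal justice reform) = {p:.3f}")
```

Under these assumptions the probability comes out above 0.99.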
When comparing GiveDirectly ($160 to $2700 per QALY; mean of $900/QALY) against one standout intervention in the space ($200 to $19K per QALY, with a mean of $5k/QALY), the estimates do overlap, but GiveDirectly is still much better in expectation.
EDIT 22/06/2022. Using the better mean, the above paragraph would read: When comparing GiveDirectly ($160 to $2700 per QALY; mean of $690/QALY) against one standout intervention in the space ($200 to $19K per QALY, with a mean of $837/QALY), the estimates do overlap, but GiveDirectly is still better in expectation.
One limitation of these estimates is that they only model first-order effects. GiveWell does have some estimates of second-order effects (averting malaria cases that don’t lead to death, longer-term income increases, etc.). However, these are harder to estimate for criminal justice interventions. Nonetheless, my strong sense is that the second-order effects of malaria deaths or of cash transfers are similar to or greater than those of temporary imprisonment, and don’t change the relative value of the two cause areas all that much.
Some other sources of model error might be:
QALYs being an inadequate modelling choice: QALYs intuitively have a bound of 1 QALY/year, and might not be the right way to think about certain interventions.
I ignored the cost to the US of keeping someone in prison, as opposed to how that money could have been spent otherwise
I didn’t model the increased productivity of someone outside prison
I didn’t estimate recidivism or increased crime from lower incarceration
I didn’t estimate the cost of pushback, such as lobbying for opposite policies
My estimates of the cost of reform were pretty optimistic.
Of these, I think that not modelling the cost to the US of keeping someone in prison and not modelling recidivism are the weakest aspects of my current model. For a model which tries to incorporate these, see the appendix. So overall, there is likely a degree of model error. But I still think that the small models point to something meaningful.
We can also compare the estimates in this post with other estimates. A lengthy report commissioned by Open Philanthropy on the impacts of incarceration on crime mostly concludes that the marginal reduction in crime from more incarceration is non-existent, because the reduction in crime while prisoners are in prison is offset by increased crime when they get out, proportional to the length of their sentence. But the report reasons about short-term effects and marginal changes, e.g., based on RCTs or natural experiments, rather than considering longer-term changes to the incentive landscape following systemic reform. So for the purposes of judging systemic reform rather than marginal changes, I am inclined to almost completely discount it [7]. That said, my unfamiliarity with the literature is likely one of the main weaknesses of this post.
Open Philanthropy’s own initial, informal cause estimates were much more optimistic. In a 2020 interview, Chloe Cockburn mentions that Open Philanthropy estimates criminal justice reform to be around one quarter as valuable as donations to top GiveWell charities, but that her personal estimate is higher, based on subjective factors [8].
For illustration, here are a few grants that I don’t think meet the funding bar of being comparable to AMF or GiveDirectly, based on casual browsing of their websites:
The last one struck me as being both particularly bad and relatively easy to evaluate: A letter costs $2.5, about the same as deworming several kids at $0.35 to $0.97 per deworming treatment. But sending a letter intuitively seems significantly less impactful.
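The letter-versus-deworming comparison is simple arithmetic; a short sketch using only the figures in the paragraph above:

```python
LETTER_COST = 2.50             # dollars per letter
DEWORMING_COST = (0.35, 0.97)  # dollars per deworming treatment, low and high estimates

fewest = LETTER_COST / max(DEWORMING_COST)  # using the most expensive treatment
most = LETTER_COST / min(DEWORMING_COST)    # using the cheapest treatment
print(f"One letter costs as much as {fewest:.1f} to {most:.1f} deworming treatments")
```

That is, each letter forgoes roughly 2.6 to 7 deworming treatments.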
Conversely, larger grants, such as a $2.5M grant to Color Of Change, are harder to casually evaluate. For example, that particular grant was given to support prosecutorial accountability campaigns and Color Of Change’s work with the film Just Mercy. And because the grant was 50% of Color Of Change’s budget for one year, I imagine it also subsidized its subsequent activities, such as the campaigns currently featured on its website [10], or the $415k salary of its president [11]. So to the extent that the grant’s funds were used for prosecutorial accountability, they may have been more cost-effective, and to the extent that they were used for other purposes, less so. Overall, I don’t think that estimating the cost-effectiveness of larger grants as the cost-effectiveness of systemic change would be grossly unfair.
Why did Open Philanthropy donate to criminal justice in the first place?
Epistemic status: Speculation.
I will first outline a few different hypotheses about why Open Philanthropy donated to criminal justice, without regard to plausibility:
The Back of the Envelope Calculation Hypothesis
The Value of Information Hypothesis
The Leverage Hypothesis
The Strategic Funder Hypothesis
The Progressive Funders Hypothesis
The “Politics is The Mind Killer” Hypothesis
The Non-Updating Funders Hypothesis
The Moral Tension Hypothesis
I obtained this list by talking to people about my preliminary thoughts when writing this draft. After outlining them, I will discuss which of these I think are most plausible.
The Back of the Envelope Calculation Hypothesis
As highlighted in Open Philanthropy blog posts, early on, it wasn’t clear that GiveWell was going to find as many opportunities as it later did. It was plausible that the bar could have gone down with time. If so, and if one has a rosier outlook on the tractability and value of criminal justice reform, it could plausibly have been competitive with other areas.
For instance, per Open Philanthropy’s estimations:
Each grant is subject to a cost-effectiveness calculation based on the following formula:
Number of years averted x $50,000 for prison or $100,000 for jail [our valuation of a year of incarceration averted] / 100 [we aim to achieve at least 100x return on investment, and ideally much more] - discounts for causation and implementation uncertainty and multiple attribution of credit > $ grant amount. Not all grants are susceptible to this type of calculation, but we apply it when feasible.
That is, Open Philanthropy’s bar for funding criminal justice reform was a cost of at most $500 (prison) to $1,000 (jail) per year of incarceration averted, before discounts. Per this bar, criminal justice reform would be roughly as cost-effective as GiveDirectly. But the valuation behind this bar is much more optimistic than my estimates of the cost-effectiveness of criminal justice reform grants above.
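The quoted funding rule can be rearranged into a maximum cost per year of incarceration averted. A minimal sketch: the quote only says that discounts for causation, implementation uncertainty and multiple attribution of credit are subtracted, without specifying them, so here they are approximated by a single hypothetical multiplicative `credit_share` between 0 and 1 (1 meaning no discount). This is an illustration, not Open Philanthropy's actual calculation:

```python
VALUE_PER_YEAR_AVERTED = {"prison": 50_000, "jail": 100_000}  # dollars, per the quote
TARGET_RETURN = 100  # "we aim to achieve at least 100x return on investment"


def max_cost_per_year_averted(setting: str, credit_share: float = 1.0) -> float:
    """Largest grant size per year of incarceration averted that still clears the bar."""
    return VALUE_PER_YEAR_AVERTED[setting] / TARGET_RETURN * credit_share


def passes_bar(years_averted: float, grant_amount: float, setting: str,
               credit_share: float = 1.0) -> bool:
    """years averted x valuation / 100, discounted, must exceed the grant amount."""
    return years_averted * max_cost_per_year_averted(setting, credit_share) > grant_amount


print(max_cost_per_year_averted("prison"), max_cost_per_year_averted("jail"))  # 500.0 1000.0
# Hypothetical grant: $400k expected to avert 1,000 prison-years.
print(passes_bar(1_000, 400_000, "prison"))                    # True
print(passes_bar(1_000, 400_000, "prison", credit_share=0.5))  # False
```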
The Value of Information Hypothesis
In 2015, when Open Philanthropy hadn’t invested as much into criminal justice reform, it might have been plausible that relatively little investment might have led to systemic reform. It might have also been plausible that, if found promising, an order of magnitude more funding could have been directed to the cause area.
Commenters on a draft pointed out a second type of information gain: Open Philanthropy might gain experience in grantmaking, learn information, and acquire expertise that would be valuable for other types of giving. In the case of criminal justice reform, I would guess that the specific program officers, rather than Open Philanthropy as an institution, would gain most of the information. I would also guess that the lessons learnt haven’t generalized to, for instance, pandemic prevention funding advocacy. So my best guess is that the information gained would not make this cause worth it if it otherwise would not have been. But I am uncertain about this.
The Leverage Hypothesis
Even if systemic change itself is not cost-effective, criminal justice reform and adjacent issues attract a large amount of attention anyway. By working in this area, one could gain leverage, for instance:
Leverage over other people’s attention and political will, by investing early in leaders who will be in a position to channel somewhat ephemeral political will.
Leverage over the grantmaking in the area, by seeding Just Impact
The Strategic Funder Hypothesis
My colleagues raised the hypothesis that Open Philanthropy might have funded criminal justice reform in part because they wanted to look less weird. E.g., “Open Philanthropy/the EA movement has donated to global health, criminal justice reform, preventing pandemics and averting the risks of artificial intelligence” sounds less weird than “...donated to global health, preventing pandemics and averting the risks of artificial intelligence”.
The Progressive Funders Hypothesis
Dustin Moskovitz and Cari Tuna likely have other goals beyond expected utility maximization. Some of these goals might align with the mores of the current left-wing of American society. Or, alternatively, their progressive beliefs might influence and bias their beliefs about what maximizes utility.
On the one hand, I think this would be a mistake. Cause impartiality is one of EA’s major principles, and I think it catalyzes an important part of what we’ve found out about doing good better. But on the other hand, these are not my billions. On the third hand, it seems suboptimal if politically-motivated giving were post-hoc argued to be utility-optimal. If this was the case, I would have really appreciated it if their research had been upfront about this.
The “Politics is The Mind Killer” Hypothesis
In the domain of politics, reasoning degrades, and principal-agent problems arise. And so another way to look at the grants under discussion is that Open Philanthropy flew too close to politics, and was sucked in.
To start, there is a selection effect of people who think an area is the most promising going into it. In addition, there is a principal-agent problem where people working inside a cause area are not really incentivized to look for arguments and evidence that they should be replaced by something better. My sense is that people will tend to give very, very optimistic estimates of impact for their own cause area.
These considerations are general arguments, and they could apply to, for instance, community building or forecasting, with similar force. Though perhaps the warping effects would be stronger for cause areas adjacent to politics.
The Moral Tension Hypothesis
My sense is that Open Philanthropy funders lean a bit more towards conventional morality, whereas philosophical reflection leans more towards expected utility maximization. Managing the tension between these two approaches seems pretty hard, and it shouldn’t be particularly surprising that a few mistakes were made from a utilitarian perspective.
Discussion
In conversation with Open Philanthropy staff, they mentioned that the first three hypotheses (Back of the Envelope, Value of Information and Leverage) sounded most true to them. In conversation with a few other people, mostly longtermists, some thought that the Strategic Funder and Progressive Funders hypotheses were more likely.
I would make a distinction between what the people who made the decision were thinking at the time, and the selection effects that chose those people. And so, I would think that early on, Open Philanthropy leadership was mainly thinking about back-of-the-envelope calculations, value of information, and leverage. But I would also expect them to have done so within somewhat narrow constraints. And I expect some of the other hypotheses, particularly the “progressive funders hypothesis” and the “moral tension hypothesis”, to explain those constraints at least a little.
I am left uncertain about whether and to what extent Open Philanthropy was acting sincerely. It could be that criminal justice reform was just a bet that didn’t pay off. But it could also be the case that some factor put a thumb on the scale and greased the choice to invest in criminal justice reform. In the end, Open Philanthropy is probably heterogeneous; it seems likely that some people were acting sincerely, and others with a bit of motivated reasoning.
Why did Open Philanthropy keep donating to criminal justice?
Epistemic status: More speculation
The Inertia Hypothesis
Open Philanthropy wrote about GiveWell’s Top Charities Are (Increasingly) Hard to Beat in 2019. They stopped investing in criminal justice reform in 2021, after giving an additional $100M to the cause area. I’m not sure what happened in the meantime.
In a 2016 blog post explaining worldview diversification, Holden Karnofsky writes:
Currently, we tend to invest resources in each cause up to the point where it seems like there are strongly diminishing returns, or the point where it seems the returns are clearly worse than what we could achieve by reallocating the resources—whichever comes first
Under some assumptions explained in that post, namely that the amounts given to each cause area are balanced to ensure that the values of the marginal grants to each area are similar, worldview diversification would be approximately optimal even from an expected value perspective [12]. My impression is that this monitoring and rebalancing did not happen fast enough in the case of criminal justice reform.
Incongruous as it might ring to my ears, it is also possible that optimizing the allocation of an additional $100M might not have been the most valuable thing for Open Philanthropy’s leadership to have been doing. For instance, exploring new areas, convincing or coordinating with additional billionaires or optimizing other parts of Open Philanthropy’s portfolio might have been more valuable.
The Social Harmony Hypothesis
Firing people is hard. When you structured your bet on a cause area as a bet on a specific person, I imagine that resolving that bet as a negative would be awkward [14].
The Soft Landing Hypothesis
Abruptly stopping funding can be really detrimental for a charity. So Open Philanthropy felt the need to give a soft roll-off that lasts a few years. On the one hand, this is understandable. But on the other hand, it seems that Open Philanthropy might have given two soft landings: one of $50M in 2019, and another $50M in 2021 to spin off Just Impact.
The Chessmaster Hypothesis
There is probably some calculation or some factor that I am missing. There is nothing disallowing Open Philanthropy from making moves based on private information. In particular, see the discussion on information gains above. Information gains are particularly hard for me to estimate from the outside.
What conclusions can we reach from this?
On Open Philanthropy’s Observe–Orient–Decide–Act loops
Open Philanthropy took several years and spent an additional $100M on a cause that they could have known was suboptimal. That feels like too much time.
They also arguably gave two different “golden parachutes” when leaving criminal justice reform. The first, in 2019, gave a number of NGOs in the area generous parting donations. The second, in 2021, gave the outgoing program officers $50 million to continue their work.
This might make similar experimentation—e.g., hiring a program officer for a new cause area, and committing to it only if it goes well—much more expensive. It’s not clear to me that Open Philanthropy would have agreed beforehand to give $100M in “exit grants”.
On Moral Diversification
Open Philanthropy’s donations to criminal justice were part of its global health and development portfolio, and thus, in theory, not subject to Open Philanthropy’s worldview diversification framework. But in practice, I get the impression that worldview diversification might have been one of the reasons it took so long to notice that criminal justice reform was likely suboptimal.
In Technical Updates to Our Global Health and Wellbeing Cause Prioritization Framework, Peter Favaloro and Alexander Berger write:
Overall, having a single “bar” across multiple very different programs and outcome measures is an attractive feature because equalizing marginal returns across different programs is a requirement for optimizing the overall allocation of resources
Prior to 2019, we used a “100x” bar based on the units above, the scalability of direct cash transfers to the global poor, and the roughly 100x ratio of high-income country income to GiveDirectly recipient income. As of 2019, we tentatively switched to thinking of “roughly 1,000x” as our bar for new programs, because that was roughly our estimate of the unfunded margin of the top charities recommended by GiveWell
We’re also updating how we measure the DALY burden of a death; our new approach will accord with GiveWell’s moral weights, which value preventing deaths at very young ages differently than implied by a DALY framework. (More)
This post focuses exclusively on how we value different outcomes for humans within Global Health and Wellbeing; when it comes to other outcomes like farm animal welfare or the far future, we practice worldview diversification instead of trying to have a single unified framework for cost-effectiveness analysis. We think it’s an open question whether we should have more internal “worldviews” that are diversified over within the broad Global Health and Wellbeing remit (vs everything being slotted into a unified framework as in this post).
Speaking about Open Philanthropy’s portfolio rather than about criminal justice, instead of strict worldview diversification, one could compare these different cause areas as best one can, strive to figure out better comparisons, and set the marginal impact of grants in each area to be roughly equal. This would better approximate expected value maximization, and it is in fact not too dissimilar to (part of) the original reasoning for worldview diversification. As explained in the original post, worldview diversification makes the most sense in some contexts and under some assumptions: diminishing returns to each cause, and similar marginal values to more funding.
But somehow, I get the weak impression that worldview diversification (partially) started as an approximation to expected value, and ended up being more of a peace pact between different cause areas. This peace pact disincentivizes comparisons between giving in different cause areas, which then leads to getting their marginal values out of sync.
Instead, I would like to see:
further analysis of alternatives to worldview diversification,
more frequent monitoring of whether the assumptions behind worldview diversification still make sense,
and a more regular rebalancing of the proportion of funds assigned to each cause according to the value of their marginal grants [14].
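To make “regular rebalancing that roughly equalizes marginal values” concrete, here is a minimal Python sketch. It assumes, purely for illustration, that each cause area's value grows as `scale * log(1 + funding / saturation)`, a simple diminishing-returns curve; the area names, scales, and budget are hypothetical and are not Open Philanthropy's estimates. Each $1M increment goes to whichever area currently has the highest marginal value, so with concave value curves the final marginal values end up roughly equal.

```python
import math
from dataclasses import dataclass


@dataclass
class CauseArea:
    """Hypothetical cause area with diminishing returns: value = scale * log(1 + x / saturation)."""
    name: str
    scale: float       # illustrative value units
    saturation: float  # funding (in $M) at which returns start diminishing noticeably

    def value(self, funding: float) -> float:
        return self.scale * math.log1p(funding / self.saturation)

    def marginal_value(self, funding: float) -> float:
        # Derivative of scale * log(1 + x / saturation) with respect to x.
        return self.scale / (self.saturation + funding)


def rebalance(areas, budget, step=1.0):
    """Greedily assign `budget` (in $M) in increments of `step` to the area with the highest marginal value."""
    allocation = {a.name: 0.0 for a in areas}
    remaining = budget
    while remaining >= step:
        best = max(areas, key=lambda a: a.marginal_value(allocation[a.name]))
        allocation[best.name] += step
        remaining -= step
    return allocation


if __name__ == "__main__":
    areas = [
        CauseArea("global health", 1000.0, 100.0),
        CauseArea("criminal justice", 150.0, 20.0),
        CauseArea("farm animals", 600.0, 50.0),
    ]
    alloc = rebalance(areas, budget=500.0)
    for a in areas:
        print(f"{a.name:>16}: ${alloc[a.name]:6.1f}M, marginal value {a.marginal_value(alloc[a.name]):.3f}")
```

A real rebalancing would use estimated value curves for each area, which are the hard part; the greedy loop only illustrates the mechanism of moving money until no reallocation raises total value.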
On Open Philanthropy’s Openness
After a shallow investigation and reading a few of its public writings, I’m still unsure why exactly Open Philanthropy invested a relatively large amount into this cause area. My impression is that there are some critical details about this that they have not yet written about publicly.
Open Philanthropy’s Rationality
I used to implicitly model Open Philanthropy as a highly intelligent unified agent to which I should likely defer. I now get the impression that there might be a fair amount of politicking, internal division, and some suboptimal decision-making.
I think that this update was larger for me than it might be for others, perhaps because I initially thought very highly of Open Philanthropy. So others who started from a more moderate position should make a smaller update, if any.
I still believe that Open Philanthropy is likely one of the best organizations working in the philanthropic space.
Systems that could improve Open Philanthropy’s decision-making
While writing this piece, the uncomfortable thought struck me that if someone had realized in 2017 that criminal justice was suboptimal, it might have been difficult for them to point this out in a way which Open Philanthropy would have found useful. I’m also not sure people would have been actively incentivized to do so.
Once the question is posed, it doesn’t seem hard to design systems that incentivize people to bring potential mistakes to Open Philanthropy’s attention. Below, I consider two options, and I invite commenters to suggest more.
Red teaming
When investing substantial amounts in a new cause area, putting a large monetary bounty on red teams seems a particularly cheap intervention. For instance, one could offer a prize for the best red-teaming effort, and a larger bounty for any red-teaming output that leads to a change in plans. The recent Criticism Contest is a one-off example whose entries could, in theory, address Open Philanthropy.
Forecasting systems
Per this recent writeup, Open Philanthropy has predictions made and graded by each cause’s officer, who average about 1 prediction per $1 million moved. The focus of their prediction setup seems to be on learning from past predictions, rather than on using prediction setups to inform decisions before they are made. And it seems like staff tend to make predictions on individual grants, rather than on strategic decisions.
This echoes the findings of a previous report on Prediction Markets in the Corporate Setting: organizations are hesitant to use prediction setups in situations where this would change their most important decisions, or where this would lead to social friction. But this greatly reduces the usefulness of predictions. And in fact, we do know that Open Philanthropy’s prediction setup failed to avoid the pitfalls outlined in this post.
Instead, I would suggest a forecasting system that is not restricted to Open Philanthropy staff, that has real-money bets, and that focuses on using predictions to change decisions rather than on learning after the fact. Such a system would ask questions such as:
whether a key belief underlying the favourable assessment of a grant will later be estimated to be false,
whether Open Philanthropy will regret having made a given grant, or
whether Open Philanthropy will regret some strategic decision, such as going into a cause area, or having set up such-and-such a disbursement schedule.
These questions might be operationalized as:
“In year [x], what probability will [some mechanism] assign to [some belief]?”
“In year [x], what will Open Philanthropy’s best estimate of the value of grant [y] be?” + “In year [x], what will Open Philanthropy’s bar for funding be?”
Or, simpler still, asking directly: “In year [x], will Open Philanthropy regret having made grant [y]?”, or
“In year [x], will Open Philanthropy regret having made decision [y]?”
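As a sketch of how the paired questions above could be resolved and graded, the Python snippet below treats “will Open Philanthropy regret grant [y]?” as resolving YES when the later value estimate falls below the later funding bar, and scores probabilistic forecasts with the Brier score. The resolution rule, the grant names, and the numbers are hypothetical illustrations, not Open Philanthropy's actual grading procedure.

```python
from dataclasses import dataclass


@dataclass
class GrantResolution:
    """Hypothetical year-[x] resolution data for grant [y]."""
    grant: str
    estimated_value: float  # e.g., in "x" multiples of the cash-transfer benchmark
    funding_bar: float      # Open Philanthropy's bar in year [x], same units


def regretted(r: GrantResolution) -> bool:
    # Illustrative resolution rule: a grant is "regretted" if it ends up below the bar.
    return r.estimated_value < r.funding_bar


def brier_score(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes (lower is better)."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equally many forecasts and outcomes")
    return sum((p - float(o)) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


resolutions = [
    GrantResolution("grant A", estimated_value=250, funding_bar=1000),
    GrantResolution("grant B", estimated_value=1500, funding_bar=1000),
]
outcomes = [regretted(r) for r in resolutions]
print(outcomes, brier_score([0.8, 0.3], outcomes))
```

Resolving against the bar, rather than against an absolute value, keeps the question meaningful even if the bar itself moves between the grant and year [x].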
There would be a few challenges in creating such a forecasting system in a way that would be useful to Open Philanthropy:
It would be difficult to organize this at scale.
If open to the public, and if Open Philanthropy was listening to them, it might be easy and desirable to manipulate them.
If structured as a prediction market, it might not be worth it to participate unless the market also yielded interest.
If Open Philanthropy had enough bandwidth to create a forecasting system, it would also have been capable of monitoring the criminal justice reform situation more closely (?)
It would be operationally or legally complex
Prediction markets are mostly illegal in the US
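The point about interest can be made concrete. Suppose, hypothetically, that a forecaster buys a contract paying $1 if a long-dated question resolves their way, pays `price` for it, believes it pays out with probability `belief`, and has the capital locked until resolution in `years` years with no interest. The Python sketch below compares the annualized expected return with a risk-free rate; all numbers are illustrative.

```python
def annualized_expected_return(price: float, belief: float, years: float) -> float:
    """Annualized expected return from buying a $1 binary contract at `price`.

    Assumes `belief` is the true probability of the $1 payout and that the
    capital is locked for `years` years without earning interest.
    """
    if not (0 < price < 1 and 0 <= belief <= 1 and years > 0):
        raise ValueError("invalid inputs")
    expected_payout = belief  # $1 times the probability of payout
    return (expected_payout / price) ** (1 / years) - 1


def worth_participating(price, belief, years, risk_free_rate):
    # Only worth it if the edge beats what the locked-up money would earn elsewhere.
    return annualized_expected_return(price, belief, years) > risk_free_rate


# A 10-point edge (0.60 vs 0.70) looks large, but spread over 5 years it is about 3.1% per year.
print(annualized_expected_return(0.60, 0.70, 5))
print(worth_participating(0.60, 0.70, 5, risk_free_rate=0.04))  # False
print(worth_participating(0.60, 0.70, 1, risk_free_rate=0.04))  # True
```

This is why questions that resolve years later (such as “will Open Philanthropy regret grant [y] in year [x]?”) may attract few participants unless the locked funds also earn interest.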
In 2018, the best way to structure this may have been as follows: Open Philanthropy decides on a probability and a metric of success and offers a trusted set of advisors to bet against the metric being satisfied. Note that the metric can be fuzzy, e.g., “Open Phil employee X will estimate this grant to have been worth it”.
With time, advisors who can predict how Open Philanthropy will change its mind would acquire more money and thus more independent influence in the world. This isn’t bullet-proof—for instance, advisors would have an incentive to make Open Philanthropy be wrong so that they can bet against them—but it’d be a good start.
Note that the pathway to impact of making monetary bets wouldn’t only be to change Open Philanthropy’s decisions—which past analysis suggests would be difficult—but also to transfer wealth to altruistic actors that have better models of the world.
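The 2018-style betting structure can be sketched as follows, under assumptions that are mine rather than part of the proposal above: Open Philanthropy states a probability `p` that the success metric will be satisfied, and advisors may buy “NO” contracts (paying $1 if the metric is not satisfied) at price `1 - p`. An advisor's expected profit per contract is then `p - t`, where `t` is the true probability, so money flows toward advisors who correctly anticipate that Open Philanthropy is too optimistic. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bet:
    """Hypothetical advisor bet against Open Philanthropy's stated probability."""
    op_probability: float  # p: Open Philanthropy's stated chance the metric is satisfied
    contracts: float       # number of $1 "NO" contracts the advisor buys

    @property
    def cost(self) -> float:
        # Each NO contract is priced at 1 - p.
        return self.contracts * (1 - self.op_probability)


def settle(bet: Bet, metric_satisfied: bool) -> float:
    """Advisor's realized profit: $1 per contract if the metric fails, minus the cost."""
    payout = 0.0 if metric_satisfied else bet.contracts
    return payout - bet.cost


def expected_profit(bet: Bet, true_probability: float) -> float:
    """Expected profit, which equals contracts * (p - t) for true success probability t."""
    return (
        true_probability * settle(bet, True)
        + (1 - true_probability) * settle(bet, False)
    )


bet = Bet(op_probability=0.8, contracts=100)
print(expected_profit(bet, true_probability=0.5))  # positive: Open Philanthropy was overconfident
print(expected_profit(bet, true_probability=0.9))  # negative: the advisor was wrong
```

This sketch ignores both the incentive problem mentioned above (advisors could try to make the grant fail) and the cost of locking up capital.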
In July 2022, there still aren’t great forecasting systems that could deal with this problem. The closest might be Manifold Markets, which allows for the fast creation of different markets and the transfer of funds to charities, which gives some monetary value to their tokens. In any case, because setting up such a system might be laborious, one could instead offer to set one up only upon request.
I am also excited about a few projects that will provide possibly scalable prediction markets, which are set to launch in the next few months and could be used for that purpose. My forecasting newsletter will have announcements when these projects launch.
Conclusion
Open Philanthropy spent $200M on criminal justice reform, $100M of which came after their own estimates concluded that it wasn’t as effective as other global health and development interventions. I think Open Philanthropy could have done better.
And I am left confused about why Open Philanthropy did not in fact do better. Part of this may have been their unique approach of worldview diversification. Part of this may have been the political preferences of their funders. And part of this may have been their more optimistic Fermi estimates. I oscillate between thinking “I, a young grasshopper, do not understand”, and “this was clearly suboptimal from the beginning, and obviously so”.
Still, Open Philanthropy did end up parting ways with their criminal justice reform team. Perhaps forecasting systems or red teams would have accelerated their decision-making on this topic.
Acknowledgements
Thanks to Linch Zhang, Max Ra, Damon Pourtahmaseb-Sasi, Sam Nolan, Lawrence Newport, Eli Lifland, Gavin Leech, Alex Lawsen, Hauke Hillebrandt, Ozzie Gooen, Aaron Gertler, Joel Becker and others for their comments and suggestions.
This post is a project by the Quantified Uncertainty Research Institute (QURI). The language used to express probability distributions throughout the post is Squiggle, which is being developed by QURI.
Appendix: Incorporating savings and the cost of recidivism.
Epistemic status: These models are extremely rough, and should be used with caution. A more trustworthy approach would use the share of the prison population by type of crime, the chance of recidivism for each crime, and the cost of new offenses by type. Nonetheless, the general approach might be as follows:
```squiggle
// First section: Same as before
initialPrisonPopulation = 1.8M to 2.5M // Data for 2022 prison population has not yet been published, though this estimate is perhaps too wide.
reductionInPrisonPopulation = 0.25 to 0.75
badnessOfPrisonInQALYs = 0.2 to 6 // 80% as good as being alive to 5 times worse than living is good
accelerationInYears = 5 to 50
probabilityOfSuccess = 0.01 to 0.1 // 1% to 10%.
estimateQALYs = initialPrisonPopulation * reductionInPrisonPopulation * badnessOfPrisonInQALYs * accelerationInYears * probabilityOfSuccess
cost = 2B to 20B
costEffectivenessPerQALY = cost / estimateQALYs
// New section: Costs and savings
numPrisonersFreed = initialPrisonPopulation * reductionInPrisonPopulation * accelerationInYears * probabilityOfSuccess
savedCosts = numPrisonersFreed * (14k to 70k)
savedQALYsFromCosts = savedCosts / 50k
probabilityOfRecidivism = 0.3 to 0.7
numIncidentsUntilCaughtAgain = 1 to 10 // uncertain; look at what percentage of different types of crimes are reported and solved.
costPerIncident = 1k to 50k
// Each reoffender commits numIncidentsUntilCaughtAgain incidents before being caught again.
lostCostsFromRecidivism = numPrisonersFreed * probabilityOfRecidivism * numIncidentsUntilCaughtAgain * costPerIncident
lostQALYsFromRecidivism = lostCostsFromRecidivism / 50k
costPerQALYIncludingCostsAndIncludingRecidivism = truncateLeft(cost / (estimateQALYs + savedQALYsFromCosts - lostQALYsFromRecidivism), 0)
// ^ truncateLeft needed because division is very numerically unstable.
// Display
// costPerQALYIncludingCostsAndIncludingRecidivism
// ^ increase the number of samples to 10000 and uncomment this line
```
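For readers without Squiggle at hand, here is a rough Python Monte Carlo translation of the appendix model. It assumes that Squiggle's `a to b` (with 0 < a < b) denotes a lognormal distribution whose 5th and 95th percentiles are a and b, and it mimics `truncateLeft(..., 0)` by dropping non-positive samples. It multiplies the recidivism cost by the number of incidents until an offender is caught again (`numIncidentsUntilCaughtAgain` in the model). This is a sketch, not a replica of Squiggle's sampler, so its quantiles will vary somewhat with the seed and sample count.

```python
import math
import random
import statistics

Z_95 = 1.6448536269514722  # 95th percentile of the standard normal


def to(low: float, high: float, rng: random.Random) -> float:
    """One sample from a lognormal whose 5th/95th percentiles are `low`/`high`."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z_95)
    return rng.lognormvariate(mu, sigma)


def sample_cost_per_qaly(rng: random.Random):
    # First section: same structure as the main model.
    prison_population = to(1.8e6, 2.5e6, rng)
    reduction = to(0.25, 0.75, rng)
    badness_qalys = to(0.2, 6, rng)
    acceleration_years = to(5, 50, rng)
    p_success = to(0.01, 0.1, rng)
    cost = to(2e9, 20e9, rng)
    estimate_qalys = prison_population * reduction * badness_qalys * acceleration_years * p_success

    # New section: savings and recidivism costs, converted to QALYs at $50k per QALY.
    prisoners_freed = prison_population * reduction * acceleration_years * p_success
    saved_qalys = prisoners_freed * to(14e3, 70e3, rng) / 50e3
    lost_costs = prisoners_freed * to(0.3, 0.7, rng) * to(1, 10, rng) * to(1e3, 50e3, rng)
    lost_qalys = lost_costs / 50e3

    result = cost / (estimate_qalys + saved_qalys - lost_qalys)
    return result if result > 0 else None  # mimics truncateLeft(..., 0)


def run(n: int = 10_000, seed: int = 0):
    rng = random.Random(seed)
    return [s for s in (sample_cost_per_qaly(rng) for _ in range(n)) if s is not None]


if __name__ == "__main__":
    samples = run()
    q = statistics.quantiles(samples, n=20)  # cut points at 5%, 10%, ..., 95%
    print(f"kept {len(samples)} samples; 5%: {q[0]:,.0f}, median: {q[9]:,.0f}, 95%: {q[-1]:,.0f}")
```

Dropping non-positive samples mirrors the Squiggle truncation, but it also silently discards the scenarios where recidivism costs outweigh the benefits, which is worth keeping in mind when reading the resulting quantiles.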
A review from Open Philanthropy on the impacts of incarceration on crime concludes by saying that “The analysis performed here suggests that it is hard to argue from high-credibility evidence that at typical margins in the US today, decarceration would harm society”. But “high-credibility evidence” does a lot of the heavy lifting: I have a pretty strong prior that incentives matter, and the evidence is weak. In particular, the evidence provided is a) mostly at the margin, and b) mostly based on short-term changes. So I’m slightly convinced that for small changes, the effect in the short term (e.g., within one generation) is small. But if prison sentences are marginally reduced in length or in quantity, I still end up with the impression that crime would marginally rise in the longer term, as crimes become marginally more worth it. And if sentences are reduced by more than a marginal amount, common sense suggests that crime will increase, as observed in, for instance, San Francisco (note: or not; see this comment and/or this investigation).
Footnotes
[0]. This number is different from the $138.8M given on Open Philanthropy’s website, which is probably not up to date with their grants database.
[1]. Note that this paragraph is written from my perspective doing a postmortem, rather than aiming to summarize what they thought at the time.
[2]. Note that restorative justice is normally suggested as a total replacement for punitive justice. But I think that pushing back punitive justice until it is incentive compatible and then applying restorative justice frameworks would also work, and would encounter less resistance.
[3]. Subjective estimate based on the US having many more guns, a second amendment, a different culture, more of a drug problem.
[4]. Subjective estimate; I think it would take 1-2 orders of magnitude more investment than the $200M already given.
[5]. Note that QALY refers to a specific construct. This has led people to come up with extensions and new definitions, e.g., the WALY (wellbeing-adjusted), HALY (happiness-adjusted), DALY (disability-adjusted), and SALY (suffering-adjusted) life years. But throughout this post, I’m stretching that definition and mostly thinking about “QALYs as they should have been”.
[6]. Initially, Squiggle was making these calculations using Monte Carlo simulations. However, multiplying and dividing lognormals can be done analytically. I extracted the functionality to do so into Simple Squiggle, and then helped the main Squiggle branch compute the model analytically.
Simple Squiggle confirms that the model produces an interval of $1.3k to $290k. To check this, feed `1000000000 * (2 to 20) / ((1000000 * 1.5 to 2.5) * 0.25 to 0.75 * 0.2 to 6 * 5 to 50 * 0.01 to 0.1 * 0.5 to 1 )` into it.
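The analytic shortcut can be checked directly: for independent lognormals, log-space means add (or subtract, for division) and log-space variances add. The Python sketch below assumes that `a to b` is a lognormal with 5th/95th percentiles a and b, and that the expression groups as `1000000 * (1.5 to 2.5)` and so on; under those assumptions it prints roughly $1.3k to $290k. The `LogNormal` class is a hypothetical illustration, not Simple Squiggle's code.

```python
import math
from dataclasses import dataclass

Z_95 = 1.6448536269514722  # 95th percentile of the standard normal


@dataclass(frozen=True)
class LogNormal:
    """Lognormal stored as (mu, sigma) of the underlying normal in log space."""
    mu: float
    sigma: float

    @classmethod
    def to(cls, low: float, high: float) -> "LogNormal":
        # Interpret "low to high" as a 90% interval (5th to 95th percentile).
        return cls((math.log(low) + math.log(high)) / 2,
                   (math.log(high) - math.log(low)) / (2 * Z_95))

    def __mul__(self, other):
        if isinstance(other, LogNormal):  # independent lognormals: variances add
            return LogNormal(self.mu + other.mu, math.hypot(self.sigma, other.sigma))
        return LogNormal(self.mu + math.log(other), self.sigma)  # positive constant only

    __rmul__ = __mul__

    def __truediv__(self, other: "LogNormal") -> "LogNormal":
        return LogNormal(self.mu - other.mu, math.hypot(self.sigma, other.sigma))

    def interval90(self):
        return (math.exp(self.mu - Z_95 * self.sigma), math.exp(self.mu + Z_95 * self.sigma))


L = LogNormal.to
cost = 1e9 * L(2, 20)
qalys = (1e6 * L(1.5, 2.5)) * L(0.25, 0.75) * L(0.2, 6) * L(5, 50) * L(0.01, 0.1) * L(0.5, 1)
low, high = (cost / qalys).interval90()
print(f"${low:,.0f} to ${high:,.0f} per QALY")
```

This exactness only holds for products and quotients of independent lognormals; the appendix model adds and subtracts terms, which is why it still needs sampling.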
[7]. To elaborate on this, as far as I understand, to estimate the impact of incarceration, the report’s best sources of evidence are randomized trials or natural experiments, e.g., harsher judges being randomly assigned, or arbitrary threshold changes resulting from changes in guidelines or policy. But these methods will tend to estimate short-term changes, rather than longer-term (e.g., intergenerational) changes.
And I would give substantial weight to lighter sentencing in fact making it more worth it to commit crime. See Lagerros’ Unconscious Economics.
This topic also involves a very large number of researcher degrees of freedom (e.g., see p. 133 on lowballing the cost of murder on account of it being rare), which I am inclined to be suspicious of.
The report has a “devil’s advocate case”. But I think that it could have been much harsher, by incorporating hard-to-estimate long-term incentive changes.
[8]. Excerpt, with some light editing to exclude stutters:
With a lot of hedging and assumptions and guessing, I think that we can show that we were at around 250x, versus GiveWell, which is at more like 1000x [9]. So according to Open Philanthropy, if you’re just like, what’s the place where I can put my dollar that does the most good, you should give to GiveWell, I think.
That said, I would say well, first of all, if you feel that now’s the time, now’s a particular unique and important time to be working on this when there is a lot of traction, that puts a thumb on the scale more towards this. Deworming was very important 10 years ago, will be very important in 10 years. I think that’s different than this issue, where you have these moments where we can actually make a lot of change, where a boost of cash is good.
And then second, that there is a lot that’s not captured in that 250x.
And then third, that 250x is based on the assumption that a year of freedom from prison is worth $50k, and a year of freedom from jail is worth $100k. I think a jail bed gone empty for a year could be worth $250k, for example.
So, I’m telling you this, I don’t say this to normal people, I have no idea what I’m talking about. But for EA folks, I think we’re closer to 1000x than I’ve been able to show thus far. But if you want to be like “I’m helping the most that I can be certain about” yeah, for sure, go give your money to deworming, that’s still probably true.
[9]: “1000x” (resp. 250x) refers to being 1000 times (resp. 250 times) more cost-effective than giving a dollar to someone with $50k of annual income; see here.
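The “Nx” units can be illustrated with a short calculation. Assuming log utility, so that the value of an extra dollar is inversely proportional to the recipient's income (my assumption about the framework, not something this footnote states), a dollar to someone earning $500 a year is worth 100 times a dollar to someone earning $50k, which matches the “roughly 100x” ratio quoted above for cash transfers. The incomes are illustrative.

```python
BENCHMARK_INCOME = 50_000  # $/year: giving $1 to someone at this income counts as "1x"


def multiplier_for_cash(recipient_income: float) -> float:
    """'Nx' value of $1 given to a recipient, assuming log utility (marginal value ∝ 1/income)."""
    return BENCHMARK_INCOME / recipient_income


def dollars_to_match(program_multiplier: float, bar_multiplier: float) -> float:
    """Dollars spent at `program_multiplier` needed to match $1 spent at the bar."""
    return bar_multiplier / program_multiplier


print(multiplier_for_cash(500))     # 100.0
print(dollars_to_match(250, 1000))  # 4.0: $4 at 250x does what $1 does at 1000x
```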
[10]. As I was writing this, it featured campaigns calling for common carriers to drop Fox, and for Amazon and Twitch to carry out racial equity audits. But these have since cycled through.
[11]. It rose from $216k in 2016 to $415k in 2019. Honestly, I’m not even sure this is unjustified; he could probably be a very highly paid political consultant, and a high salary is in fact a strong signal that his funders would rather he not leave to become one.
[12]. This excludes considerations around how much to donate each year.
[13]. A side effect of spinning off Just Impact with a very sizeable initial endowment is that the careers of the Open Philanthropy officers involved appear to continue progressing. Commenters pointed out that this might make it easier to hire talent. But coming from a forecasting background which has some emphasis on proper scoring rules, this seems personally unappealing.
[14]. Technically, according to the shape of the values of their grants and the expected future shape, not just the values of the marginal grant.
I also considered suggesting a ruthless Hunger Games-style fight between the representatives of different cause areas, with the winner getting all the resources regardless of diminishing returns. But I concluded that this was likely not possible in practice, and also that the neartermists would probably be in better shape.
I previously gave a fair bit of feedback to this document. I wanted to quickly give my take on a few things.
Overall, I found the analysis interesting and useful. However, I overall have a somewhat different take than Nuno did.
On OP:
- Aaron Gertler / OP were given a previous version of this that was less carefully worded. He recommended going forward with publishing it, for the sake of community discourse. This surprised me and I’m really thankful.
- This analysis didn’t get me to change my mind much about Open Philanthropy. I thought fairly highly of them before and after, and expect that many others who have been around would think similarly. I think they’re a fair bit away from being an “idealized utilitarian agent” (in part because they explicitly claim not to be), but still much better than most charitable foundations and the like.
On this particular issue:
- My guess is that in the case of criminal justice reform, there were some key facts of the decision-making process that aren’t public and are unlikely to ever be public. It’s very common in large organizations for compromises to be made for various political or social reasons, for example. I’ve previously written a bit about similar things [here](https://twitter.com/ozziegooen/status/1456992079326978052).
- I think Nuno’s quantitative estimates were pretty interesting, but I wouldn’t be too surprised if other smart people would come up with numbers that are fairly different. For those reading this, I’d take the quantitative estimates with a lot of uncertainty.
- My guess is that a “highly intelligent idealized utilitarian agent” probably would have invested a fair bit less in criminal justice reform than OP did, if at all.
On evaluation, more broadly:
- I’ve found OP to be a very intimidating target of critique or evaluation, mainly just because of their position. Many of us are likely to want funding from them in the future (or from people that listen to them), so the risk of getting people at OP upset is very high. From a cost-benefit position, publicly critiquing OP (or other high-status EA organizations) seems pretty risky. This is obviously unfortunate; these groups are often appreciative of feedback, and of course, they are some of the most useful groups to give feedback to. (Sometimes prestigious EAs complain about getting too little feedback; I think this is one reason why.)
- I really would hate for this post to be taken as “ammunition” by people with agendas against OP. I’m fairly paranoid about this. That wasn’t the point of this piece at all. If future evaluations are mainly used as “ammunition” by “groups with grudges”, then that makes it far more hazardous and costly to publish them. If we want lots of great evaluations, we’ll need an environment that doesn’t weaponize them.
- Similarly to the above point, I prefer these sorts of analyses and the resulting discussions to be fairly dispassionate and rational. When dealing with significant charity decisions I think it’s easy for some people to get emotional: “$200M could have saved X lives!” But in the scheme of things, there are many decisions like this to make, and there will definitely be large mistakes made. Our main goals should be to learn quickly and continue to improve our decisions going forward.
- One huge set of missing information is OP’s internal judgements of specific grants. I’m sure they’re very critical now of some groups they’ve previously funded (in all causes, not just criminal justice). However, it would likely be very awkward and unprofessional to actually release this information publicly.
- For many of the reasons mentioned above, I think we can rarely fully trust the public reasons for large actions by large institutions. When a CEO leaves to “spend more time with family”, there’s almost always another good explanation. I think OP is much better than most organizations at being honest, but I’d expect that they still face this issue to an extent. As such, I think we shouldn’t be too surprised when some decisions they make seem strange when evaluating them based on their given public explanations.
I really appreciate this comment; it feels like it’s drawing from models deeper than my own.
It’s interesting that you say that given what is in my eyes a low amount of content in this comment. What is a model or model-extracted part that you liked in this comment?
Some of my models feel like they have a mix of reasonable stuff and wanton speculation, and this comment sort of makes it a bit more clear which of the wanton speculation is more reasonable, and which is more on the deep end.
For instance:
Well this is still confusing to me
Seems obviously true and in fact a continued premise of your post is that there are key facts absent that could explain or fail to explain one decision or the other. Is this particularly true in criminal justice reform? Compared to IDK orgs like AMF (which are hyper transparent by design) maybe, compared to stuff around AI risk I think not.
This is like the same thesis as your post, does not actually convey much information (it is what anyone I assume would have already guessed Ozzie thought).
Yeah I mean, no kidding. But it’s called Open Philanthropy. It’s easy to imagine there exists a niche for a meta-charity with high transparency and visibility. It also seems clear that Open Philanthropy advertises as a fulfillment of this niche as much as possible and that donors do want this. So when their behavior seems strange in a cause area and the amount of transparency on it is very low, I think this is notable, even if the norm among orgs is to obfuscate internal phenomena. So I don’t rlly endorse any normative takeaway from this point about how orgs usually obfuscate information.
I don’t understand this point. Can you spell it out?
From my perspective, Open Phil’s main legible contribution is a) identifying great donation opportunities, b) recommending Cari Tuna and Dustin Moskovitz to donate to such opportunities, and c) building up an apparatus to do so at scale.
Their donors are specific people, not hypothetical “donors who want transparency.” I assume Open Phil is quite candid/transparent with their actual donors, though of course I don’t have visibility here.
In fairness, the situation is a bit confusing. Open Phil came from GiveWell, which is meant for external donors. In comparison, as Linch mentioned, Open Phil mainly recommends donations just to Good Ventures (Cari Tuna and Dustin Moskovitz). My impression is that OP’s main concern is directly making good grants, not recommending good grants to other funders. Therefore, a large amount of public research is not particularly crucial.
I think the name is probably not quite ideal for this purpose. I think of it more like “Highly Effective Philanthropy”; it seems their comparative advantage / unique attribute is much more their choices of focus and their talent pool, than it is their openness, at this point.
If there is frustration here, it seems like the frustration is a bit more “it would be nice if they could change their name to be more reflective of their current focus”, than “they should change their work to reflect the previous title they chose”.
Sorry I did not realize that OP doesn’t solicit donations from non megadonors. I agree this recontextualizes how we should interpret transparency.
Given the lack of donor diversity, tho, I am confused why their cause areas would be so diverse.
How do you balance your high opinion of OpenPhil with the assumption that there’s information that cannot be made public, and which tips the scale in important decisions? How can you judge OpenPhil’s decisions in this case?
This is almost always the case for large organizations. All CEOs or government officials have a lot of private information that influences their decision making.
This private information does make it much more difficult for external evaluators to evaluate them. However, there’s often still a lot that can be inferred. It’s really important that these evaluators stay humble about their analysis in light of the fact that there’s a lot of private information, but it’s also important that evaluators still try, given the information available.
(Writing from OP’s point of view here.)
We appreciate that Nuño reached out about an earlier draft of this piece and incorporated some of our feedback. Though we disagree with a number of his points, we welcome constructive criticism of our work and hope to see more of it.
We’ve left a few comments below.
*****
The importance of managed exits
We deliberately chose to spin off our CJR grantmaking in a careful, managed way. As a funder, we want to commit to the areas we enter and avoid sudden exits. This approach:
Helps grantees feel comfortable starting and scaling projects. We’ve seen grantees turn down increased funding because they were reluctant to invest in major initiatives; they were concerned that we might suddenly change our priorities and force them to downsize (firing staff, ending projects half-finished, etc.)
Helps us hire excellent program officers. The people we ask to lead our grantmaking often have many other good options. We don’t want a promising candidate to worry that they’ll suddenly lose their job if we stop supporting the program they work on.
Exiting a program requires balancing:
the cost of additional below-the-bar spending during a slow exit;
the risks from a faster exit (difficulty accessing grant opportunities or hiring the best program officers, as well as damage to the field itself).
We launched the CJR program early in our history. At the time, we knew that committing to causes was important, but we had no experience in setting expectations about a program’s longevity or what an exit might look like. When we decided to spin off CJR, we wanted to do so in a way that inspired trust from future grantees and program staff. In the end, we struck what felt to us like an appropriate balance between “slow” and “fast”.[1]
It’s plausible that we could have achieved this trust by investing less money and more time/energy. But at the time, we were struggling to scale our organizational capacity to match our available funding; we decided that other capacity-strained projects were a priority.
*****
Open Phil is not a unitary agent
Running an organization involves making compromises between people with different points of view — especially in the case of Open Phil, which explicitly hires people with different worldviews to work on different causes. This is especially true for cases where an earlier decision has created potential implicit commitments that affect a later decision.
I would avoid trying to model Open Phil (or other organizations) as unitary agents whose actions will match a single utility function. The way we handle one situation may not carry over to other situations.
If this dynamic leads you to put less “trust” in our decisions, we think that’s a good thing! We try to make good decisions and often explain our thinking, but we don’t think others should be assuming that all of our decisions are “correct” (or would match the decisions you would make if you had access to all of the relevant info).
*****
Indeed, part of our reason for seeding Just Impact was that it could go on to raise a lot more money, resulting in a lot of counterfactual impact. That kind of leverage can take funding from below the bar to above it.
*****
This doesn’t accord with our experience. Over six years of working closely with Chloe, we learned a lot about effective funding in policy and advocacy in ways we do expect to accrue to other focus areas. She was also a major factor when we updated our grantmaking process to emphasize the importance of an organization’s leadership for the success of a grant.
It’s possible that we would have learned these lessons otherwise, but given that Chloe was our first program officer, a disproportionate amount of organizational learning came from our early time working with her, and those experiences have informed our practices.
Note that when we launched our programs in South Asian Air Quality and Global Aid Policy, we explicitly stated that we “expect to work in [these areas] for at least five years”. This decision comes from the experience we’ve developed around setting expectations.
One of the things I’m still confused about is the two spikes in funding, one in 2019 and the other in 2021, both of which can be interpreted as parting grants:
So OP gave half of the funding to criminal justice reform ($100M out of $200M) after writing GiveWell’s Top Charities Are (Increasingly) Hard to Beat, and this makes me less likely to think about this in terms of exit grants and more in terms of, idk, some sort of nefariousness/shenanigans.
The 2019 ‘spike’ you highlight doesn’t represent higher overall spending — it’s a quirk of how we record grants on the website.
Each program officer has an annual grantmaking “budget”, which rolls over into the next year if it goes unspent. The CJR budget was a consistent ~$25 million/year from 2017 through 2021. If you subtract the Just Impact spin-out at the end of 2021, you’ll see that the total grantmaking over that period matches the total budget.
So why does published grantmaking look higher in 2019?
The reason is that our published grants generally “frontload” payment amounts — if we’re making three payments of $3 million in each of 2019, 2020, and 2021, that will appear as a $9 million grant published in 2019.
In the second half of 2019, the CJR team made a number of large, multi-year grants — but payments in future years still came out of their budget for those years, which is why the published totals look lower in 2020 and 2021 (minus Just Impact). Spending against the CJR budget in 2019 was $24 million — slightly under budget.
So the actual picture here is “CJR’s budget was consistent from 2017-2021 until the spin-out”, not “CJR’s budget spiked in the second half of 2019”.
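To make the two bookkeeping views concrete, here is a minimal Python sketch using made-up grants (only the $9 million, three-payment example comes from the text; everything else is hypothetical). `published_totals` books each multi-year grant in full in the year it is published, as the grant pages do; `spending_by_year` charges each payment to the budget of the year it is actually paid.

```python
from collections import defaultdict

# Hypothetical grants: (publication year, {payment year: amount in $M}).
# Only the first entry mirrors the example in the text; the rest are made up.
GRANTS = [
    (2019, {2019: 3, 2020: 3, 2021: 3}),  # $9M grant paid in three $3M installments
    (2019, {2019: 5}),                    # single-year grant
    (2020, {2020: 4}),                    # single-year grant
]


def published_totals(grants):
    """Frontloaded view: each grant's full amount is booked in its publication year."""
    totals = defaultdict(int)
    for published_year, payments in grants:
        totals[published_year] += sum(payments.values())
    return dict(totals)


def spending_by_year(grants):
    """Budget view: each payment counts against the budget of the year it is paid."""
    totals = defaultdict(int)
    for _, payments in grants:
        for payment_year, amount in payments.items():
            totals[payment_year] += amount
    return dict(totals)


if __name__ == "__main__":
    print("published:", published_totals(GRANTS))  # {2019: 14, 2020: 4}
    print("spent:    ", spending_by_year(GRANTS))  # {2019: 8, 2020: 7, 2021: 3}
```

Both views add up to the same $18M; they only disagree about which year the money lands in, which is why a cluster of multi-year grants shows up as a published “spike” followed by lower published totals.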
So this doesn’t really dissolve my curiosity.
In dialog form, because otherwise this would have been a really long paragraph:
NS: I think that the spike in funding in 2019, right after the blog post GiveWell’s Top Charities Are (Increasingly) Hard to Beat, is suspicious.
AG: Ah, but it’s not higher spending. Because of our accounting practices, it’s rather an increase in future funding commitments. So your chart isn’t about “spending” it’s about “locked-in spending commitments”. And in fact, in the next few years, spending-as-recorded goes down because the locked-in-funding is spent.
NS: But why the increase in locked-in funding commitments in 2019? It still seems suspicious, even if marginally less so.
AG: Because we frontload our grants; many of the grants in 2019 were for grantees to use for 2-3 years.
NS: I don’t buy that. I know that many of the grants in 2019 were multi-year (frontloaded), but previous grants in the space were not as frontloaded, or not as frontloaded in that volume. So I think there is still something I’m curious about, even if the mechanistic aspect is more clear to me now.
AG: ¯\_(ツ)_/¯ (I don’t know what you would say here.)
I will push back a bit on this as well. I think it’s very healthy for the community to be skeptical of Open Philanthropy’s reasoning ability, and to be vigilant about trying to point out errors.
On the other hand, I don’t think it’s great if we have a dynamic where the community is skeptical of Open Philanthropy’s intentions. Basically, there’s a big difference between “OP made a mistake because they over/underrated X” and “OP made a mistake because they were politically or PR motivated and intentionally made sub-optimal grants.”
The synthesis position might be something like “some subset of OP made a mistake because they were subconsciously politically or PR motivated and unintentionally made sub-optimal grants.”
I think this is a reasonable candidate hypothesis, and should not be that much of a surprise, all things considered. We’re all human.
FWIW I would be surprised to see you, Linch, make a suboptimal grant out of PR motivation. I think Open Phil is capable of being in a place where it can avoid making noticeably-suboptimal grants due to bad subconscious motivations.
I agree that there’s a difference in the social dynamics of being vigilant about mistakes vs being vigilant about intentions. I agree with your point in the sense that worlds in which the community is skeptical of OP’s intentions tend to have worse social dynamics than worlds in which it isn’t.
But you seem to be implying something beyond that; that people should be less skeptical of OP’s intentions given the evidence we see right now, and/or that people should be more hesitant to express that skepticism. Am I understanding you correctly, and what’s your reasoning here?
My intuition is that a norm against expressing skepticism of orgs’ intentions wouldn’t usefully reduce community skepticism, because community members can just see this norm and infer that there’s probably some private skepticism (just like I update when reading your comment and the tone of the rest of the thread). And without open communication, community members’ level of skepticism will be noisier (for example, Nuño starting out much more trusting and deferential than the EA average before he started looking into this).
I agree with you, but unfortunately I think it’s inevitable that people doubt the intentions of any privately-managed organisation. This is perhaps an argument for more democratic funding (though one could counter-argue about the motivations of democratically chosen representatives).
Did you also think that breadth of cause exploration is important?
It seems that you have been conducting shallow and medium-depth investigations since late 2014. So if there were some suboptimal commitments early on, they should have been exposed by alternatives that the staff would probably be excited about, since I assume that everyone aims for high impact given their specific expertise.
So, it would depend on the nature of the commitments that earlier decisions created: if these were to create high impact within one’s expertise, then that should be great, even if the expertise is US criminal justice reform, specifically.[1] If multiple such focused individuals exchange perspectives, a set of complementary[2] interventions that covers a wide cause landscape emerges.
If you think that not trusting you is good, because you are liable to certain suboptimal mechanisms established early on, then are you acknowledging that your recommendations are suboptimal? Where would you suggest that impact-focused donors in EA look?
Are you sure that the counterfactual impact is positive, or more positive without your ‘direct oversight’? For example, it could be that Just Impact donors would otherwise have donated to crime prevention abroad,[3] if another organization had influenced them before they learned about Just Impact, which solicits a commitment. Or it could be that US CJR donors would not have donated to other effective causes had they not first been introduced to effective giving by Just Impact. Further, do you think that Just Impact can take less advantage of communication with experts in other OPP cause areas (which could create important leverage) when it is an independent organization?
I appreciate the response here, but would flag that this came off, to me, as a bit mean-spirited.
One specific part:
> If you think that not trusting you is good, because you are liable to certain suboptimal mechanisms established early on, then are you acknowledging that your recommendations are suboptimal? Where would you suggest that impact-focused donors in EA look?
1. He said “less trust”, not “not trust at all”. I took that to mean something like, “don’t place absolute reverence in our public messaging.”
2. I’m sure anyone reasonable would acknowledge that their recommendations are less than optimal.
3. “Where would you suggest that impact-focused donors in EA look” → There’s not one true source that you should only pay attention to. You should probably look at a diversity of sources, including OP’s work.
That makes sense; that’s probably the solution.
A bit of a nit since this is in your appendix, but there are serious issues with this reasoning and the linked evidence. Basically, this requires the claims that:
1. San Francisco reduced sentences
2. There was subsequently more crime
1. Shellenberger at the WSJ writes:
He doesn’t provide a citation, but I’m fairly confident he’s pulling these numbers from this SF Chronicle writeup, which is actually citing a change from 2018-2019 to 2020-2021. So right off the bat Shellenberger is fudging the data.
Second, the aggregated data is misleading because there were specific pandemic effects in 2020 unrelated to Boudin’s policies. If you look at the DA office’s disaggregated data, there is a drop in filing rate in 2020, but it picks up dramatically in 2021. In fact, the 2021 rate is higher than the 2019 rate both for crime overall, and for the larceny/theft category. So not only is Shellenberger’s claim misleading, it’s entirely incorrect.
You can be skeptical of the DA office’s data, but note that this is the same source used by the SF Chronicle, and thus by Shellenberger as well.
2. Despite popular anecdotes, there’s really no evidence that crime was actually up in San Francisco, or that it occurred as a result of Boudin’s policies.
- Actual reported shoplifting was down from 2019-2020
- Reported shoplifting in adjacent counties was down less than in California as a whole, indicating a lack of “substitution effects” where criminals go where sentences are lighter
- The store closures cited by Shellenberger can’t be pinned on increased crime under Boudin because:
A) Walgreens had already announced a plan to close 200 stores back in 2019
B) Of the 8 stores that closed in 2019 and 2020, at least half closed in 2019, making the 2020 closures unexceptional
C) The 2021 store closure rate for Walgreens is actually much lower than comparable metrics, like the closures of sister company Duane Reade in NYC over the same year, or the dramatic drop in Walgreens stock price. It is also not much higher than the historical average of 3.7 store closures per year in SF.
I have a much more extensive writeup on all of this here:
https://applieddivinitystudies.com/sf-crime-2/
Finally, the problem with the “common sense” reasoning is that it goes both ways. Yes, it seems reasonable to think that less punishment would result in more crime, but we can similarly intuit that spending time in prison and losing access to legal opportunities would result in more crime. Or that having your household’s primary provider incarcerated would lead to more crime. Etc etc. Yes, we are lacking in high quality evidence, but that doesn’t mean we can just pick which priors to put faith in.
Added a note in that sentence of the appendix to point to this comment pending further investigation (which, realistically, I’m not going to do).
Thanks!
Oh wow, I wasn’t expecting the guy to just lie about this.
In general, WSJ reporting on SF crime has been quite bad. In another article they write
Which is just not true at all. Every state has some threshold, and California’s is actually on the “tough on crime” side of the spectrum.
Shellenberger himself is an interesting guy, though not necessarily in a good way.
One context note that doesn’t seem to be reflected here is that in 2014, there was a lot of optimism for a bipartisan political compromise on criminal justice reform in the US. The Koch network of charities and advocacy groups had, to some people’s surprise, begun advocating for it in their conservative-libertarian circles, which in turn motivated Republican participation in negotiations on the Hill. My recollection is that Open Phil’s bet on criminal justice reform funding was not just a “bet on Chloe,” but also a bet on tractability: i.e., that a relatively cheap investment could yield a big win on policy because the political conditions were such that only a small nudge might be needed. This seems to have been an important miscalculation in retrospect, as (unless I missed something) a limited-scope compromise bill took until the end of 2018 to get passed.
I’m not aware of any other significant criminal justice legislation that has passed in that time period. [Edit: while this is true at the national level, arguably there has been a lot of progress on CJR at state and local levels since 2014, much of which could probably be traced back to advocacy by groups like those Open Phil funded.] This information strongly supports the “Leverage Hypothesis,” which was cited by Open Phil staff themselves, so I think it ought to be weighted pretty strongly in your updates.
So this is good context. What are your thoughts on why they kept donating?
I don’t have any inside info here, but based on my work with other organizations I think each of your first three hypotheses is plausible, either alone or in combination.
Another consideration I would mention is that it’s just really hard to judge how to interpret advocacy failures over a short time horizon. Given that your first try failed, does that mean the situation is hopeless and you should stop throwing good money after bad? Or does it mean that you meaningfully moved the needle on people’s opinions and the next campaign is now likelier to succeed? It’s not hard for me to imagine that in 2016-17 or so, having seen some intermediate successes that didn’t ultimately result in legislation signed into law, OP staff might have held out genuine hope that victory was still close at hand. Or after the First Step Act was passed in 2018 and signed into law by Trump, maybe they thought they could convert Trump into a more consistent champion on the issue and bring the GOP along with him. Even as late as 2020, when the George Floyd protests broke out, Chloe’s grantmaking recommendations ended up being circulated widely and presumably moved a lot of money; I could imagine there was hope at that time for transformative policy potential. Knowing when to walk away from sustained but not-yet-successful efforts at achieving low-probability, high-impact results, especially when previous attempts have unknown correlations with the probability of future success, is intrinsically a very difficult estimation problem. (Indeed, if someone at QURI could develop a general solution to this, I think that would be a very useful contribution to the discourse!)
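One deliberately naive way to frame the “keep going or walk away” question is a Beta-Bernoulli model: treat each campaign as an independent attempt with an unknown, fixed chance of success, and lower that estimated chance after every failure. The sketch below is an illustration under that assumption, not a general solution; the prior, the value of a win, and the cost per attempt are all hypothetical. Notably, it only captures the “maybe this is hopeless” reading of a failure, not the “we moved the needle” reading in which failed attempts raise the odds of later ones; that correlation is exactly what makes the real problem hard.

```python
def posterior_mean(prior_successes, prior_failures, successes, failures):
    """Mean of the Beta posterior after observing Bernoulli outcomes.

    The prior is Beta(prior_successes, prior_failures); each observed success
    or failure adds one to the corresponding parameter.
    """
    a = prior_successes + successes
    b = prior_failures + failures
    return a / (a + b)


def worth_another_attempt(p_success, value_if_success, cost_per_attempt):
    """Fund the next attempt only if its expected value exceeds its cost."""
    return p_success * value_if_success > cost_per_attempt


# Hypothetical numbers: a Beta(1, 4) prior (20% expected chance of success),
# a win worth 10 units, and each campaign costing 1.5 units.
PRIOR = (1, 4)
VALUE, COST = 10, 1.5

for failures_so_far in range(4):
    p = posterior_mean(*PRIOR, successes=0, failures=failures_so_far)
    print(failures_so_far, round(p, 3), worth_another_attempt(p, VALUE, COST))
```

With these made-up inputs the rule says to stop after two failures; a model in which each failure also shifts the future success rate upward could easily say the opposite.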
I do not believe this explains the funding rationale. If you look at the groups funded (as per my comment), these are not groups interested in bipartisan political compromise. If OP were interested in bipartisan efforts there are surely better and more effective groups to fund in that direction rather than the groups funded here with very particular, and rather strong, political beliefs which cannot in many cases (even charitably) be described as likely to contribute to bipartisan efforts at reform.
Considering the size of these donations and the policy focus of many of these groups, it is useful to look at the campaigns these groups run. The issue is that political association for core EA institutions is not likely to be net positive unless very carefully managed and considered. It is also similarly the case that EA’s should not support policy groups without clear rationale, express aims and an understanding that sponsorship can come with the reasonable assumption from general public, journalists, or future or current members, that EA is endorsing particular political views.
The Color of Change group, to which OP donated $2.5 million (50% of Color of Change’s annual budget) for “increasing the salience of prosecutor and bail reform”, describes its work in a variety of mission statements. Some of these are very clearly not EA, such as:
“Achieving meaningful diversity and inclusion behind the scenes in Hollywood”
“Ensuring Full and Fair Representation in the 2020 Census”
“Protecting Net Neutrality”
Other mission statements are politically motivated to a degree which is simply unacceptable for a group receiving major funds from an EA org. Under a heading “Right Wing Politics and White Nationalism”—a clear elision of right wing politics with hard-line racism—there appears the mission statement:
“Dismantling right-wing and white nationalist infrastructure/support”
This is a dishonest reading of politics, to be expected only of bad faith interpretations (akin to rendering left-wing politics as synonymous with Stalinism). Worse still than the rhetorical elision of RW and White Nationalism, is the mission itself—which taken at face value—commits the group to dismantling support and infrastructure for right-wing political beliefs. This is not an effective cause, and is—instead—a politically motivated one.
What, then, are the beliefs of the Color of Change? Under the title “Economic Justice”, Color of Change advocates for “Building momentum for progressive tax, labor and education policies”.
Their politics are all the clearer when looking to their current campaigns. These range from racial audits of Amazon, to advocating for the employment of a teacher fired for teaching critical race theory, to letters sent to prosecutors to stop “Anti-Trans Laws”. Indeed, most of the campaigns appear to be letter writing. It is difficult to see how this organisation should be considered for $2.5 million of EA funds.
It should be clear at this point that Color of Change is a left-wing political pressure group designed to reduce right-wing infrastructure and support, and to smear right-wing beliefs (their political opponents) as synonymous with racism. What relevance has this to Effective Altruism?
These political beliefs are common across those funded. Mijente—given $255k in a grant to “support its work on criminal justice reform” describes itself as believing “our people can’t afford 4 more years of despair, fear and growing systematic criminalization. Our plan of attack is to win at the ballot box by mobilizing Latinx voters against Trump.” Regardless of personal opinions on Trump, or his suitability for the office of President, this is not an EA cause area (arguments of x-risk prevention aside as this was apparently for a criminal justice reform grant). Furthermore, as a side note, from the Campaigns section of Mijente’s website, it no longer appears they have an active criminal justice focus despite the $255k OP grant.
Similarly, LatinoJustice, given $500k from OP to encourage “Latinx activists to support criminal justice reform”, does not advocate for criminal justice reform but rather states that “The criminal justice system in the U.S. needs not only to be reformed; it needs to be dismantled. It needs to be de-structured. It needs to be decolonized”. This is a political programme, disconnected from the original rationale for the OP grant.
Indeed, OP’s reasoning itself was expressly political for funding of at least one of these groups. The $100k grant to ReFrame was described as a grant for the ReFrame Mentorship, which OP describes as “an intensive training and mentorship program in strategic communications for social justice movement organizers”. This appears to be an expressly political rationale.
This is all to emphasise—not only as the post above does—that these groups were very likely far from effective uses of OP’s grants, but that they were also given expressly to groups with very clear, particular politics. It is difficult to see how any of these groups met the bar of OP funding, but it is all the more concerning that these were essentially a pattern of political campaigning groups, with very particular (but coherent with one another) political beliefs, receiving significant OP funds. This appears more akin to political activism, and less to effective giving. It is surprising and disappointing that OP funds were apparently used in such an ineffective and expressly political direction. The above shifts my estimate strongly towards the Progressive Funders and Mind Killer hypotheses.
“It is also similarly the case that EA’s should not support policy groups without clear rationale, express aims and an understanding that sponsorship can come with the reasonable assumption from general public, journalists, or future or current members, that EA is endorsing particular political views.”
This doesn’t seem right to me—I think anyone who understands EA should explicitly expect more consequentialist grant-makers to be willing to support groups whose political beliefs they might strongly disagree with if they also thought the group was going to take useful action with their funding.
As an observer, I would assume EA funders are just thinking through who has [leverage, influence, is positioned to act in the space, etc.] and putting aside any distaste they might feel for the group’s politics more readily than non-EA funders (e.g. the CJR program also funded conservative groups working on CJR whose views the program director presumably didn’t like or agree with for similar reasons).
“Other mission statements are politically motivated to a degree which is simply unacceptable for a group receiving major funds from an EA org.”
This seems to imply that EA funders should only give funding to groups that pass a certain epistemic purity test or are untouched by political considerations. I think applying EA-like epistemic standards to movement organizations in the US that touch on ~anything political would probably preclude you from funding anything political at all (maybe you’re arguing EA should therefore never fund anything that touches on politics, but that seems likely to be leaving a lot of impact on the table if taken to an extreme).
My guess is that if you looked at grantees in many other OP cause areas, you would see a large spread of opinions expressed by the grantees, many of which don’t follow EA-like epistemic norms. E.g. I understand the FAW grant-making team supports a range of groups who hold views on animal welfare, some of which are ideologically far afield from the internally stated goals of the program. Again, I don’t assume that the OP FAW POs necessarily endorse their views -- I assume they are being funded because the PO believes that those groups are going to do work that is effective, or productively contribute to the FAW movement ecosystem overall (e.g. by playing bad cop to another organization’s good cop with industry).
Hey! Member of the Open Phil grants team here (not officially writing on behalf of OP; just responding based on my experience of how things work here)
I feel that there’s a bit of a misunderstanding here about how our grants process works that I’d like to correct. It seems like the thrust of your argument is that it isn’t effective for us to fund non-EA-aligned organizations, and you list some of our grantees’ activities to support that claim (e.g. Color of Change’s work on diversity in Hollywood). I think you’d have to be making one of two assumptions for this argument to work:
We don’t have the ability to restrict how our grantees use the funds we give them, or
We can restrict our funding to projects aligned with our focus areas, but we can’t properly assess whether an organization’s other work or overall philosophy will spoil/reduce the impact of the work we want to fund.
Responding to the first assumption, we sometimes do restrict how our grantees use our funding, by adding a “purpose restriction” to the grant’s legal documentation. A purpose restriction is exactly what it sounds like—it specifies acceptable uses of our funds and prevents the grantee from spending our money on other projects. This enables us to fund a variety of organizations without having to worry much about the organizations spending lots of our money on things we don’t want to fund.
For example, our $2.5 million grant to Color of Change specifies exactly what we funded them to work on — prosecutorial reform, and advocacy/communication around the film “Just Mercy”. (We’ve also funded film-related projects in other areas, like farm animal welfare and effective altruism.)
While we try our best to restrict our funding to projects we’re excited about, restrictions aren’t perfect and money is fungible in a variety of ways. OP’s Program Officers try to address this, but it’s reasonable to worry that funding an org might encourage some of the work we didn’t fund. However, I think it makes more sense to view this as a tradeoff, not a reason for a blanket policy against funding political organizations. In my limited time at OP, I haven’t seen many instances where an organization we were funding was doing things with other funding that struck me as being actively harmful.
That said, it’s certainly possible that one of our grants has a negative impact that we didn’t intend; this might be inevitable given our hits-based giving model. We try our best to be mindful of these types of tail risks and incorporate them into the expected value assessments we do for our grants and we hope to learn from cases where we really get it wrong.
The second assumption also seems wrong to me. A large part of the job of a Program Officer at any funder is to conduct a comprehensive investigation into potential grantees to determine if they are capable of carrying out the funder’s goals. I don’t have a programmatic role at OP so I can’t comment too much here, other than to say that as far as I can tell OP Program Officers are very good at doing this. This is why I don’t think it’s particularly relevant what Color of Change “believes” as an organization. The relevant question was whether COC would be able to carry out specific CJR work that we would want to fund.
Maybe I’m being a bit uncharitable and ignoring an outside view, like “overtly political organizations are generally unable to implement narrowly designed programs”. That doesn’t seem too plausible to me.
There’s a better argument that says that overtly political organizations inevitably allow their political beliefs to seep into their programmatic work. I don’t know if that’s true, but if it is I think it probably looks more like “An overtly progressive organization trying to reduce the prison population of a large city (a programmatic goal of OP’s) uses some very progressive sounding language in their messaging” than “A progressive organization is too obsessed with ideological purity to accomplish the stated goals of a project”. If the former is true, it’s not clear to me why it’s relevant as long as the project succeeds. If it’s the latter, I think it’s the normal kind of risk to a project’s impact that our POs would be trying to ferret out, though they may not always succeed.
I’m curious as to how you estimate this. ‘reasonable to worry...might encourage some’ is a very mild assessment. In contrast, my intuition is that funging is near 100% for organisations with significant non-restricted funding and a large number of projects. I would expect most CEOs would choose to pursue the activities they are most excited about, and would have few qualms about the somewhat esoteric notion of funging with a grantmaker that they are apparently not very ideologically aligned with anyway.
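A simple model makes the funging intuition concrete. Assume (an assumption for illustration, not a claim about any particular grantee) that the grantee has a planned level of spending on the restricted project from its own unrestricted money and does not want the project to grow beyond that. A restricted grant then only adds to the project to the extent that it exceeds that planned spending; the rest frees up the grantee’s own money for whatever else it prefers. Function names and numbers are hypothetical.

```python
def extra_project_spending(restricted_grant, planned_own_spending):
    """Money the project gains beyond what the grantee would have spent anyway.

    Assumes the grantee keeps the project at its planned size when it can, so
    restricted money first replaces its own planned spending.
    """
    return max(0.0, restricted_grant - planned_own_spending)


def funging_rate(restricted_grant, planned_own_spending):
    """Fraction of the restricted grant that effectively funds other work."""
    extra = extra_project_spending(restricted_grant, planned_own_spending)
    return 1 - extra / restricted_grant


# Hypothetical $2.5M restricted grant against different planned spending levels ($M).
for planned in (0.0, 1.0, 2.5, 4.0):
    print(planned, round(funging_rate(2.5, planned), 2))
```

Under this model, funging is 100% whenever the grantee was already planning to spend at least the grant amount on the project, which is more likely for an organisation with a large unrestricted budget and many projects.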
You could also have:
2.5. Donating money to grantees which have very different goals is likely to incur an impact penalty, and this impact penalty is likely to be larger the less the grantees understand and share your goals.
So for instance, donating to an organization that doesn’t share your goals reduces its effectiveness through fungibility, as you point out. But it would also reduce its impact in other ways: it may do a worse job than a more aligned organization, be less likely to keep pursuing the funded goals after you stop funding it, employ worse strategies to pursue your goals in situations where it’s hard to model, and sometimes even actively try to mislead you.
At a guess, this alone seems like it could make a 2x to 100x difference.
In practice, this doesn’t seem so salient for something like, say, GiveDirectly, because once you decide that direct transfers are cost-effective, they seem very aligned in making the transfers happen. But the wiggle room for political grants seems much greater.
Note that some of the grants in my discussion section, and discussed in the comment were “discretionary grants”: (Essie Justice group, Justice Strategies, Reframe mentorship). And in addition, many of the criminal justice reform grants also tend to have particularly short writeups.
I agree that EA as a movement (and thus, the organisations in charge of most EA funding) should be somewhat wary of what looks like endorsement of particular political groups (or particular interpretations/applications of them).
I don’t necessarily agree with what you describe as EA vs. non-EA goals, but I don’t have any strong arguments about this.
Still, I’d like to push back on two points:
I’m very unconvinced of this. The identity of the most powerful person in the US has a huge impact and is something that should interest EA a lot. In particular, someone with as extreme views, behaviours and policy ideas as Trump is bound to have an outsized impact (of some sign) that’s meaningful to assess and perhaps try to change. In particular, it may have a large impact on the criminal justice system, which makes dealing with it relevant in this particular case.
I don’t see why you expect these to be disjoint. EA is a political idea, and some popular political philosophies will be more compatible with it than others; some can be judged to have a much better impact, if adopted by more people, than others. I have barely seen any impact evaluation of political activism projects on which to base confidence that they’re not EA-aligned.
This is not to say anything about the effectiveness and impact of particular activism done by any of these groups.
Thanks for putting this together! I think criticizing funders is quite valuable, and I commend your doing so. My main object-level thought here is I suspect much of the disagreement with the OP funding decision is based around the 1%-10% estimate of a $2B-$20B campaign leading to a 25%-75% decrease in incarceration. Since, per this article, incarceration rates in the U.S. have declined 20% (per person) between 2008 and 2019, your estimates here seem somewhat pessimistic to me.
My guess is that at the outset, OP would have predicted a different order of magnitude for both of those numbers (so I would have estimated something closer to $500M-$5B would produce a 5%-50% chance of a 25%-75% decrease), particularly since (as has been mentioned in other comments) it seemed a particularly tractable moment for criminal justice reform (crime had declined for a long time, there was seemingly bipartisan support for reform, costs of incarceration were high, and the U.S. was so far outside the global norm). By my quick read of the math, that change would put the numbers on par with your estimates for global health stuff.
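The size of that shift can be checked with a short Python sketch. Holding the 25%-75% decrease fixed and collapsing each range to its geometric midpoint (an assumption for illustration; a Squiggle-style estimate would keep the full distributions), the proposed ranges give about 5x the chance of success at about a quarter of the cost, i.e. roughly 20x the success probability per dollar. Because both ranges are shifted by constant factors, arithmetic midpoints give the same 20x. Whether that is enough to match the global health estimates depends on the post’s other numbers, which this sketch does not reproduce.

```python
from math import sqrt


def geometric_midpoint(low, high):
    """Point estimate for a range spanning an order of magnitude."""
    return sqrt(low * high)


# Ranges quoted in the comment: cost in billions of dollars, P(25%-75% decrease).
original = {"cost": (2, 20), "p_success": (0.01, 0.10)}
alternative = {"cost": (0.5, 5), "p_success": (0.05, 0.50)}


def success_probability_per_billion(estimate):
    cost = geometric_midpoint(*estimate["cost"])
    p = geometric_midpoint(*estimate["p_success"])
    return p / cost


improvement = (
    success_probability_per_billion(alternative)
    / success_probability_per_billion(original)
)
print(f"{improvement:.1f}x")  # 20.0x
```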
As someone who’s worked on criminal justice reform (on a volunteer basis, not OP-funded though inspired by OP-funded work like Just Leadership’s Close Rikers campaign), two features of the field that are striking to me are: 1. OP’s original vision was to reduce incarceration while also reducing crime—I don’t think the “reduce crime” half ended up being a main goal of the work, which I think has probably made it less politically robust. 2. A lot of criminal justice reform work (including mine) has stemmed from the thesis that empowering the voices of current and formerly incarcerated people politically would be politically beneficial; I think in retrospect this may have been mistaken (or at least an incomplete hypothesis) and that, more broadly, left identity-politics based strategies of the 2010s have not been as politically (especially electorally) successful as I at least had hoped.
On a meta-level, I think estimating impact of past grantmaking is very important, and EAs should do more of it. (I also think something along these lines could theoretically provide a scalable internship program for EA college students, since estimating impact teaches both cause prioritization skills and skills understanding the operations of organizations trying to achieve EA goals).
Edit: I should clarify that I’ve received significant funding from OP (including from their US Policy side that covered their criminal justice work), so I’m naturally biased in its favor
Thanks Josh, I particularly appreciate your quantified estimates of likelihood/impact.
Not sure how that follows; what counterfactual/Shapley impact requires is a further reduction relative to what would have happened in the absence (or with a reduction) of funding. If OP donates $5B and the imprisonment rate goes down another 20%, but would have gone down by 20% (resp. 15%) anyway, the impact is 0 (resp. 5%).
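Put as arithmetic, counterfactual impact is the observed reduction minus the reduction that would have happened without the funding. A trivial sketch reproducing the two cases above (reductions expressed as fractions of the imprisonment rate; the function name is just for illustration):

```python
def counterfactual_impact(observed_reduction, reduction_without_funding):
    """Reduction attributable to the funding: what happened minus what would have happened anyway."""
    return observed_reduction - reduction_without_funding


print(counterfactual_impact(0.20, 0.20))            # 0.0
print(round(counterfactual_impact(0.20, 0.15), 2))  # 0.05
```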
Yeah it’s unclear how much of the 20% reduction is due to OP’s work or would happen counterfactually. My main point with that number is that reductions of that size are very possible, which implies assuming a 1-10% chance of that level of impact at a funding level 10-100x OP’s amount is overly conservative (particularly since I think OP was funding like 25% of American CJR work—though that number may be a bit off).
Another quick back of the envelope way to do the math would be to say something like: assume 1. 50% of policy change is due to deliberate advocacy, 2. OP’s a funder of average ability that is funding 25% of the field, 3. the 20% 2009-2018 change implies a further 20% change is 50% likely at their level of funding, then I think you get 6.25% (.5*.25*.5) odds of OP’s $25M a year funding level achieving a 20% change in incarceration rates. If I’m looking at your math right (and sorry if not) a 20% reduction for 10 years would be worth like (using point estimates) 4M QALYs (2M people *10 years *1 QALY*20% decrease), which I think would come out to 250K QALYs in expectation (6.25%*4M), which at $25M/year for 10 years would be 1K/QALY—similar to your estimate for GiveDirectly but worse than AMF. (Sorry if any of that math is wrong—did it quickly and haphazardly)
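The back-of-the-envelope calculation above, written out as a short Python sketch. All inputs are the point estimates from the comment; the variable names are illustrative, and a fuller model would use distributions rather than single numbers:

```python
# Point estimates from the comment; all are rough guesses.
share_due_to_advocacy = 0.5    # 1. share of policy change due to deliberate advocacy
op_share_of_field = 0.25       # 2. OP funds ~25% of the field, with average ability
p_further_20pct_change = 0.5   # 3. chance of a further 20% change at OP's funding level

p_success = share_due_to_advocacy * op_share_of_field * p_further_20pct_change

people_incarcerated = 2_000_000
years = 10
qaly_per_person_year = 1.0
reduction = 0.20

qalys_if_success = people_incarcerated * years * qaly_per_person_year * reduction
expected_qalys = p_success * qalys_if_success

total_cost = 25_000_000 * years  # $25M/year for 10 years
cost_per_qaly = total_cost / expected_qalys

print(f"P(success) = {p_success:.2%}")                    # 6.25%
print(f"QALYs if successful = {qalys_if_success:,.0f}")   # 4,000,000
print(f"Expected QALYs = {expected_qalys:,.0f}")          # 250,000
print(f"Cost per QALY = ${cost_per_qaly:,.0f}")           # $1,000
```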
Unlike poverty and disease, many of the harms of the criminal justice system are due to intentional cruelty. People are raped, beaten, and tortured every day in America’s jails and prisons. There are smaller cruelties, too, like prohibiting detainees from seeing visitors in order to extort more money out of their families.
To most people, seeing people doing intentional evil (and even getting rich off it) seems viscerally worse than harm due to natural causes.
I think from a ruthless expected utility perspective, this probably is correct in the abstract, i.e. all else equal, murder is worse than equivalently painful accidental death. However I doubt taking it into account (and even being very generous about things like “illegible corrosion to the social fabric”) would importantly change your conclusions about $/QALY in this case, because all else is not equal.
But, I think the distinction is probably worth making, as it’s a major difference between criminal justice reform and the two baselines for comparison.
This consideration seems like it could go both ways. It is true that most people think that intentionally done harm is worse than non-anthropogenic or accidental harms… but they tend to also believe that it is inherently good to punish criminals!
From a ruthless expected utility perspective, it’s the case that we might want to let some criminals out early, because it harms them to be imprisoned. To my recollection this cost was the largest line item in OP’s cost-benefit analysis. But for those with a more justice-orientated perspective, harming criminals is not a disadvantage: retribution is a core part of the criminal justice system and it is an actively good thing to ensure they get their just deserts.
Thanks for expressing the compassionate case succinctly.
Excellent post. I have a strong prior that academic literature on criminology is biased, so I am more inclined than you to guess that consensus estimates for criminal justice reform not having net negative effects on crime are too optimistic. So my guess for second-order effects is that they make criminal justice reform even less valuable relative to other global health/wellness causes.
Putting that aside, I think one reason Open Phil might have been so favorably inclined toward criminal justice reform was the bipartisan consensus that pursuing it was a good idea. The 2010s were a uniquely good time to pursue criminal justice reform (until ~2020, when increasing crime rates made criminal justice reform less bipartisan).
Perhaps you could call this the “The Hinge Hypothesis”—during the years that Open Phil made large donations to criminal justice reform efforts, it was a uniquely good time to do so. I think this was a reasonable guess, though I don’t think it brings the QALYs/$ to parity with other global health/wellness goals.
A lot of people, myself included, had relatively weak priors on the effects of marginal imprisonments on crime, and were subsequently convinced by the Roodman report. It might be valuable for people interested in this or adjacent cause areas to commission a red-teaming of the Roodman report, perhaps by the City Journal folks?
That’s an interesting idea. It seems like an effort that would require a lot of subject-matter expertise, so your idea to commission the City Journal folks makes sense.
I do wonder if cause areas that rely on academic fields which we have reason to believe may be ideologically biased would generally benefit from some red-teaming process.
Cheers.
Thanks. Yeah, having a negative tail really reduces expected values. E.g., playing with a toy model, having a 25% chance that the impact is of the same magnitude but negative ~halves the expected impact.
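A simplified two-point version of such a toy model (not the original one, which isn’t reproduced here): the impact is +X with probability 1 − p and −X with probability p, so the expected impact is (1 − 2p)X. With p = 0.25 this exactly halves the expected value. The helper name is hypothetical:

```python
def expected_impact(magnitude: float, p_negative: float) -> float:
    """Expected impact when the effect is +magnitude with probability
    1 - p_negative and -magnitude (same size, opposite sign) otherwise."""
    return (1 - p_negative) * magnitude + p_negative * (-magnitude)

baseline = expected_impact(1.0, 0.0)    # no negative tail
with_tail = expected_impact(1.0, 0.25)  # 25% chance of an equal-sized negative effect
print(with_tail / baseline)  # 0.5
```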
I wonder if this paper, which appears to show that incarceration reduces prisoner mortality relative to non-incarcerated but criminal-justice-involved people, should change your estimates of CJ reform benefits. Given that, it seems plausible that reducing prison stays actually increases mortality for prisoners.
Another interesting thing about this paper is the implication that the previous work on this topic (which used the general population as the control group) was flawed in an obvious way. That should generally lower our opinion of the academic literature on this topic.
I imagine that the mortality effects wouldn’t be the most important factor here: the vast majority of prisoners would prefer to be outside prison, even if this leads to an (I presume small) increase in mortality.
So I imagine this would have an effect, but not a very large one.
I think it is an impressive effect, though I agree people not wanting to be in prison is more important.
Why did you take the mean $/QALY instead of mean QALY/$ (which expected value analysis would suggest)? When I do that I get $5000/QALY as the mean.
I’ve now edited the post to reference mean(QALYs)/mean($). You can find this by searching (Ctrl+F) for “EDIT 22/06/2022” and under the graph charts.
Note that I’ve used mean($)/mean(QALYs) ($8k) rather than 1/mean(QALYs/$) ($5k), because it seems to me that this is closer to the quantity of interest, but I’m not hugely certain of this.
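A small illustration of why the choice between mean($/QALY), 1/mean(QALYs/$) and mean($)/mean(QALYs) matters. The three equally likely scenarios below are made up for illustration, not taken from the post. mean($/QALY) is dominated by the worst scenario, and by the AM-HM inequality it is always at least 1/mean(QALYs/$) when all values are positive:

```python
from statistics import mean

# Hypothetical, equally likely scenarios: (cost in $, QALYs gained).
scenarios = [(1e6, 100), (2e6, 1_000), (4e6, 10_000)]
costs = [c for c, _ in scenarios]
qalys = [q for _, q in scenarios]

mean_cost_per_qaly = mean(c / q for c, q in scenarios)              # mean($/QALY)
inverse_mean_qaly_per_cost = 1 / mean(q / c for c, q in scenarios)  # 1/mean(QALYs/$)
ratio_of_means = mean(costs) / mean(qalys)                          # mean($)/mean(QALYs)

print(round(mean_cost_per_qaly))          # 4133
print(round(inverse_mean_qaly_per_cost))  # 968
print(round(ratio_of_means))              # 631
```

Since mean(QALYs/$) is the expected value per dollar, the last two summaries are the ones closer to what an expected value analysis would use; they coincide when the cost is fixed.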
Another modeling issue is that each individual variable is log-normal rather than normal/uniform. This means that while probability of success is “0.01 to 0.1”, suggesting 5.5% as the “average”, the actual computed average is 4%. This doesn’t make a big difference on its own but it’s important when multiplying together lots of numbers. I’m not sure that converting log-normal to uniform would in general lead to better estimates but it’s important to flag.
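A sketch of the log-normal point above, assuming (as the comment describes) that “0.01 to 0.1” means a log-normal with those values as its 5th and 95th percentiles. The helper name `lognormal_from_90ci` is hypothetical. The median is the geometric midpoint, about 3.2%, and the right tail pulls the mean up to about 4%, both below the 5.5% arithmetic midpoint:

```python
import math
from statistics import NormalDist

def lognormal_from_90ci(low: float, high: float) -> tuple[float, float]:
    """Return (mu, sigma) of a log-normal whose 5th and 95th percentiles are low and high."""
    z = NormalDist().inv_cdf(0.95)  # ~1.645 standard deviations
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * z)
    return mu, sigma

mu, sigma = lognormal_from_90ci(0.01, 0.1)
median = math.exp(mu)                 # geometric midpoint of the interval
mean = math.exp(mu + sigma ** 2 / 2)  # log-normal mean
midpoint = (0.01 + 0.1) / 2           # naive arithmetic midpoint

print(f"median={median:.4f} mean={mean:.4f} midpoint={midpoint:.4f}")
# median=0.0316 mean=0.0404 midpoint=0.0550
```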
Quick point that I’m fairly suspicious of uniform distributions for such uncertainties.
I’d agree that our format of a 90% CI can be deceptive, especially when people aren’t used to it. I imagine it would eventually be really neat to have probability distribution support right in the EA Forum. Until then, I’m curious if there are better ways to write the statistical summaries of many variables.
To me, “0.01 to 0.1” doesn’t suggest that 5.5% is the “mean”, but I could definitely appreciate that others would think that.
Because I didn’t think about it and I liked having numbers which were more interpretable, e.g. 3.4*10^-6 QALYs/$ is to me less interpretable than $290k/QALY, and the same with 7.7 * 10^-4 vs $1300/QALY.
Another poster reached out and mentioned he was writing a post about this particular mistake, so I thought I’d leave the example up.
Please feel free to edit it, it will take me a while to actually post anything, I’m still thinking about how to handle this issue in the general case. I think you can keep the interpretability by doing mean(cost)/mean(QALY), but I still don’t know how to handle the probability distribution.
I think by not editing it we risk other people copying the mistake.
Done.
Post I mentioned in the comments below now here: Probability distributions of Cost-Effectiveness can be misleading
In this case, the “mistakes” are often a list of things like, “This specific organization was much worse than we thought. The founders happened to have issues A, B, and C, which really hurt the organization’s performance”.
Releasing such information publicly is a big pain and something our culture is not very well attuned to. If OP basically announced information like, “This small group we funded is terrible, in large part because their CEO, George, is very incompetent”, that would be very unusual and there would likely be a large amount of resistance. I imagine OP would get a ton of heat for doing that.
This issue is amplified by the fact that OP is a grantmaking organization much more than a grant recommendation organization. If it posted information publicly about an organization’s competence, it’s not clear which other orgs would actually use that information.
My guess is that internally, people at OP admit a lot of mistakes. But making these mistakes public would create a lot of hazards, just because they contain a lot of information that specific organizations they work with would much prefer to be private.
Hi Nuno, Great post. Really interesting. Well done!
I do a lot of estimating the impact of policy change in my work for Charity Entrepreneurship and I decided to have a look at your Squiggle models and oh boy, do your estimates of costs per policy change seem high to me.
DIFFERENCE IN ESTIMATES
CE have looked at case studies for 100s of different policy change campaigns in LMICs (see various reports here) and tend to find a decent chance of success, and estimates for the cost of policy change (LMIC and HIC) for a new EA charity tend to be around $1-1.5m, with like a 10-50% chance of success.
And existing EA charities seem even more effective than that. The APPG for Future Generations seemed to consistently drive a policy change for every ~$50k. LEEP seems to have driven their first policy change for under ~$50k and seems on track to keep driving changes in that sort of ball park.
Your costs are:
For the Rikers policy change, costs were $5m to $15m with a 7-50% chance of success. This is 1 order of magnitude above the CE estimates for new charities and 2 orders of magnitude above the track record of existing EA charities.
For the prison reduction policy change, costs were $2B to $20B with a 1-10% chance of success. This is 4 orders of magnitude above the CE estimates for new charities and 5 orders of magnitude above the track record of existing EA charities. This seems insanely expensive to me.
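A rough check of these orders-of-magnitude figures, comparing only the midpoints of the cost ranges (in $ millions) and ignoring the probabilities of success, which is a simplifying assumption. The ~$50k figure for existing EA charities is used as a single point:

```python
import math

# Cost ranges in $ millions, taken from the comment.
ranges = {
    "CE new charity": (1.0, 1.5),
    "existing EA charity": (0.05, 0.05),
    "Rikers": (5.0, 15.0),
    "prison reduction": (2_000.0, 20_000.0),
}
mid = {name: (lo + hi) / 2 for name, (lo, hi) in ranges.items()}

def orders_above(a: str, b: str) -> float:
    """log10 of how many times more expensive a is than b, by range midpoint."""
    return math.log10(mid[a] / mid[b])

for target in ("Rikers", "prison reduction"):
    for benchmark in ("CE new charity", "existing EA charity"):
        print(f"{target} vs {benchmark}: {orders_above(target, benchmark):.1f} orders")
```

This prints roughly 0.9 and 2.3 orders for Rikers and 3.9 and 5.3 for the broader prison-reduction goal, in line with the 1, 2, 4 and 5 above; folding in the probabilities of success would change these numbers somewhat.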
POSSIBLE REASONS
The possible reasons for this difference are many:
1. You think prison reform is a particularly intractable topic to make headway on.
2. Prison reform needs very, very many small policy changes over a number of years to get anywhere, e.g. it needs changes in every state.
3. Policy work in the US is much, much more expensive than policy work in the UK and LMICs.
4. CE estimates are too low, as the data we have is affected by survivorship bias in the case studies we can find.
5. Non-EA charities are many times less effective at driving change than EA-run charities.
Even so, 4-5 orders of magnitude is a big jump, so I thought it worth pointing out. I would love to hear your views on this, especially if you think “2.” is true, as that would affect some of CE’s upcoming decision-making.
I do think it is possible you have significantly overestimated the costs here and that could be affecting your conclusions.
Edit: another way of looking at this: reading this, your estimate of the effect of the Rikers change feels more indicative to me of the overall cost-effectiveness of good prison reform work than your other estimate of spending $2-20bn to see 25-75% of prisoners released (which feels like made-up numbers). It seems plausible to me that a good campaign team can keep finding things as effective as the Rikers closure, bail reform, etc., without significant diminishing marginal returns.
So my best guess of what’s going on here is that Charity Entrepreneurship (CE) or past EAs with a good track record looked at very targeted policy changes specifically selected for their cost-effectiveness: increasing tobacco taxes in Mongolia, improving fish welfare policies in India, increasing foreign aid in Swiss cities, things like that.
So for instance, in the case of CE, my impression isn’t that you become enamoured with one particular cause, but rather that you have many rounds of progressively more intense research, and, crucially, that you start out from a large pool of ideas which you select from. And so (my impression is that) you end up choosing to push for policies that are important, neglected and tractable all at once.
But in the case of criminal justice reform, I think that neglectedness and importance very much come at the expense of tractability, because “harsh on crime” appeals to at least part of the electorate, because felons are not a popular demographic to defend, because many people have a strong intuition that softer policies lead to more crime, etc. Whereas pushing for, idk, tobacco taxation in LICs, or for salt iodization seems much more straightforward.
So the first part of my answer is that I think that the EA policies you are thinking of might be the product of stronger selection effects. I would also agree with points 1. and 2. in your list, when talking about systemic change. I think this should account for most of the disagreement in perspectives, but let me know if not.
Curious what specifically.
Some more specific and perhaps less important individual points:
Yeah, this comes from estimating what fraction of the effort to close Rikers Open Phil’s funding accounted for, and then looking at the chance of success in recent news:
For funding: I found three grants (1, 2, 3), accounting for ~$5M, and my impression is that Open Phil wasn’t literally the only funder, particularly since this is an ongoing, multi-year effort.
For probability: This was informed by reading news articles about Rikers. This is one particular article that I remember reading at the time about this. On the lower side, politicians keep saying that they want to close the prison, but they keep punting this into the future, and creating additional capacity to house prisoners (e.g., building new prisons) seems hard and prone to delays. On the optimistic side, there are specific commitments, specific dates, specific promises.
I think that you could get better probabilities by, e.g., pooling forecasts on Metaculus, rather than relying on me as a single forecaster. And looking back, I would now put the chance of closing Rikers by 2027 (the current target date) higher than 7-50%; maybe 20 to 70%.
1 order of magnitude for weak selection effects, 2 orders of magnitude for choosing an unpopular/politicized cause, 1-3 orders of magnitude for America rather than a LIC, 1 order of magnitude for investing in fuzzy systemic change rather than specific policies like closing Rikers, …
See also Effectiveness is a Conjunction of Multipliers. You’d also have to add a +1 for topic of very broad interest, and so on, but the point is that you can get 5 orders of magnitude pretty quickly.
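Since multipliers compound, their orders of magnitude add. A sketch summing only the factors listed in the comment above (it leaves out the open-ended “…” and the further adjustments mentioned afterwards; the factor labels are paraphrased):

```python
# Orders of magnitude per factor, as (low, high); multiplying factors adds exponents.
factors = {
    "weak selection effects": (1, 1),
    "unpopular/politicized cause": (2, 2),
    "America rather than a LIC": (1, 3),
    "fuzzy systemic change vs. specific policies": (1, 1),
}
low = sum(lo for lo, _ in factors.values())
high = sum(hi for _, hi in factors.values())
print(f"{low} to {high} orders of magnitude, i.e. {10**low:,}x to {10**high:,}x")
# 5 to 7 orders of magnitude, i.e. 100,000x to 10,000,000x
```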
This seems plausible, but empirically, going through OP grants, not many had as clear cut a pathway to impact as Rikers. Some work in Los Angeles. But a big fraction was closer to the systemic estimate than to the Rikers estimate.
Interesting. If this is correct, it suggests that impact-focused EAs working on well-targeted policy campaigns can be multiple orders of magnitude better than just giving to an existing think tank or policy organisation. This would suggest that maybe big funders and the EA community should do a lot more to help EAs set up and run small, targeted policy campaign groups.
This seems right, but I would still expect new groups to be worse than past top EA policy projects (e.g., this ballot initiative) if the selection effects are weakened.
That is, going from “past EA people who have done that have been very effective” to “if we have more EA people who do this, they will be very effective” doesn’t follow, because the first group could only act/has only acted in cases where the policy initiatives seem very promising ex-ante.
@weeatquince and all:
Do you know what the best research synthesis (or aggregation of subjective beliefs) we have is on the ‘costs of achieving policy change’...
perhaps differentiated by area
and by the economic magnitude of the policy?
My impression was that Nuno’s
“$2B to $20B, or 10x to 100x the amount that Open Philanthropy has already spent, would have a 1 to 10% chance of succeeding at that goal”
Seemed plausible, but I suspect that if they had said a 1-3% chance or a 10-50% chance, I might have found these equally plausible. (At least without other benchmarks).
Good to hear it shifted your opinion!
> I can see why releasing information about personal incompetence for instance might be unusual in some cultures; I’m not sure why you can’t build a culture where releasing such information is accepted.
I agree it’s possible, but think it’s a ton of work! Intense cultural change is really tough.
Imagine an environment, for instance, where we had a public ledger of, for every single person:
How much good and bad they’ve done.
Their personal strengths and weaknesses.
Their medical and personal issues.
There would be many positives of having such a list. However, it would also create a lot of problems, especially in the short-term. It would be pretty radical.
I think it’s good for people/orgs to experiment with radical honesty, though OP should probably be late to that. (You’d want to do experiments with smaller groups without so many commitments).
The company Bridgewater is noted as being particularly honest. It seems to produce some good results, but also, it’s an environment that seems terrible for most people. I recommend looking into reviews of it, it’s pretty interesting. (Also, note that Elie and Holden both worked at Bridgewater).
The baseline simple model is
This seems to depend on:
But shouldn’t this be simplified to include fewer variables? In particular:
1. Why do we need cost as a variable on its own? cost is basically a choice variable: presumably the more that is spent, the greater the probabilityOfSuccess. The uncertainty surrounds ‘benefit per dollar spent’, but that ‘slope of benefit in cost’ is really only a single uncertainty, not two. Wouldn’t it be better to just pick a middle, reasonable ‘amount spent’, perhaps the amount that seems ex-ante optimal?
2. Acceleration * reductionInPrisonPopulation * probabilityOfSuccess: these seem likely to be highly (negatively?) correlated with each other, and positively correlated with cost. For a given expenditure, if we target a lower ‘reduction in prison population’, or a slower rate of change, I expect a greater probabilityOfSuccess. Would it make sense to instead think of something like ‘reduction in total prison-years as a percent of current prison population’? Perhaps, feeding into this, we could consider some combinations of expenditure, acceleration, reduction percent, and prob(success) that jointly seem plausible?
Answered here.
Manifold would be enthusiastic about setting up such a system for improving grant quality, through either internal or public prediction markets! Reach out (austin@manifold.markets) if you participate in any kind of grantmaking and would like to collaborate.
The primary blocker, in my view, is a lack of understanding on our team of how the grantmaking process operates, meaning we understand the use case (and thus how to structure our product) less well than, e.g., our normal consumer product. A few members of OpenPhil have previously spoken with us; we’d love to work more to understand how we could integrate.
A lesser blocker is the institutional lack of will to switch over to a mostly untested system. I think here “Prediction Markets in the Corporate Setting” is a little pessimistic wrt motives; my sense is that decisionmakers would happily delegate decisions, if the product felt “good enough”—so this kind of goes back to the above point.
My intuitions agree with you fwiw. I’m personally pretty skeptical of rational choice theory in criminology. I would guess criminology to be an unusually poor testing ground for microeconomics. Happy to be corrected otherwise with data.
See <https://www.lesswrong.com/posts/PrCmeuBPC4XLDQz8C/unconscious-economics> for something which might change your intuitions.
I read that post before but I’m not convinced. Reasons against:
anticorrelation between smarts and criminality
selection effects on crime are very weak at current margins, other than incapacitation
consensus-ish in the field that certain punishment is more effective for deterrence than proportionate-in-expectation punishment
Yeah, sorry, to elaborate a bit: I’m not saying that criminals are performing the expected value calculation (i.e., that they can be modeled per rational choice theory). I’m instead saying that life strategies and habits will be reinforced and spread through, e.g., social mimesis if they have a high positive expected reward. So for instance, if crime isn’t prosecuted, I think it will increase over the course of a few generations.
I’m not sure how much I trust the literature. In particular, I see it as estimating short-run effects (e.g., looking at differences in natural experiments at the level of years, rather than decades). And I can buy that these short-run effects can be small. But I don’t see it as being as informative about longer-run effects.
I think that proportionate-in-expectation punishment might work at the level of making more criminal life strategies less valuable, but I agree that certain punishment might be more effective. I don’t think that my position really hinges on this.
This is coming from my own understanding of what is politically feasible. I think that policies that make crime worth it in expectation are likely to be politically very unpopular and/or lead to more crime. So I think that restorative justice approaches would be stronger with a punitive component, even if they overall reduce that punitive component.
Do you think there’s an optimal ‘exchange rate’ between causes (eg. present vs future lives, animal vs human lives), and that we should just do our best to approximate it?
Yes. To elaborate on this, I think that agents should converge on such an exchange rate as they become wiser and understand the world better.
Separately, I think that there are exchange rates that are inconsistent with each other, and I would already consider it a win to have a setup where the exchange rates aren’t inconsistent.
I wonder if we can back out what assumptions the ‘peace pact’ approach is making about these exchange rates. They are making allocations across cause areas, so they are implicitly using an exchange rate.
Small suggestion: Could someone edit the post so that the hover footnotes work; makes it more readable? (Update: More importantly I’m not sure the footnote numbers are correct.)
Larger suggestion: This is a great post and provides a lot of methodological tools. I’d love to see it continue to be improved. E.g., detailed annotation, here or elsewhere, with the reasoning behind each element of the model.
It might help to turn the modeling part into a separate post linked to this one?
I have some ‘point by point’ questions and comments, particularly on the justification for the elements of the simple models. I don’t want to add them all here because of the clutter, so I will add them as public https://hypothes.is/ notes. Unless anyone has a better suggestion for how to discuss this?
One component of this post was the quantified estimate of AMF’s impact. My estimate was extremely rough. There is now a merely very rough estimate here: <https://observablehq.com/@tanaerao/malaria-nets-analysis>, which looks as follows:
(I’m taking the estimate of lives saved, and then dividing them by life expectancy to get something resembling QALYs. I’m getting the life expectancy figures from Our World In Data, and then adding a bit of uncertainty).
Overall this update doesn’t change this analysis much.
So I don’t actually have to assume that criminals are rational actors; I can also assume that actions which are high expected value will spread through mimesis. See this short post: Unconscious Economics for an elaboration of this point.
But you are right that it smuggles many assumptions.
OPP, formerly GiveWell Labs, recommends grants. Good Ventures and other organizations that use these recommendations award the grants. The hypothesis that a funder was passionate about this cause and that OPP recommended effective solutions within this cause area, rather than comparing the cause area to others, seems prima facie plausible. However, I think that the influence was not so one-sided. OPP’s leadership, represented by Holden Karnofsky, started considering (US) political advocacy early on. Since following up on prominent conversations relating to justice and benevolence is one of many ways to develop trusting relationships with political decisionmakers, criminal justice could actually have been a great idea.
You should also note that, recently, OPP delegated criminal justice reform to Just Impact, led by Chloe Cockburn and Jesse Rothman, formerly OPP’s grant advisors in this cause area. This grant should suffice for 3.5 years, and perhaps by then the organization will have gained its own funding or solved the issue. So, perhaps this decision makes your argument outdated.
Or are you criticizing OPP for committing to causes? There were some doubts[1] about this in 2014. Currently, OPP actually conducts only shallow and medium-depth investigations, so the concern here could be quite the opposite: where are the in-depth investigations?
This should be the case if you are introducing new information in a way that is welcomed by the decisionmakers and their informants. While you could rephrase some of your statements more neutrally,[2] readers will probably focus on the content. You also agreed not to defame or cause reputational loss to anyone involved,[3] so you wrote the entry with this intention. This could also imply that you would engage constructively with any responses from these stakeholders.
In the report, at first glance, I would focus on the time on parole (p. 17). It may be that increased time on parole increases criminality.[4] Thus, increasing parole time may be a suboptimal solution.
The experiments’ interpretations on pp. 22–26 seem horrifying at first glance.[5] I read them as saying that deterrence, such as incarcerating someone for a day to discourage them, is bad, while incapacitation (such as incarceration?) is good.
On second glance, I am also worried that the table shows the consistency of the studies with the report,[6] rather than using the studies to inform the conclusions. If this only shows that the writers had the extra challenge of fitting external biases, that may be OK, but it may also be that the writers are biased toward incarceration and exploit readers’ perceived weaknesses and their appeal to authority. Thus, I suggest reviewing the contributors’ connections to the profitable US prison system.
On third glance, the report simply analyzes various criminal justice factors that affect different measurable aspects of criminality, such as re-offences and their frequency/timing, but does not really discuss why people commit crimes and how this could be prevented in ways other than those directly pertinent to criminal legislation and offenses.
In terms of cost-effectiveness, reducing the crime rate in the US can free up significant capital,[7] which can be used cost-effectively, including for deworming programs. Since many US officials would welcome being able to take credit for some crime reduction (it could help them get elected), they could be receptive to certain budgetary recommendations from the crime reduction organization.
For example, if incarceration is reduced by 10% in the US,[8] then $600 million could realistically be spent annually on EA-related programs, including deworming.[9]
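The budget arithmetic from the footnotes, written out as a sketch: a 10% cut from roughly 2.3 million incarcerated people is about 0.23 million people, at about $26,000 per person-year, with the organization influencing 10% of the savings. Whether the savings would actually be redirected is the key assumption here, not something the code establishes:

```python
# Figures from the footnotes.
people_released = 0.23e6       # ~10% of ~2.3 million incarcerated
cost_per_person_year = 26_000  # $ per person-year of incarceration
annual_savings = people_released * cost_per_person_year  # ~$6 billion/year

influence_share = 0.10  # share of the savings the organization influences
redirected = annual_savings * influence_share             # ~$600 million/year

print(f"${annual_savings / 1e9:.2f}B saved/year, ${redirected / 1e6:.0f}M redirected/year")
# $5.98B saved/year, $598M redirected/year
```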
Have you considered that deworming may be a perpetual need, while influencing a decision that motivates sustainable systemic change may be a permanent solution? This could justify spending on advocacy in general.
Furthermore, for change to happen, perhaps various media should be engaged. For example, letters should be complemented by local group support, by pro bono legal advisory that uses case studies to inform legal changes, by public awareness/electorate support that includes movies, by making sure that any new legislation is not discriminatory on the basis of race or otherwise, that judges are bought in, that any evidence-based appeal is complemented by US families’ emotional advocacy, etc. So, it can be that grants awarded at any given time always complemented current activities to optimize for marginal cost-effectiveness.
Is it the case that in EA, marginal cost-effectiveness[10] is generally what is considered, with factors such as the president’s pay included in the analysis? For example, a lawyer can generally be paid this salary, so a legal manager who makes state and federal changes could be seen as taking a pay cut.
I apologize; I skipped your hypotheses.
It may be that, at this point, the best course of action is reminding Just Impact to enter any conversations with the government with an awareness of reciprocity (favors being returned), of the extent to which reduced crime can free up budget, and of the question of where they could see the greatest impact with some of this funding.[11]
For example, “and to what extent it was determined by other factors, such as the idiosyncratic preferences of Open Philanthropy’s funders, human fallibility and slowness, paying too much to avoid social awkwardness,” can be safely omitted since it is implied that if any reasoning is flawed, it is due to biases.
Also, by stating “I end up uncertain about to what extent this was a sincere play based on considerations around the value of information and learning,” you mean that the decision could have seemed valuable initially but with learning over time it showed as not so marginally cost-effective? Or that it should have been clear since the beginning that the org will learn little by focusing on criminal justice reform? Any hypothesis can be stated explicitly.
5. D. vii. (6)
This could show that any human bias was in favor of releasing criminals on parole and the report aimed to mitigate it.
It may be because I am reading them wrong.
In particular, see the assessment of Abrams (2012) (p. 23) that states “Strongly compatible after reanalysis.” This could point out that an external standard was being met by the study interpretation.
From 2.3 million to 2.1 million.
0.23 million × $26,000/year ≈ $6 billion/year. If the crime reduction organization influences 10%, an additional $600 million can be spent.
alongside opportunity/timing, leverage, and other factors
While deworming was used as an example, other expanded moral circle considerations can work. For example, I estimated, using references shared by a GiveDirectly employee, that targeting individuals at risk of criminal involvement who go through otherwise effective programs but turn to crime only because they do not have access to microfinance could multiply the cost-effectiveness many times. This also considers that the criminal activity of a single individual can reduce the wellbeing of entire communities, while cash transfers targeted based on poverty rates have only modest spillovers.
I find your comment a bit hard to read; but a quick answer to one particular point:
Footnote 3 links to this document, which contains the following clause:
But this document is for the cause exploration prize, not for the criticism prize. Also, I would be very hesitant to agree to such a condition.
Oh, sorry, I thought you were responding to the Cause Exploration Prize.
It’s an interesting hypothesis, but I don’t think deworming is a perpetual need? I don’t think I took deworming pills growing up, and I doubt most Forum readers did.
Framed another way, I don’t think we should have a strong prior belief that if we subsidize health interventions for X years, this means they’ll need to be continuously subsidized by the globally rich, while “systematic” policy changes in the West are successful as one-offs.
That is true: infrastructure can be built and infections eliminated. This is echoed by the WHO (only some schools are recommended for treatment, while “sanitation and access to safe water [and] hygiene” can reduce transmission).
I possibly underestimated the ease of reducing contamination and overestimated the inevitability of exposure to contaminants. According to the above sources, sanitation, hygiene, and refraining from using animal fertilizer can reduce contamination. Further, wearing shoes and not spending time in possibly contaminated water reduce the risk of infection. Thus, only people in areas with high infection prevalence who cannot reasonably avoid spending time in water, such as water-intensive crop farmers, are at risk, while solutions in other situations are readily available. Since farming can be automated, the risk can practically be eliminated entirely.
I agree. According to Dr. Gabby Liang, the SCI Foundation currently works on “capacity development within the [program country] ministries.” This suggests that international assistance can eventually be phased out. It may also be that most health programs develop capacity and thus make lasting changes, even if they are not specifically targeted at that. Policy changes that result from organized advocacy, on the other hand, may be temporary or fragile, partly because they are not based on the institution's own reasoning or research. So I could agree that SCI is more effective than the cited letter writing (but would still need more information to hold either view).
Disclosure: I worked with Open Phil’s CJR team for ~4 months in 2020-2021 and was in touch with them for ~6 months before that.
I’m very concerned by the way this post blends speculative personal attacks with legitimate cost effectiveness questions.
Chloe and Jesse are competent and committed people working in a cause area that does not meet the 1000x threshold currently set by GiveWell top charities. If it were easy to cross that bar, these charities would not be the gold standard for neartermist, human-focused giving. Open Phil chose to bet on CJR as a cause area, conduct a search, and hire Chloe anyway.
I genuinely believe policy- and politics-focused EAs could learn a lot from the CJR team’s movement building work. Their strengths in political coordination and movement strategy are underrepresented in EA.
I bought into the idea that we could synthesize knowledge from different fields and coordinate to solve the world’s most pressing problems. That won’t happen if we can’t respectfully engage with people who think or work differently from the community baseline.
We can’t significantly improve the world without asking hard questions. We can ask hard questions without dismissing others or assuming that difference implies inferiority.
[I only got back on the forum to reply to this post.]
So here is the thing: Chloe and her team’s virtues and flaws are amplified because they are in charge of millions. And so I think that having good models here requires mixing speculative judgments about personal character with cost-effectiveness estimates.
At this point I can either:
1. Not develop good models of the world
2. Develop ¿good? models but not share them
3. Develop them and share them
Ultimately I went with option 3, though I stayed in option 2 for roughly three months. It’s possible this wasn’t optimal. I think the deciding factor was having two cost-effectiveness estimates which ranged over 2-3 orders of magnitude and yet were non-overlapping. I could have published those alone, but I don’t think they can stand alone, because the immediate response would be that Open Philanthropy knows something I don’t; so the rest of the post is in part an exploration of whether that’s the case.
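Why are wide but non-overlapping estimates still informative? If two intervals each span several orders of magnitude yet do not overlap, the ranking between them holds across their full ranges. The sketch below illustrates this with made-up intervals: the numbers are hypothetical and are NOT the estimates from the post, and the helper names `orders_of_magnitude` and `overlaps` are illustrative.

```python
import math


def orders_of_magnitude(low: float, high: float) -> float:
    """Width of a positive interval on a log10 scale."""
    return math.log10(high / low)


def overlaps(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# HYPOTHETICAL 90% intervals in arbitrary "units of value per $1M",
# chosen only to illustrate the argument; they are not the post's numbers.
grant_a = (0.1, 30.0)        # spans about 2.5 orders of magnitude
grant_b = (50.0, 20_000.0)   # spans about 2.6 orders of magnitude

print(f"A spans {orders_of_magnitude(*grant_a):.1f} orders of magnitude")
print(f"B spans {orders_of_magnitude(*grant_b):.1f} orders of magnitude")
print("Intervals overlap:", overlaps(grant_a, grant_b))
```

Here the intervals are each very uncertain, yet even the top of A's range falls below the bottom of B's, which is the sense in which such estimates can still support a confident comparison.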
I don’t disagree with the meat of this paragraph, though note that Jesse Rothman is, I think, no longer working on criminal justice reform (see the CEA teams page).
I imagine this is one of the reasons why CEA hired Jesse Rothman (and why Jesse Rothman chose to join CEA) to work on EA groups.
Yes, but sometimes you can’t answer the hard questions without being really unflattering. For instance, assume for a moment that my cost effectiveness estimates are roughly correct. Then there were moments where Chloe could have taken the step of saying “you know what, actually donating $50M to GiveDirectly or to something else would be more effective than continuing my giving through JustImpact”. This would have been pretty heroic, and the fact that she failed to be heroic is at least a bit unflattering.
I’m not sure how this translates to your “assuming inferiority” framing. People routinely fail to be heroic. Maybe it’s too harsh and unattainable a standard. On the other hand, maybe holding people and organizations to that standard will help them become stronger, if they want to. I think that’s what I implicitly believe.
Hi, can you give an example of a speculative personal attack in the post that you’re referring to?
How about:
I read this as a formal and softened way of saying “Chloe made avoidably bad grants because she wouldn’t do the math”. Different people will interpret the softening differently: it can come across either as “hey maybe this could have been a piece of what happened?” or “this is totally what I think happened, but if I say it bluntly that would be rude”.
Yeah, well, I haven’t thought about this case much, so maybe there’s some good counterargument, but I think of personal attacks as “this person’s hair looks ugly” or “this person isn’t fun at parties”, not “this person is not strong in an area of the job that I think is key”. Professional criticism seems quite different from personal attacks, and I hold different norms around how appropriate it is to bring up in public contexts.
Sure, being professionally criticized is a challenge and can easily be unpleasant, but it’s not irrelevant or off-topic, and it can be quite valuable and important.
Can you give specific examples of this, which might help to communicate these advantages and support your comment?
Thanks for the comment! I think it’s really useful to hear concerns and have public discussions about them.
As stated earlier, this post went through a few rounds of revisions. We’re looking to strike a balance between publishing useful evaluative takes and not being disrespectful or personally upsetting.
I think it’s very easy to go too far on either side of this. It’s very easy to not upset anyone, but also not say anything, for instance.
We’re still trying to find the best balance, as well as the best ways to achieve both candor and little offense.
I’m sorry that this came off as containing personal attacks.
> Chloe and Jesse are competent and committed people working in a cause area that does not meet the 1000x threshold currently set by GiveWell top charities
Maybe the disagreement is partially in the framing? I think this post was agreeing with you that it doesn’t seem to match the (incredibly high) bar of GiveWell top charities. I think many people came at this thinking that maybe criminal justice did meet that bar, so this post was mostly about flagging that in retrospect, it didn’t seem to.
For what it’s worth, I’d definitely agree that it is incredibly difficult to meet that bar. There are lots of incredibly competent people who couldn’t do that.
If you have recommendations for how this post and future evaluations could improve, I’d of course be very curious to hear them.
Hi there -
Thanks for your response and sorry for my lag. I can’t go into program details due to confidentiality obligations (though I’d be happy to contribute to a writeup if folks at Open Phil are interested), but I can say that I spent a lot of time in the available national and local data trying to make a quantitative EA case for the CJR program. I won’t get into that on this post, but I still think the program was worthwhile for less intuitive reasons.
On the personal comments:
I think this post’s characterization of Chloe and OP, particularly of their motivations, is unfair. The CJR field has gotten a lot of criticism in other EA spaces for being more social justice oriented and explicitly political. Some critiques of the field are warranted (similar to critiques of ineffective humanitarian & health interventions) but I think OP avoided these traps better than many donors. The team funded bipartisan efforts and focused on building the infrastructure needed to accelerate and sustain a new movement. Incarceration in the US exploded in the ’70s as the result of bipartisan action. The assumption that the right coalition of interests could force similarly rapid change in the opposite direction is fair, especially when analyzed against case studies of other social movements. It falls in line with a hits-based giving strategy.
Why I think the program was worthwhile:
The strategic investments made by the CJR team set the agenda for a field that barely existed in 2015 but, by 2021, had hundreds of millions of dollars in outside commitments from major funders and sympathetic officials elected across the US. Bridgespan (a data-focused social impact consulting group incubated at Bain) has used Open Phil grantees’ work to advise foundations, philanthropists, and nonprofits across the political spectrum on their own CJR giving. I’ve met some of the folks who worked on Bridgespan’s CJR analysis. I trust their epistemics and quantitative skills.
I don’t think we’ve seen the CJR movement through to the point where we could do a reliable postmortem on consequences. I’ve seen enough to say that OP’s team has mastered some very efficient methods for driving political will and building popular support.
OP’s CJR work could be particularly valuable as a replicable model for other movement building efforts. If nothing else, dissecting the program from that lens could be a really productive conversation.
Other notes
I disagreed with the CJR team on *a lot*. But they’re good people who were working within a framework that got vetted by OP years ago. And they’re great at what they do. I don’t think speculating on internal motivations is helpful. That said, I would wholeheartedly support a postmortem focused on program outcomes.
I came to the US scene from the UK and was very surprised by the divide (animosity) between SJ-aligned and EA-aligned work. I ended up disengaging from both for a while. I’m grateful to the wonderful Oxford folks for reminding me why I got involved in EA in the first place.
Sitting at a table full of people with very different backgrounds / skill sets / communication styles requires incredible amounts of humility on all sides. I actively seek out opportunities to learn from people who disagree with me, but I’ve missed out on some incredible learning opportunities because I failed at this.
Thanks so much for sharing that, it adds a lot of context to the conversation.
I really, really hope this post doesn’t act as anything like “the last word” on this topic. This post was Nuno doing his best with only a few weeks of research based on publicly accessible information (which is fairly sparse, and I can understand why). The main thing he focused on was simple cost-effectiveness estimation of the key parameters, compared to GiveWell top charities, which I agree is a very high bar.
I agree work on this scale really could use dramatically more comprehensive analysis, especially if other funders are likely to continue funding effectiveness-maximizing work here.
One small point: I read this analysis much more as suggesting that “CJR is really tough to make effective compared to top GiveWell charities, upon naive analyses” than anything like “the specific team involved did a poor job”.
Does “1000x” refer to something in particular, or are you just saying that the GiveWell top charities set a high bar?
I understood it as the combination of the 100x Multiplier discussed by Will MacAskill in Doing Good Better (referring to the idea that cash is 100x more valuable for somebody in extreme poverty than for someone in the global top 1%), and GiveWell’s current bar for funding set at 8x GiveDirectly. This would mean that Open Philanthropy targets donation opportunities that are at least 800x (or more like 1000x on average) more impactful than giving that money to a rich person.
Yeah, see this Open Philanthropy post. Or think about the difference in value between an additional dollar given to someone living on $500/year and an additional dollar given to someone living on $50k/year, assuming log utility.
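The log-utility argument and the resulting multiplier can be checked directly. This is a minimal sketch assuming utility \(u(x) = \ln x\), so the marginal utility of a dollar is \(1/x\). It uses the $500 and $50k incomes and the 8x-GiveDirectly funding bar from the comments above. The function name `marginal_utility_log` is illustrative.

```python
import math


def marginal_utility_log(income: float) -> float:
    """Marginal utility of one extra dollar under u(x) = ln(x): du/dx = 1/x."""
    return 1.0 / income


poor, rich = 500.0, 50_000.0  # annual incomes from the comment above

# Under log utility the ratio of marginal utilities is just rich / poor.
multiplier = marginal_utility_log(poor) / marginal_utility_log(rich)

# Exact comparison for a discrete $1 gift: ln(x + 1) - ln(x) = log1p(1 / x).
exact_ratio = math.log1p(1 / poor) / math.log1p(1 / rich)

GIVEWELL_BAR = 8  # GiveWell's funding bar, in multiples of GiveDirectly

print(f"Log-utility multiplier: {multiplier:.0f}x")
print(f"Exact ratio for a $1 gift: {exact_ratio:.2f}x")
print(f"Implied bar vs. giving to a rich person: {multiplier * GIVEWELL_BAR:.0f}x")
```

The derivative and the exact one-dollar calculation agree closely, since $1 is small relative to either income. This is why the combined bar comes out near 800x.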