DOUBLE COUNTING
Similar to not counting the costs of others' work, you can end up in situations where the same impact is counted multiple times across all the charities involved, giving an inflated picture of the total impact.
E.g. if Effective Altruism (EA) London runs an event and this leads to an individual signing the Giving What We Can (GWWC) pledge and donating more to charity, then EA London, GWWC, and the individual may each take 100% of the credit in their impact measurement.
Just a quick note that ‘double counting’ can be fine, since the counterfactual impact of different groups acting in concert doesn’t necessarily sum to 100%.
See more discussion here: https://forum.effectivealtruism.org/posts/fnBnEiwged7y5vQFf/triple-counting-impact-in-ea
Also note that you can undercount for similar reasons. For instance, if you have impact X, but another org would have done X otherwise, you might count your impact as zero. But that ignores that by doing X, you free up the other org to do something else high impact.
I think I’d prefer to frame this issue as something more like “how you should assign credit as a donor in order to have the best incentives for the community isn’t the same as how you’d calculate the counterfactual impact of different groups in a cost-effectiveness estimate”.
I’d also point to Shapley values.
When you notice that counterfactual values can sum up to more than 100%, I think that the right answer is to stop optimizing for counterfactual values.
It’s less clear cut, but I think that optimizing for Shapley value instead is a better answer—though not perfect.
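To make that concrete, here is a minimal sketch of Shapley-value credit assignment applied to the EA London / GWWC example above; the $10,000 pledge value and the assumption that the pledge only happens if both groups act are hypothetical:

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Shapley value of each player, given a function mapping coalitions to value."""
    n = len(players)
    result = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for size in range(len(others) + 1):
            for coalition in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                marginal = value(set(coalition) | {p}) - value(set(coalition))
                result[p] += weight * marginal
    return result

# Hypothetical example: a GWWC pledge worth $10,000 of donations that only
# happens if both EA London (runs the event) and GWWC (provides the pledge)
# are involved.
def pledge_value(coalition):
    return 10_000 if {"EA London", "GWWC"} <= coalition else 0

print(shapley_values(["EA London", "GWWC"], pledge_value))
# {'EA London': 5000.0, 'GWWC': 5000.0} -- the credit sums to 100%, whereas
# naive counterfactual impact would let each group claim the full $10,000.
```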
I think of Shapley values as just one way of assigning credit in a way to optimise incentives, but from what I’ve seen, it’s not obvious it’s the best one. (In general, I haven’t seen any principled way of assigning credit that always seems best.)
Good point, thanks! :)
Most cost-effectiveness analyses by EA orgs (and other charities) use a ratio of costs to effects, or effects to costs, as the main—or only—outcome metric, e.g. dollars per life saved, or lives affected per dollar. This is a good start, but it can be misleading as it is not usually the most decision-relevant factor.
If the purpose is to inform a decision of whether to carry out a project, it is generally better to present:
(a) The probability that the intervention is cost-effective at a range of thresholds (e.g. there is a 30% chance that it will avert a death for less than my willingness-to-pay of $2,000, 50% at $4,000, 70% at $10,000...). In health economics, this is shown using a cost-effectiveness acceptability curve (CEAC).
(b) The probability that the most cost-effective option has the highest net benefit (a term that is roughly equivalent to ‘net present value’), which can be shown with a cost-effectiveness acceptability frontier (CEAF). It’s a bit hard to get one’s head around, but sometimes the most cost-effective intervention has lower expected value than an alternative, because the distribution of benefits is skewed.
(c) A value of information analysis to assess how much value would be generated by a study to reduce uncertainty. As we found in our evaluation of Donational, sometimes interventions that have a poor cost-effectiveness ratio and a low probability of being cost-effective nevertheless warrant further research; and the same can be true of interventions that look very strong on those metrics.
See Briggs et al. (2012) for a general overview of uncertainty analysis in health economics, Barton et al. (2008) for CEACs, CEAFs and expected value of perfect information, and Wilson (2014) for a practical guide to VOI analyses (including the value of imperfect information gathered from studies).
Of course, these require probabilistic analyses that tend to be more time-consuming and perhaps less transparent than deterministic ones, so simpler models that give a basic cost-effectiveness ratio may sometimes be warranted. But it should always be borne in mind that they will often mislead users as to the best course of action.
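As a rough illustration of (a) and (b), here is a minimal Monte Carlo sketch; the two options and all of their costs, effects, and distributions are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Monte Carlo samples from a probabilistic CEA of two options
# (all numbers invented for illustration).
cost_a = rng.lognormal(mean=np.log(100_000), sigma=0.3, size=10_000)
deaths_averted_a = rng.lognormal(mean=np.log(25), sigma=0.5, size=10_000)
cost_b = rng.lognormal(mean=np.log(60_000), sigma=0.2, size=10_000)
deaths_averted_b = rng.lognormal(mean=np.log(12), sigma=0.3, size=10_000)

for wtp in (2_000, 4_000, 10_000):  # willingness to pay per death averted
    # Net (monetary) benefit of each option in each simulation.
    nb_a = wtp * deaths_averted_a - cost_a
    nb_b = wtp * deaths_averted_b - cost_b
    # CEAC-style summary: probability each option is the best available choice
    # (highest net benefit, and better than doing nothing) at this threshold.
    p_a = np.mean((nb_a > nb_b) & (nb_a > 0))
    p_b = np.mean((nb_b > nb_a) & (nb_b > 0))
    print(f"WTP ${wtp:>6,}: P(A best) = {p_a:.2f}, P(B best) = {p_b:.2f}")
```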
I haven’t read the articles you linked, but I’m wondering:
(a) If the outcome of a CEA is a probability distribution like the one below, we can see that there is a 5% probability that it costs less than $1,038 to avert a death, 30.1% probability that it costs less than $2,272, etc. Isn’t that the same?
(b)
Is that because of the effect that I call “Optimizer’s curse” in my article?
Please don’t feel like you have to answer if you don’t know the answers off the top of your head or it’s complex to explain. I don’t really need these answers for anything, I’m just curious. And if I did need the answers, I could find them in the links :)
Undervaluing Diversification: Optimizing for highest Benefit-Cost ratios will systematically undervalue diversification, especially when the analyses are performed individually, instead of as part of a portfolio-building process.
Example 1: Investing in 100 projects to distribute bed-nets correlates the variance of outcomes in ways that might be sub-optimal, even if bed-net distribution is the single best project type. The resulting fragility of the optimized portfolio creates various issues, such as increased difficulty embracing new intervention types, or the possibility that the single “best” intervention is later found to be sub-optimal (or harmful), destroying the reputation of those who optimized for it exclusively.
Example 2: The least expensive way to mitigate many problems is to concentrate risks or harms. For example, on cost-benefit grounds, the best site for a factory is an industrial area, not a residential area. This means that the risks of fires, cross-contamination, and knock-on effects of any accidents increase because they are concentrated in small areas. Spreading out the factories somewhat would reduce this risk, but the risk externality is a function of the collective decision to pick the lowest-cost areas, not of any one cost-benefit analysis.
Additional concern: Optimizing for low social costs as measured by economic methods will involve pushing costs on the poorest people, because they typically have the lowest value-to-avoid-harm.
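A minimal sketch of the variance point, assuming a hypothetical portfolio of 100 equally sized grants with identical expected impact, differing only in whether their outcomes are perfectly correlated (one intervention type) or spread across ten independent types:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical portfolio of 100 equally sized grants, each with the same
# expected impact (1.0) and standard deviation (0.5) per grant.
n_sim = 50_000
# All 100 grants are the same intervention type: outcomes perfectly correlated.
one_type = rng.normal(1.0, 0.5, size=(n_sim, 1)).repeat(100, axis=1)
# Grants spread across 10 independent intervention types (10 grants each).
ten_types = rng.normal(1.0, 0.5, size=(n_sim, 10)).repeat(10, axis=1)

print("portfolio std, one type:  ", round(one_type.mean(axis=1).std(), 3))   # ~0.5
print("portfolio std, ten types: ", round(ten_types.mean(axis=1).std(), 3))  # ~0.16
# Expected impact is identical either way; concentrating on the single "best"
# intervention type keeps the full variance, which is the fragility described above.
```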
Any deterministic analysis (using point estimates, rather than probability distributions, as inputs and outputs) is unlikely to be accurate because of interactions between parameters. This also applies to deterministic sensitivity analyses: by only changing a limited subset of the parameters at a time (usually just one) they tend to underestimate the uncertainty in the model. See Claxton (2008) for an explanation, especially section 3.
This is one reason I don’t take GiveWell’s estimates too seriously (though their choice of outcome measure is probably a more serious problem).
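A toy illustration of the point-estimate problem, with made-up cost and effect distributions for a hypothetical programme:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical inputs to a toy "cost per life saved" model (numbers made up).
cost = rng.normal(1_000_000, 200_000, size=100_000)           # programme cost ($)
lives_saved = rng.lognormal(np.log(200), 0.6, size=100_000)   # uncertain effect

# Deterministic analysis: plug in the point estimates (means).
deterministic = cost.mean() / lives_saved.mean()

# Probabilistic analysis: propagate the full distributions.
probabilistic = np.mean(cost / lives_saved)

print(f"point-estimate ratio: ${deterministic:,.0f} per life saved")
print(f"expected ratio:       ${probabilistic:,.0f} per life saved")
# The two differ because E[cost/effect] != E[cost]/E[effect] when the effect
# is uncertain and skewed (Jensen's inequality), so a deterministic model can
# mislead even if every point estimate is individually unbiased.
```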
I tend to think this is also true of any analysis which includes only one-way interactions or one-way causal mechanisms, and ignores feedback loops and complex-systems analysis. This is true even if each of the parameters is estimated using probability distributions.
I think this would make a great reference checklist for anyone developing CEEs to go through as they write up their CEE and indirect effects sections.
Thanks for writing this.
Could you give an example of this one, please?
I understand these are two different things, but am wondering exactly what problems you are seeing this equivocation causing. Is this a risk-aversion issue?
Yes, the distinction is important for people who want to make sure they had at least some impact (I’ve met some people like that). Also, after reading GiveWell’s CEA, you might be tempted to say “I donated $7000 to AMF so I saved two lives.” Interpreting their CEA this way would be misleading, even if it’s harmless. Maybe you saved 0, maybe you saved 4 (or maybe it’s more complicated because AMF, GiveWell, and whoever invented bednets should get some credit for saving those lives as well, etc.).
Another related problem is that probabilities in CEAs are usually subjective Bayesian probabilities. It’s important to recognize that such probabilities are not always on equal footing. E.g., I remember how people used to say things like “I think this charity has at least a 0.000000001% chance of saving the world. If I multiply by how many people I expect to ever live… Oh, so it turns out that it’s way more cost-effective than AMF!” I think that this sort of reasoning is important, but it often ignores the fact that the 0.000000001% probability is not nearly as robust as the probabilities GiveWell uses. Hence you are more likely to fall for the Optimizer’s Curse. In other words, choosing between AMF and the speculative charity here feels like choosing between eating at a restaurant with one 5-star Yelp review and eating at a restaurant with 200 Yelp reviews averaging 4.75 stars (wording stolen from Karnofsky (2016)). I’d choose the latter restaurant.
Also, an example where the original point came up in practice can be seen in this comment.
brainstorming / regurgitating some random additional ideas -
Goodhart’s law—a charity may from the outset design itself, or modify itself, around Effective Altruist metrics, thereby pandering to the biases of the metrics and scoring well on them despite being less good than a charity that scored well on the same metrics with no prior knowledge of them. (Think of the difference between someone who aced a standardized test through deliberate practice and “teaching to the test” vs. someone who aced it with no prior exposure to standardized tests; the latter person may possess more of the quality that the test is designed to measure.) This is related to the “influencing charities” issue, but focuses on the potential for defeating the metric itself, rather than the direct effects of the influence.
Counterfactuals of donations (other than the matching thing)—a highly cost-effective charity which can only pull from an effective altruist donor pool might have less impact than a slightly less cost-effective charity which successfully redirects donations from people who wouldn’t otherwise have donated to a cost-effective charity. (This is more of an issue for the person who controls talent, direction, and other factors, not the person who controls money.)
Model inconsistency—Two very different interventions will naturally be evaluated by two very different models, and some models may inherently be harsher or more lenient on the intervention than others. This will be true even if all the models involved are as good and certain as they can realistically be.
Regression to the mean—The expected value of standout candidates will generally regress to the mean of the pool from which they are drawn, since at least some of the factors which caused them to rise to the top will be temporary (including legitimate factors that have nothing to do with mistaken evaluations).
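A minimal sketch of the regression-to-the-mean point, assuming a hypothetical pool of 100 interventions drawn from the same underlying distribution and evaluated with noisy measurement error:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical pool of 100 interventions; each true cost-effectiveness value
# is drawn from the same distribution, then measured with independent noise.
true_value = rng.normal(10, 2, size=100)
measured = true_value + rng.normal(0, 3, size=100)  # noisy CEE

best = np.argmax(measured)
print(f"measured value of the top pick: {measured[best]:.1f}")
print(f"its true value:                 {true_value[best]:.1f}")
print(f"pool mean:                      {true_value.mean():.1f}")
# The standout candidate's measured value typically overstates its true value:
# part of what pushed it to the top is noise, so after selection we should
# expect some regression toward the pool mean.
```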
Good points. (Also, I believe I am personally required to upvote posts that reference Goodhart’s law.)
But I think both regression to the mean and Goodhart’s law are covered, if perhaps too briefly, under the heading “Estimates based on past data might not be indicative of the cost-effectiveness in the future.”
Another issue: if multiple charities are working on the same issue and cooperating, there might be times when a particular charity actively chooses to take less cost-effective actions in order to improve movement-wide cost-effectiveness. This happens frequently with the animal welfare corporate campaigns. For example:
Charity A has 100 good volunteers in City A, where Company A is headquartered. To run a campaign against them would cost Charity A $1000, and Company A uses 10M chickens a year. Or, they could run a campaign against Company B in a different city where they have fewer volunteers for $1500.
Charity B has 5 good volunteers in City A, but thinks they could secure a commitment from Company B in City B, where they have more volunteers, for $1000. Company B uses 1M chickens per year. Or, by spending more money, they could secure a commitment from Company A for $1500.
Charities A and B are coordinating, and agree that Companies A and B committing will put pressure on a major target (Company C), and want to figure out how to effectively campaign.
They consider three strategies (note—this isn’t how the cost-effectiveness would work for commitments since they impact chickens for longer than a year, etc, but for simplicity’s sake):
Strategy 1: They both campaign against both targets, at half the cost it would be for them to campaign on their own, and a charity evaluator views the victories as split evenly between them.
Charity A cost-effectiveness: (5M + 0.5M chickens) / ($500 + $750) = 4,400 chickens / dollar
Charity B is also 4,400 chickens / dollar.
$2500 total spent across all charities
Strategy 2: Charity A targets Company A, and Charity B targets Company B
Charity A: 10,000 chickens / dollar
Charity B: 1,000 chickens / dollar
$2000 total spent across all charities
Strategy 3: Charity A targets Company B, Charity B targets Company A
Charity A: 667 chickens / dollar
Charity B: 6,667 chickens / dollar
$3,000 total spent across all charities
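For reference, a quick sketch of the three calculations above (same simplifying assumption of one-year impact only):

```python
# Numbers from the example above.
chickens = {"Company A": 10_000_000, "Company B": 1_000_000}  # chickens used per year
cost = {("Charity A", "Company A"): 1_000, ("Charity A", "Company B"): 1_500,
        ("Charity B", "Company A"): 1_500, ("Charity B", "Company B"): 1_000}

# Strategy 1: both charities campaign on both companies at half cost, credit split evenly.
s1_cost = (cost[("Charity A", "Company A")] + cost[("Charity A", "Company B")]) / 2
s1_credit = (chickens["Company A"] + chickens["Company B"]) / 2
print("Strategy 1:", s1_credit / s1_cost, "chickens/dollar for each charity")   # 4400.0

# Strategy 2: Charity A -> Company A, Charity B -> Company B.
print("Strategy 2:", chickens["Company A"] / cost[("Charity A", "Company A")],  # 10000.0
      "and", chickens["Company B"] / cost[("Charity B", "Company B")])          # 1000.0

# Strategy 3: Charity A -> Company B, Charity B -> Company A.
print("Strategy 3:", round(chickens["Company B"] / cost[("Charity A", "Company B")]),  # 667
      "and", round(chickens["Company A"] / cost[("Charity B", "Company A")]))          # 6667
```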
These charities know that a charity evaluator is going to be looking at them, and trying to make a recommendation between the two based on cost-effectiveness. Clearly, the charities should choose Strategy 2, because the least money will be spent overall (and both charities will spend less for the same outcome). But if the charity evaluator is fairly influential, Charity B might push hard for less ideal Strategies 1 or 3, because those make its cost-effectiveness look much better. Strategy 2 is clearly the right choice for Charity B to make, but if they do, an evaluation of their cost-effectiveness will look much worse.
I guess a simple way of putting this is—if multiple charities are working on the same issue, and have different strengths relevant at different times, it seems likely that often they will make decisions that might look bad for their own cost-effectiveness ratings, but were the best thing to do / right decision to make.
Also, on the matching funds note—I personally think it would be better to assume matching funds are genuine matches rather than not. I’ve fundraised for maybe 5 nonprofits, and out of probably 20+ matching campaigns in that period, maybe 2 were not genuine matches. Additionally, nonprofits often ask major donors to match funds as a way to encourage the major donor to give more (e.g. “you could give $20k like you planned, or you could help us run our $60k year-end fundraiser by matching $30k”). So I’d guess that for most matching campaigns, the fact that it is a matching campaign means there will be some multiplier on your donation, even if it is small. Maybe it is still misleading then? But overall it is a practice that makes sense for nonprofits to use.
Re: Bias towards measurable results
A closely related issue is justification bias, where the expectation that the cost-benefit analysis be justified leads to the exclusion of disputed values. One example of this is the US Army Corps of Engineers, which produces cost-benefit analyses that are then given to Congress for funding. Because some values (ecological diversity, human enjoyment, etc.) are both hard to quantify and the subject of debate between political groups, including them leaves the analysis open to far more debate. The pressure to exclude them leads to their implicit minimization.
Do you have any thoughts on how we should change our current approach, if at all, to using and interpreting CEEs in light of these issues?
Not really. I just think that we should be careful when using CEEs. Hopefully, this post can help with that. I think it contains little new info for people who have been working with CEEs for a while. I imagine that these are some of the reasons why GiveWell and ACE give CEEs only limited weight in recommending charities.
Maybe I’d like some EAs to take CEEs less literally, understand that they might be misleading in some way, and perhaps analyze the details before citing them. I think that CEEs should start conversations, not end them. I also feel that early on some non-robust CEEs were overemphasized when doing EA outreach, but I’m unsure if that’s still a problem nowadays.
I first published this post on August 7th. However, after about 10 hours, I moved the post to drafts because I decided to make some changes and additions. Now I have made those changes and re-published it. I apologize if the temporary disappearance of the article led to any confusion or inconvenience.
This was my fault, sorry. I was travelling and ill, so I was slow giving feedback on the draft. I belatedly sent Saulius some comments without realising it had just been published, so he took it down in order to incorporate some of my suggestions.
No need to apologize, Derek, I should’ve given you a deadline or at least told you that I was about to publish it. Besides, I don’t think anyone shared a link to the article in those 10 hours, so no harm done. Thank you very much for all your suggestions and comments.
I wish I’d spent more time reviewing this before publication as I failed to mention some key points. I’ll add some of them as comments.
Thanks for the handy list.
a few quick additional thoughts
1) Perhaps there should be an expanded QALY (QALYX?)… just as a life year of significant suffering may have less value, a life year of increased pleasure or satisfaction would have increased value. A can of worms, of course, in comparing “units of happiness”.
2) Perhaps this could also be equity-adjusted. Just as a dollar given to a rich person would seem to generate less “good” than a dollar given to a poor person, so would units of happiness (assuming one had worked out a reasonable metric of happiness).
3) This is a bit dark, but in considering QALYs, there are also societal costs to lengthening the life of someone who has some debilitating condition. It seems that saving and lengthening someone’s life with disability should consider these costs. There are of course competing values (optimization of potential good human years vs. the duty to maximize care for the currently disabled), but such an accounting would make this trade-off transparent. The shifting implied calculus here is of course why infanticide of the disabled was common in the past but is uncommon in modern societies.
Good points :) You might be interested in this sequence (see the links at the bottom of the summary)
I found this post both very informative and easy to read and understand. I will use these considerations in the future to read cost-effectiveness estimates more critically. Thank you!
I found this post an incredibly helpful introduction to critiquing CE analyses. Thank you.
This post was awarded an EA Forum Prize; see the prize announcement for more details.
My notes on what I liked about the post, from the announcement:
Good article. Various things you mention are examples of bad metrics. Another common kind is metrics involving thresholds, e.g. the number of people below a poverty line: they treat all people below (or above) the line as equal to each other, when this is far from the case. (Living on $1/day is far harder than living on $1.90/day.) This often results in organisations wasting vast amounts of money/effort moving people from just below the line to just above it, with little actual improvement, and perhaps ignoring others who could have been helped much more even if they couldn’t be moved across the line.
Great to go through this post again. Thanks, Saulius!
Here is a model of the cost-effectiveness of restricted donations.
“if you donate some bread to hungry civilians in this warzone, then this military group will divert all the excess resources above subsistence to further its political / military goals”. Guess now you have no way to increase their wellbeing! Just buy more troops for this military organization!
That’s some top-tier untrustworthy move. If some charity did that with my donation, I would mentally blacklist it for eternity.
How about: Not being consistent in whether indirect effects like opportunity costs are counted in impacts or total costs.
For example, say you donate to a charity, and they hire someone who would have otherwise earned to give. Should we treat those lost donations as additional costs (possibly weighted by their cost-effectiveness relative to your donations) or as a negative impact?
Doing cost-benefit analysis instead of cost-effectiveness analysis would put everything in the same terms and make sure this doesn’t happen, but then we’d have to agree on how to convert to or from $.
Have we been generally only treating direct donations towards costs and everything else towards impacts?
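A toy illustration of the inconsistency, with invented numbers for the donation, the hire’s forgone donations, and the cost per life elsewhere:

```python
# Toy numbers (all invented) for the example above: a donation funds a charity
# that hires someone who would otherwise have earned to give.
donation = 100_000                   # $ you donate
direct_impact = 50.0                 # e.g. lives saved by the programme
forgone_donations = 20_000           # $ the hire would otherwise have donated
forgone_impact = forgone_donations / 4_000  # worth 5 lives at $4,000/life elsewhere

# Convention 1: treat the forgone donations as an additional cost.
ce_as_cost = direct_impact / (donation + forgone_donations)
# Convention 2: treat the forgone impact as a negative effect.
ce_as_negative_impact = (direct_impact - forgone_impact) / donation

print(f"counted as a cost:          {ce_as_cost * 100_000:.1f} lives per $100k")
print(f"counted as negative impact: {ce_as_negative_impact * 100_000:.1f} lives per $100k")
# 41.7 vs 45.0: the two conventions give different ratios, so mixing them across
# different charities' CEEs makes the numbers non-comparable.
```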
Personally, I don’t remember any cost-effectiveness estimate that accounted for things like money lost due to hiring earning-to-givers in any way.
Charity Entrepreneurship has included co-founders’ opportunity costs in charity cost-effectiveness analyses, counting them towards the charities’ impacts, e.g.:
https://www.getguesstimate.com/models/13821
https://www.getguesstimate.com/models/13828
https://www.getguesstimate.com/models/13985
I think the moral questions that arise when assessing effectiveness are particularly concerning. DALYs and QALYs are likely unreliable for the reasons you mention, though how unreliable exactly is hard to say. It’s possible they’re close to the best approximations we’ll ever have and there is no viable alternative to using them. But the fundamental and inescapable limitations of cost effectiveness analysis remain.
What can we say with confidence about the distribution of suffering in the world? Misery is a subjective experience for which macro measures of poverty are a weak proxy at best. I’m left with the sense that the case for directing EA resources only to the poorest geographies is hardly airtight. From a fairness perspective, the comparison shopping approach to choosing who to help is hard to swallow. Should a person suffering profoundly not receive assistance simply because they were, in a perverse reversal, unlucky enough to be born in the US or UK? This seems like less a widening moral circle than a sort of hollowed out bagel shaped one. I don’t think we’re wise to so doggedly resist intuitions here.
Even in a fully utilitarian calculus, it’s unclear how high the total cost of meaningfully benefiting the needy in wealthier parts of the world would actually be if EAs gave it a shot. And the size of the potential benefit is also conceivably very high. Overall, it strikes me that EAs bring an uncharacteristic lack of curiosity and ambition to the question of how we might, through philanthropy, strengthen the small, medium, and large groups we belong to in order to act even more impactfully on a global scale. Shouldn’t we explore the area between hyperlocal EA meta and anti-local EA causes a little more? And by “we” I maybe mean “I” haha. I maybe just haven’t looked deeply enough at the arguments yet.
I’m not as well-read on this topic as I’d like to be so would welcome any paper or book recommendations. Thanks for the detailed & high quality post.
Faced with such uncertainty, shouldn’t we rather hedge our bets and split our support? For example, if project A has cost-effectiveness in the range of 70-80%, and project B has cost-effectiveness in the range 60-90%, wouldn’t it be better (overall) to split the support evenly than to only support project A?
One other example is rural vs urban: it might be more cost-effective to solve a problem (say, school attendance) in cities, and costlier in rural settings. Just focusing on the urban setting is wrong in this context. It seems discriminatory.