List of ways in which cost-effectiveness estimates can be misleading
In my cost-effectiveness estimate of corporate campaigns, I wrote a list of all the ways in which my estimate could be misleading. I thought it could be useful to have a more broadly applicable version of that list for cost-effectiveness estimates in general. It could be used as a checklist to make sure that no important considerations are missed when cost-effectiveness estimates are made or interpreted.
The list below is probably very incomplete. If you know of more items that should be added, please comment. I tried to optimize the list for skimming.
How cost estimates can be misleading
Costs of work of others. Suppose a charity purchases a vaccine. This causes the government to spend money distributing that vaccine. It’s unclear whether the costs of the government should be taken into account. Similarly, it can be unclear whether to take into account the costs that patients incur when traveling to a hospital to get vaccinated. This is closely related to concepts of leverage and perspective. More on it can be read in Byford and Raftery (1998), Karnofsky (2011), Snowden (2018), and Sethu (2018).
It can be unclear whether to take into account the fixed costs from the past that will not have to be spent again. E.g., costs associated with setting up a charity that are already spent and are not directly relevant when considering whether to fund that charity going forward. However, such costs can be relevant when considering whether to found a similar charity in another country. Some guidelines suggest annualizing fixed costs. When fixed costs are taken into account, it’s often unclear how far to go. E.g., when estimating the cost of distributing a vaccine, even the costs of roads that were built partly to make the distribution easier could be taken into account.
Not taking future costs into account. E.g., an estimate of corporate campaigns may take into account the costs of winning corporate commitments, but not future costs of ensuring that corporations will comply with these commitments. Future costs and effects may have to be adjusted for the possibility that they don’t occur.
Not taking past costs into account. In the first year, a homelessness charity builds many houses. In the second year, it finds homeless people to live in those houses. In the first year, the impact of the charity could be calculated as zero. In the second year, it could be calculated to be unreasonably high. But the charity wouldn’t be able to sustain the cost-effectiveness of the second year.
Not adjusting past or future costs for inflation.
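A minimal sketch of such an adjustment, assuming you have a price index (e.g., a consumer price index) for both the year the cost was incurred and the year you want to express it in. The function name and the index values are hypothetical placeholders, not real data.

```python
def adjust_for_inflation(nominal_cost, index_cost_year, index_target_year):
    """Express a cost from one year in the prices of a target year."""
    return nominal_cost * index_target_year / index_cost_year

# Hypothetical index values, for illustration only.
cost_in_target_year_prices = adjust_for_inflation(1000.0, 200.0, 250.0)
print(cost_in_target_year_prices)  # 1250.0
```

Costs from different years should be converted to the same year's prices before they are summed or compared.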
Not taking overhead costs into account. These are costs associated with activities that support the work of a charity. It can include operational, office rental, utilities, travel, insurance, accounting, administrative, training, hiring, planning, managerial, and fundraising costs.
Not taking costs that don’t pay off into account. Nothing But Nets is a charity that distributes bednets that prevent mosquito-bites and consequently malaria. One of their old blog posts, Sauber (2008), used to claim that “If you give $100 of your check to Nothing But Nets, you’ve saved 10 lives.” While it may be true that it costs around $10 or less to provide a bednet, and some bednets save lives, costs of bednets that did not save lives should be taken into account as well. According to GiveWell’s estimates, it currently costs roughly $3,500 for a similar charity (Against Malaria Foundation) to save one life by distributing bednets.
Wiblin (2017) describes a survey in which respondents were asked “How much do you think it would cost a typical charity working in this area on average to prevent one child in a poor country from dying unnecessarily, by improving access to medical care?” The median answer was $40. Therefore, it seems that many people are misled by claims like the one by Nothing But Nets.
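The arithmetic behind the bednet example can be sketched as follows. This is a toy illustration only, not GiveWell's model, which is far more detailed. The function name is hypothetical, and the 1-in-350 rate is a made-up number chosen so that the result lands near the $3,500 figure quoted above.

```python
def cost_per_outcome(cost_per_unit, outcome_probability_per_unit):
    """Expected cost per outcome when only some units lead to the outcome."""
    if not 0 < outcome_probability_per_unit <= 1:
        raise ValueError("probability must be in (0, 1]")
    return cost_per_unit / outcome_probability_per_unit

# Naive reading: every $10 bednet saves a life.
naive = cost_per_outcome(10.0, 1.0)

# Hypothetical assumption: only 1 in 350 bednets averts a death.
realistic = cost_per_outcome(10.0, 1 / 350)

print(naive, round(realistic))  # 10.0 3500
```

The cost of the bednets that did not save a life is spread over the ones that did, which is why the cost per life saved is so much higher than the cost per bednet.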
Failing to take into account volunteer time as costs. Imagine many volunteers collaborating to do a lot of good, and having a small budget for snacks. Their cost-effectiveness estimate could be very high, but it would be a mistake to expect their impact to double if we double their funding for snacks. Such problems would not happen if we valued one hour of volunteer time at say $10 when estimating costs. The more a charity depends on volunteers, the more this consideration is relevant.
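A rough sketch of how valuing volunteer time changes the picture. All numbers are hypothetical except the $10 per hour suggested above, and the function name is an assumption made for this illustration.

```python
def cost_effectiveness(impact, cash_cost, volunteer_hours=0.0, hourly_value=0.0):
    """Impact per dollar, optionally counting volunteer time as a cost."""
    total_cost = cash_cost + volunteer_hours * hourly_value
    return impact / total_cost

# Hypothetical group: 500 units of impact, $200 spent on snacks,
# and 2,000 volunteer hours.
ignoring_volunteers = cost_effectiveness(500, 200)
counting_volunteers = cost_effectiveness(500, 200, volunteer_hours=2000, hourly_value=10)

print(ignoring_volunteers)            # 2.5
print(round(counting_volunteers, 4))  # 0.0248
```

With these made-up numbers, the estimate drops by a factor of about one hundred once volunteer time is counted, which is closer to what an extra dollar could plausibly buy.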
Failing to take into account the counterfactual impact of altruistic employees (opportunity cost). There are hidden costs of employing people who would be doing good even if they weren’t employed by a charity. For example:
A person who used to do earning-to-give is employed by a charity, takes a low salary, and stops donating money. The impact of their lost donations should ideally be somehow added to the cost estimate, but it’s very difficult to do it in practice.
A charity hires the most talented EAs and makes them work on things that are not top priority. Despite amazing results, the charity could be doing harm because the talented EAs could have made more impact by working elsewhere.
Ease of fundraising / counterfactual impact of donations. Let’s say you are deciding which charity you should start. Charity A could do a very cost-effective intervention but only people who already donate to cost-effective charities would be interested in supporting it. Charity B could do a slightly less cost-effective intervention but would have a mainstream appeal and could fundraise from people who don’t donate to any charities or only donate to ineffective charities. Other things being equal, you would do more good by starting Charity B, even though it would be less cost-effective. Firstly, Charity B wouldn’t take funding away from other effective charities. Secondly, Charity B could grow to be much larger and hence do more good (provided that its intervention is scalable).
Cost of evaluation. Imagine wanting to start a small project and asking for funding from many different EA donors and funds. The main cost of the project might be the time it takes for these EA donors and funds to evaluate your proposal and decide whether to fund it.
How effectiveness estimates can be misleading
Indirect effects. For example, sending clothes to Africa can hurt the local textile industry and cause people to lose their jobs. Saving human lives can increase the human population, which can increase pollution and animal product consumption. Some ways to handle indirect effects are discussed in Hurford (2016).
Effects on the long-term future are especially difficult to predict, but in many cases they could potentially be more important than direct effects.
The value of information/learning from pursuing an intervention is usually not taken into account because it’s difficult to quantify. Methods of analyzing it are reviewed in Wilson (2015).
Limited scope. Normally only the outcomes for individuals directly affected are measured, whereas the wellbeing of others (family, friends, carers, broader society, and different species) also matters.
Over-optimizing for a success metric rather than real impact. Suppose a homelessness charity has a success metric of reducing the number of homeless people in an area. It could simply transport local homeless people into another city where they are still left homeless. Despite the fact that the charity would have no positive impact, it would appear to be very cost-effective according to its success metric.
Counterfactuals. Some of the impacts would have happened anyway. E.g., suppose a charity distributes medicine that people would have bought for themselves if they weren’t given it for free. While the effect of the medicine might be large, the real counterfactual impact of the charity is saving the people the money that they would have used to buy that medicine.
Another possibility is that another charity would have distributed the same medicine to the same people, and now that charity uses its resources for something less effective.
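The counterfactual adjustment in the medicine example can be sketched like this. The numbers and the function name are hypothetical.

```python
def counterfactual_impact(outcome_with, outcome_without):
    """Impact attributable to the charity: what changed because it acted."""
    return outcome_with - outcome_without

# Hypothetical numbers: 1,000 people take the medicine after a free
# distribution, but 900 of them would have bought it themselves anyway.
naive_impact = 1000  # everyone the charity treated
real_impact = counterfactual_impact(1000, 900)

print(naive_impact, real_impact)  # 1000 100
```

For the other 900 people in this sketch, the charity's effect is the money they saved rather than the health benefit of the medicine.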
Conflating expected value estimates with effectiveness estimates. There is a difference between a 50% chance to save 10 children, and a 100% chance to save 5 children. Estimates sometimes don’t make a clear distinction.
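To make the distinction concrete, here is a small sketch that computes the expected value and the standard deviation of the two options above. The function name and the representation of an option as (probability, outcome) pairs are choices made for this illustration.

```python
from math import sqrt

def mean_and_sd(lottery):
    """lottery: list of (probability, outcome) pairs whose probabilities sum to 1."""
    mean = sum(p * x for p, x in lottery)
    variance = sum(p * (x - mean) ** 2 for p, x in lottery)
    return mean, sqrt(variance)

risky = [(0.5, 10), (0.5, 0)]  # 50% chance to save 10 children
certain = [(1.0, 5)]           # 100% chance to save 5 children

print(mean_and_sd(risky))    # (5.0, 5.0)
print(mean_and_sd(certain))  # (5.0, 0.0)
```

Both options have the same expected value of 5 children saved, but only one of them is certain to save anyone. An estimate that reports just the expected value hides that difference.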
Diminishing/accelerating returns, room for more funding. If you estimate the impact of the charity and divide it by its budget, you get the cost-effectiveness of an average dollar spent by the charity. It shouldn’t be confused with the marginal cost-effectiveness of an additional donated dollar. They can differ for a variety of reasons. For example:
A limited number of good opportunities. A charity that distributes medicine might be cost-effective on average because it does most of the distributions in areas with a high prevalence of the target disease. However, it doesn’t follow that an additional donation to the charity will be cost-effective because it might fund a distribution in an area with a lower prevalence rate.
A charity is talent-constrained (rather than funding-constrained). That is, a charity may be unable to find people to hire for positions that would allow it to use more money effectively.
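The gap between average and marginal cost-effectiveness can be sketched as follows, assuming (hypothetically) that a charity's spending can be split into tranches and that the best opportunities were funded first. All numbers and function names are made up for illustration.

```python
def average_cost_effectiveness(tranches):
    """tranches: list of (dollars_spent, impact) pairs for money already spent."""
    total_cost = sum(cost for cost, _ in tranches)
    total_impact = sum(impact for _, impact in tranches)
    return total_impact / total_cost

def marginal_cost_effectiveness(next_tranche):
    """Impact per dollar of the next tranche that extra donations would fund."""
    cost, impact = next_tranche
    return impact / cost

# Hypothetical charity: past spending went to high-prevalence areas first.
past = [(100_000, 500), (100_000, 300)]
next_tranche = (100_000, 100)  # lower-prevalence area

print(average_cost_effectiveness(past))           # 0.004
print(marginal_cost_effectiveness(next_tranche))  # 0.001
```

In this sketch, an additional donation achieves a quarter of what the average past dollar achieved, even though nothing about the charity's past record is wrong.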
Fairness and health equity. Cost-effectiveness estimates typically treat all health gains as equal. However, many think that priority should be given to those with severe health conditions and in disadvantaged communities, even if it leads to a smaller overall decline in suffering or illness (Nord, 2005; Cookson et al., 2017; Kamm, 2015).
Morally questionable means. E.g., a corporate campaign or a lobbying effort could be more effective if it employs tactics that involve lying, blackmail, or bribing. However, many (if not most) people find such actions unacceptable, even if they lead to positive consequences. Since cost-effectiveness estimates only inform us about the consequences, they may provide incomplete information for such people.
Subjective moral assumptions in metrics. To compare charities that pursue different interventions, some charity evaluators assign subjective moral weights to various outcomes. E.g., GiveWell assumes that the “value of averting the death of an individual under 5” is 47 times larger than the value of “doubling consumption for one person for one year.” Readers who would use different moral weights may be misled by the results of such estimates if they don’t examine such subjective assumptions and only look at the results. GiveWell explains their approaches to moral weights in GiveWell (2017).
Health interventions are often measured in disability-adjusted life-years (DALYs) or quality-adjusted life-years (QALYs). These can make analyses misleading, especially when people treat them as if they measure all that matters. For example:
DALYs and QALYs give no weight to happiness beyond relief from illness or disability. E.g., an intervention that increases the happiness of mentally healthy people would register zero benefit.
The ‘badness’ of each health state is normally measured by asking members of the general public how bad they imagine them to be, not using the experience of people with the relevant conditions. Consequently, misperceptions of the general public can skew the results. E.g., some scholars claim that people tend to overestimate the suffering caused by most physical health conditions, while underestimating some mental disorders (Dolan & Kahneman, 2008; Pyne et al., 2009; Karimi et al., 2017).
DALYs and QALYs trade off length and quality of life. This allows comparisons of different kinds of interventions, but can obscure important differences (Farquhar and Cotton-Barratt, 2015).
An estimate is for a specific situation and is not generalizable to other contexts. E.g., just because an intervention was cost-effective in one country, doesn’t mean it will be cost-effective in another. See more on this in Vivalt (2019) and an 80,000 Hours Podcast with the author. According to her findings, this is a bigger issue than one might expect.
Estimates based on past data might not be indicative of the cost-effectiveness in the future:
This can be particularly misleading if you only estimate the cost-effectiveness of one particular period which is atypical. For example, you estimate the cost-effectiveness of giving medicine to everyone during an epidemic. Once the epidemic passes, the cost-effectiveness will be different. This may have happened to a degree with effectiveness estimates of deworming.
If the past cost-effectiveness was unexpectedly extreme (e.g., very high), we should expect some regression to the mean.
Biased creators. It can be useful to think about the ways in which the creator(s) of an estimate might have been biased and how it could have impacted the results. For example:
A charity might (intentionally or not) overestimate its own impact out of the desire to get more funding. This is even more likely when you consider that employees of a charity might be working for it because they are unusually excited about the charity’s interventions. Even if the estimate is done by a third party, it is usually based on the information that a charity provides, and charities are more likely to present information that shows them in a positive light.
A researcher creating the estimate may want to find that the intervention is effective because that would lead to their work being celebrated more.
Publication bias. Estimations that find that some intervention is cost-effective are more likely to be published and cited. This can lead to situations where interventions seem to have more evidence in favor of them than they should because only the estimations that found it to be impactful were published.
Bias towards measurable results. If a charity’s impact is difficult to measure, it may have a misleadingly low estimated cost-effectiveness, or there may be no estimate of its effects at all. Hence, if we choose a charity that has the highest estimated cost-effectiveness, our selection is biased towards charities whose effects are easier to measure.
Optimizer’s Curse. Suppose you weigh ten identical items with very inaccurate scales. The item that is the heaviest according to your results is simply the item whose weight was the most overestimated by the scales. Now suppose the items are similar but not identical. The item that is the heaviest according to the scales is also the item whose weight is most likely an overestimate.
Similarly, suppose that you make very approximate cost-effectiveness estimates of ten different charities. The charity that seems the most cost-effective according to your estimates could seem that way only because you overestimated its cost-effectiveness, not because it is actually more cost-effective than others.
Consequently, even if we are unbiased in our estimates, we might be too optimistic about charities or activities that seem the most cost-effective. I think this is part of the reason why some people find that “regardless of the cause within which one investigates giving opportunities, there’s a strong tendency for giving opportunities to appear progressively less promising as one learns more.” The more uncertain cost-effectiveness estimates are, the stronger the effect of optimizer’s curse is. Hence we should prefer interventions whose cost-effectiveness estimates are more robust. More on this can be read in Karnofsky (2016).
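The weighing thought experiment above can be simulated in a few lines. This is a minimal sketch under stated assumptions: the items are identical, the scale's errors are normally distributed with mean zero, and the true weight, noise levels, trial count, and function name are all hypothetical.

```python
import random

def average_overestimate_of_winner(n_items=10, noise_sd=1.0, trials=2000, seed=0):
    """Weigh identical items with a noisy but unbiased scale and return how much
    the apparently heaviest item's measured weight exceeds its true weight,
    averaged over many trials."""
    rng = random.Random(seed)
    true_weight = 10.0  # hypothetical; every item weighs the same
    total_error = 0.0
    for _ in range(trials):
        measurements = [true_weight + rng.gauss(0.0, noise_sd) for _ in range(n_items)]
        total_error += max(measurements) - true_weight
    return total_error / trials

print(average_overestimate_of_winner(noise_sd=1.0))
print(average_overestimate_of_winner(noise_sd=0.1))
```

Each individual measurement is unbiased, yet the item that comes out on top is overestimated on average, and the overestimate shrinks as the scale becomes more accurate. This mirrors the point that more robust estimates are less affected by the optimizer's curse.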
Some results can be very sensitive to one or more uncertain parameters and, consequently, seem more robust than they are. To uncover this, a sensitivity analysis or uncertainty analysis can be performed.
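One simple form of sensitivity analysis varies one parameter at a time over a plausible range while holding the others at their base values. The sketch below assumes a toy model; the model, parameter names, and ranges are all hypothetical and chosen only to illustrate the technique.

```python
def toy_model(params):
    """Hypothetical toy model: dollars spent per outcome achieved."""
    outcomes = params["people_reached"] * params["effect_per_person"]
    return params["total_cost"] / outcomes

def one_at_a_time_sensitivity(model, base, ranges):
    """Vary each parameter over its (low, high) range, holding the rest at base."""
    results = {}
    for name, (low, high) in ranges.items():
        outputs = [model({**base, name: value}) for value in (low, high)]
        results[name] = (min(outputs), max(outputs))
    return results

base = {"total_cost": 10_000, "people_reached": 1_000, "effect_per_person": 0.05}
ranges = {
    "people_reached": (800, 1_200),
    "effect_per_person": (0.01, 0.10),
}

for name, (low, high) in one_at_a_time_sensitivity(toy_model, base, ranges).items():
    print(f"{name}: {low:.0f} to {high:.0f} dollars per outcome")
```

In this made-up example, uncertainty about the effect per person moves the result by a factor of ten, while uncertainty about the number of people reached moves it much less. That tells you which parameter deserves more research.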
To correctly interpret cost-effectiveness estimates, it’s important to know whether time discounting was applied. Time discounting makes current costs and benefits worth more than those occurring in the future because:
There is a desire to enjoy the benefits now rather than in the future.
There are opportunity costs of spending money now. E.g., if the money was invested rather than spent, it would likely be worth more in a couple of years.
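Time discounting is usually applied with the standard present-value formula, sketched below. The 3% annual rate is a hypothetical choice for illustration; the rate a given estimate uses (if any) has to be checked in that estimate.

```python
def present_value(amount, annual_rate, years):
    """Value today of a cost or benefit that occurs `years` from now."""
    return amount / (1 + annual_rate) ** years

# Hypothetical 3% annual discount rate.
print(round(present_value(1000, 0.03, 0), 2))   # 1000.0
print(round(present_value(1000, 0.03, 10), 2))  # 744.09
```

Two estimates of the same intervention can differ noticeably just because one discounts future benefits and the other does not.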
Model uncertainty. That is, uncertainty due to necessary simplification of real-world processes, misspecification of the model structure, model misuse, etc. There are probably some more formal methods to reduce model uncertainty, but personally, I find it useful to create several different models and compare their results. If they all arrive at a similar result in different ways, you can be more confident about the result. The more different the models are, the better.
Wrong factual assumptions. E.g., when estimating the cost-effectiveness of distributing bednets, it would be a mistake to assume that all the people who receive them would use them correctly.
Mistakes in calculations. This includes mistakes in studies that an estimate depends on. As explained in Lesson #1 in Hurford and Davis (2018), such mistakes happen more often than one might think.
Complications of estimating the impact of donated money
Fungibility. If a charity does multiple programs, donating to it could fail to increase spending on the program you want to support, even if you restrict your donation. Suppose a charity was planning to spend $1 million of its unrestricted funds on a program. If you donate $1,000 and restrict it to that program, the charity could still spend exactly $1 million on the program and use an additional $1,000 of unrestricted funds on other programs.
Replaceability of donations. It can sometimes be useful to ask yourself: “Would someone else have filled charity X’s funding gap if I hadn’t?” Note that if someone else would have donated to charity X, they may not have donated money to charity Y (their second option). That said, I think it’s easy to think about this too much. Imagine if all donors in EA were only looking for opportunities that no one else would fund. When an obviously promising EA charity asks for money, all donors might wait until the last minute, thinking that some other donor might fund it instead of them. That would cost more time and effort for both the charity and the potential donors. To avoid this, donors need to coordinate.
Taking donation matching literally. A lot of the time, when someone claims they would match donations to some charity, they would have donated the money that would be used for matching anyway, possibly even to the same charity. This is not always the case though (e.g., employers matching donations to any charity).
Influencing other donors. For example:
Receiving a grant from a respected foundation can increase the legitimacy and the profile of a project and make other funders more willing to donate to it.
Sharing news about individual donations can influence friends to donate as well (Hurford (2014)). Note that the strength of this effect partly depends on the charity you donate to.
Donors can make moral trades to achieve outcomes that are better for everyone involved. E.g., suppose one donor wants to donate $1,000 to a gun control charity, and another donor wants to donate $1,000 to a gun rights charity. These donations may cancel each other out in terms of expected impact. Donors could agree to donate to a charity they both find valuable (e.g., anti-poverty), on the condition that the other one does the same.
Influencing the charity. For example:
Charities may try to do more activities that appeal to their existing and potential funders to secure additional funding.
Letting a charity evaluator (e.g., GiveWell, Animal Charity Evaluators) or a fund (e.g., EA Funds) direct your donation signals to charities the importance of these evaluators. It can incentivize charities to cooperate with evaluators during evaluations and try to be better according to the metrics that evaluators measure.
Funders can consciously influence the direction of a charity they fund. See more on this in Karnofsky (2015).
There are many important considerations about whether to donate now rather than later. See Wise (2013) for a summary. For example, it’s important to remember that if the money was invested, it would likely have more value in the future.
Tax deductibility. If you give while you are earning money, in some countries (e.g., U.S., UK, Canada) your donations to charities that are registered in your country are tax deductible. This means that the government effectively gives more to the same charity. E.g., see the deductibility of ACE-recommended charities. If you are donating money to a charity registered in another country, there might still be ways to make it tax deductible. E.g., by donating through organizations like RC Forward (which is made for Canadian donors), or using Donation Swap.
I’m a research analyst at Rethink Priorities. The views expressed here are my own and do not necessarily reflect the views of Rethink Priorities.
Author: Saulius Šimčikas. Thanks to Ash Hadjon-Whitmey, Derek Foster, and Peter Hurford for reviewing drafts of this post. Also, thanks to Derek Foster for contributing to some parts of the text.
Byford, S., & Raftery, J. (1998). Perspectives in economic evaluation. BMJ, 316(7143), 1529-1530.
Cookson, R., Mirelman, A. J., Griffin, S., Asaria, M., Dawkins, B., Norheim, O. F., … & Culyer, A. J. (2017). Using cost-effectiveness analysis to address health equity concerns. Value in Health, 20(2), 206-212.
Dolan, P., & Kahneman, D. (2008). Interpretations of utility and their implications for the valuation of health. The economic journal, 118(525), 215-234.
Farquhar, S., & Cotton-Barratt, O. (2015). Breaking DALYs down into YLDs and YLLs for intervention comparison.
GiveWell. (2017). Approaches to Moral Weights: How GiveWell Compares to Other Actors.
Hurford, P. (2014). To inspire people to give, be public about your giving.
Hurford, P. (2016). Five Ways to Handle Flow-Through Effects.
Hurford, P., & Davis, M. A. (2018). What did we take away from our work on vaccines.
Kamm, F. (2015). Cost effectiveness analysis and fairness. Journal of Practical Ethics, 3(1).
Karimi, M., Brazier, J., & Paisley, S. (2017). Are preferences over health states informed?. Health and quality of life outcomes, 15(1), 105.
Karnofsky, H. (2011). Leverage in charity.
Karnofsky, H. (2015). Key Questions about Philanthropy, Part 1: What is the Role of a Funder?
Karnofsky, H. (2016). Why we can’t take expected value estimates literally (even when they’re unbiased).
Nord, E. (2005). Concerns for the worse off: fair innings versus severity. Social science & medicine, 60(2), 257-263.
Pyne, J. M., Fortney, J. C., Tripathi, S., Feeny, D., Ubel, P., & Brazier, J. (2009). How bad is depression? Preference score estimates from depressed patients and the general population. Health Services Research, 44(4), 1406-1423.
Sauber, J. (2008). Put your money where your heart is.
Sethu, H. (2018). How ranking of advocacy strategies can mislead.
Snowden, J. (2018). Revisiting leverage.
Vivalt, E. (2019). How Much Can We Generalize from Impact Evaluations?
Wiblin, R. (2017). Most people report believing it’s incredibly cheap to save lives in the developing world.
Wilson, E. C. (2015). A practical guide to value of information analysis. Pharmacoeconomics, 33(2), 105-121.
Wise, J. (2013). Giving now vs. later: a summary.
I’ve heard the claim that Nothing But Nets used to say that it costs $10 to provide a bednet because it’s an easy number to remember and think about, despite the fact that it costs less. According to GiveWell, on average the total cost to purchase, distribute, and follow up on the distribution of a bednet funded by Against Malaria Foundation is $4.53.
Another example of counterfactuals: suppose there is a very cost-effective stall that gives people vegan leaflets. Someone opens another identical stall right next to it. Half of the people who would have gone to the old stall now go to the new one. The new stall doesn’t attract any people who wouldn’t have been attracted anyway so it has zero impact. But if you estimate its effectiveness ignoring this circumstance, it can still be high.