Thanks, this is a very substantial and interesting post.
I accept GiveWell have a robust defence of their approach. They say they prefer to use cost-effectiveness estimates only as one input in their thinking about charities (with ‘track record’ and ‘certainty of results’ being two other important but hard-to-quantify inputs), and therefore (I infer) don’t want to compare charities head-to-head because the 21.2x of AMF is not the same sort of ‘thing’ as the 9.8x of Malaria Consortium. For sure, Health Economists would agree that there may be factors beyond pure cost-effectiveness to consider when making a decision (e.g. equity considerations, commercial negotiation strategies that companies might employ and so on), but typically this consideration happens after the cost-effectiveness modelling, to avoid falling into the trap I mentioned above where you implicitly state that you are working with two different kinds of ‘thing’ even though they actually compete for the same resources [4].
[...] I really do want to stress how jarring it is to see a cost-effectiveness model which doesn’t actually deliver on the promise of guiding resource utilisation at the margin. An economic model is the most transparent and democratic method we have of determining which of a given set of charities will do the most good, and any attempt to use intuition to plug gaps rather than trying to formalise that intuition undoes a lot of the benefit of creating a model in the first place.
Could you clarify (to a layperson) what the disagreement is here?
My understanding: Say we need to choose between interventions A and B, where A and B have outputs of different types. In order to make a choice, we need to make some assumptions—either explicit or implicit—about how to compare those different types of outputs. Either we can make those assumptions explicitly and bake them into the model, or we can model the interventions with separate units, and then make the assumptions (either explicitly or implicitly).
I take it that your fertility analysis did not do this, but that GiveWell does do some of this (e.g. comparing lives saved to increased consumption), only they then take other things—like track record and strength of evidence—into account in addition to the model’s output. Is the disagreement that you think GiveWell should also include these additional considerations in the cost-effectiveness analysis?
Ah sorry, I think I might have confused the issue a bit with my footnote. I think I’ve managed to conflate two issues in your mind.
The first is exactly as you say: any intervention worth doing has some effects which are easy to model and some which are difficult (maybe impossible) to model. What GiveWell has done here is completely reasonable: it models what it can and then makes assumptions about how important the other things, like track record, are in comparison to the main cost-effectiveness results.
The second issue is the more subtle one that I was driving at. Imagine you are going to buy a new car, and your friend (who knows about cars) says that modern cars are 10x more fuel efficient than the car you currently drive. Speaking very roughly, there are two strategies you could pick from to choose your next car:
1. Completely ignore your friend, and pick the car that has the best MPG regardless of any other feature. This would be a good strategy if literally all you care about is fuel efficiency, but a bad strategy otherwise (because it is unlikely that the most fuel efficient car is also the most comfortable to drive, especially if fuel efficiency and comfort are sort-of tradeoffs).
2. Treat your friend as having offered a useful rule of thumb, and so have an idea in your head about what ‘good’ fuel efficiency looks like. This is a good strategy if cars aren’t really directly comparable along a straightforward scale: a Ford F-150 isn’t ‘better’ or ‘worse’ than a Prius, it is just a different kind of thing.
Both GiveWell (implicitly) and I in my fertility days (explicitly) argue that QALYs are like cars: you can end up in a situation where you can generate different kinds of QALYs, and your best bet is to compare them with a rule of thumb like GiveWell’s 10x multiplier. However, I don’t think GiveWell is correct in making this assumption about charities. There is in fact a single measure like MPG which we want to ruthlessly optimise, and therefore we do actually want to pit the F-150 and Prius directly against each other.
However my point in the essay is that GiveWell don’t actually have to choose—they can build their model as if they are in the first world and directly compare charities together, and then make their final decision as though they are in the second world and different charities will offer different profiles of benefit on top of their cost-effectiveness. This is pretty much the commonsense way of choosing a car too—you would look at MPG and directly compare cars in this way, but you might then consider other factors. It would be weird to lump all cars together in your head as ‘better than 10x my previous efficiency’ or ‘worse than 10x my previous efficiency’.
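To make the contrast concrete, here is a minimal sketch of the two ways of reading the same numbers. It uses only the two multipliers quoted above (21.2x for AMF, 9.8x for Malaria Consortium) and the 10x rule of thumb. The function names and the way the threshold is applied are my own assumptions for illustration, not a description of how GiveWell's model actually works.

```python
# Multipliers as quoted in the post; everything else here is a hypothetical illustration.
estimates = {
    "AMF": 21.2,
    "Malaria Consortium": 9.8,
}

THRESHOLD = 10.0  # the 10x rule of thumb, treated here as a simple pass/fail bar


def threshold_view(estimates, threshold=THRESHOLD):
    """Rule-of-thumb reading: each charity is only 'above' or 'below' the bar."""
    return {name: value >= threshold for name, value in estimates.items()}


def ranked_view(estimates):
    """Head-to-head reading: order charities by their estimate, best first."""
    return sorted(estimates, key=estimates.get, reverse=True)


passes = threshold_view(estimates)
ranking = ranked_view(estimates)

# Ratio of the top estimate to the runner-up; the pass/fail view discards this information.
ratio = estimates[ranking[0]] / estimates[ranking[1]]

print(passes)
print(ranking)
print(round(ratio, 2))
```

The pass/fail view only tells you which side of the bar each charity lands on, while the ranked view keeps the relative size of the gap, which is what matters when deciding where the marginal pound should go. Other factors, like track record, can still be weighed afterwards.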