Factoring cost-effectiveness
Summary: We can split the cost-effectiveness of an intervention into how good the cause is, and how good the intervention is relative to the cause. This perspective could help our efforts in prioritisation by letting us bring appropriate tools to bear on the different parts.
Cost-effectiveness comparisons
When we choose between giving time or money to different interventions, we’re making a comparison. It’s nice to know what these comparisons come down to. There are a lot of sources of evidence, and different ones will be more appropriate in different contexts.
Say we are comparing between intervention x in cause area X, and intervention y in cause area Y. How they compare depends on things like how well thought-out x and y are, how competent the people and organisations implementing them are, as well as how valuable X is as a whole compared to Y.
These are all important factors in telling us how x and y ultimately compare, but they’re quite different from one another. So it shouldn’t be a surprise if it’s best to use different methods to compare the different factors. I think this is the case.
Consider the equation:
Cost-effectiveness of intervention = (cost-effectiveness of area) * (leverage ratio of intervention)
The left-hand side of this equation expresses how much good is achieved per unit of resources invested in the intervention. For the intervention x we’ll denote this G(x). The right-hand side breaks this up as C(X), how much good is achieved per unit of resources invested in X as a whole, and a ‘leverage ratio’ L(x) which expresses the ratio of how effective x is compared to X as a whole [1].
Now to compare between x and y we’re interested in the ratio G(x)/G(y). We can use the above equation to expand this:
G(x)/G(y) = (C(X)L(x))/(C(Y)L(y)).
This rearranges to:
G(x)/G(y) = (C(X)/C(Y)) * (L(x)/L(y)).
Here we’ve split the comparison into two parts, each of which is comparing like with like. This is a good general strategy: making comparisons between dissimilar things is hard, and our intuitions are sometimes terrible at it, so it’s helpful to break it into more comparable chunks [2].
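As a minimal sketch of the factoring, the snippet below plugs made-up numbers into the equations above and checks that the direct comparison G(x)/G(y) matches the product of the cause ratio and the leverage ratio. The values of C(X), C(Y), L(x) and L(y) are purely hypothetical and are not estimates for any real cause or intervention.

```python
import math


def cost_effectiveness(cause_effectiveness, leverage_ratio):
    """G(x) = C(X) * L(x): good achieved per unit of resources put into x."""
    return cause_effectiveness * leverage_ratio


# Hypothetical inputs, chosen only to illustrate the arithmetic.
C_X, L_x = 10.0, 3.0  # cause X and intervention x within it
C_Y, L_y = 4.0, 5.0   # cause Y and intervention y within it

G_x = cost_effectiveness(C_X, L_x)  # 30.0
G_y = cost_effectiveness(C_Y, L_y)  # 20.0

# Direct comparison of the two interventions.
direct_ratio = G_x / G_y

# Factored comparison: causes against causes, leverage against leverage.
cause_ratio = C_X / C_Y
leverage_ratio = L_x / L_y
factored_ratio = cause_ratio * leverage_ratio

# The two routes agree up to floating-point rounding.
assert math.isclose(direct_ratio, factored_ratio)
print(f"cause ratio {cause_ratio:.2f}, leverage ratio {leverage_ratio:.2f}, "
      f"overall {factored_ratio:.2f}")
```

In this toy case X is the better cause by a factor of 2.5, y is the better-leveraged intervention by a factor of about 1.7, and the cause advantage wins out, so x comes out ahead overall.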
Comparing cause effectiveness
Comparing the cost-effectiveness of different cause areas is quite far removed from everyday experience, and it doesn’t have a good feedback mechanism as it can be hard to tell how much something helped the world even after the fact. Moreover it’s just the right setting for scope insensitivity to cause problems for intuitive judgements. This means that relative to most areas of experience, we should be particularly cautious about putting too much weight on intuitive judgements. This in turn means that it’s an area where explicit models are particularly valuable.
That doesn’t mean that explicit models always trump intuitive judgements in this domain; in particular, simple models often omit important factors that are incorporated into our intuitions. Nor does it mean that we should put all our trust into a single model. But it does mean that it’s particularly valuable to build, critique, and refine models for the cost-effectiveness of different causes. It also means we should put more weight on the outputs of such models than we do in most domains—not because the models are more trustworthy, but because the alternatives are worse than usual. This is why I think developing such models is a high value activity, and why I’ve been spending time on it.
Comparing leverage ratios
The leverage ratios are determined largely by things like whether the intervention is a sensible way of progressing on the cause, the quality of the team involved, and how functional the implementing organisation is. In contrast to the overall effectiveness of a cause, these are much closer to regular experience, so we should be less keen to use explicit models. On the other hand, methods and experience from valuing shares of companies (which have good feedback mechanisms) should be relevant in this context.
There are several reasons why leverage ratios may vary within a cause area. Many of these will be common across cause areas. Because of this, we might expect similar distributions of leverage ratios in different cause areas (but probably some areas have more variance than others, just as some jobs have more variance in the productivity of employees than others). It could be valuable to have an idea of how much leverage ratios do vary in practice. This is an empirical question we might be able to get data for.
Implications for prioritisation work
To choose between interventions, we need to compare cost-effectiveness. I’ve claimed that this is best done by comparing the cost-effectiveness of cause areas, and comparing the leverage ratios of the interventions. If this is right, what’s more valuable to work on evaluating?
Of course they are complementary to each other. The better we are able to identify the best cause areas, the more valuable it is to have good estimates for the leverage ratios of interventions in those areas. And the better we are able to identify interventions with very high leverage ratios, the more valuable it is to be able to say which of those are in the most effective causes. So the answer depends in part on how much work each is already receiving.
It also depends on your beliefs about which component has more variance. If you think that most of the variation in intervention effectiveness comes from leverage ratios, while cost-effectiveness of causes doesn’t vary that much, then it’s more important to evaluate the leverage ratios of interventions. If on the other hand you think more variation comes between causes, then it’s more important to evaluate cause effectiveness. I currently think there is likely to be more variation in cause effectiveness even after you filter to the ones which could plausibly be high value; however I am quite uncertain about this.
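One way to make "which component has more variance" concrete is to work on a log scale, where G = C * L becomes log G = log C + log L and the variance of log G splits into a cause part, a leverage part, and a covariance term. The sketch below simulates this under assumptions that are mine rather than the post's: lognormal distributions, the particular spreads `sigma_cause` and `sigma_leverage`, and leverage ratios drawn independently of the cause. It illustrates the bookkeeping and is not evidence about real causes.

```python
import math
import random
import statistics


def simulate_interventions(n_causes=50, per_cause=100, sigma_cause=2.0,
                           sigma_leverage=1.0, seed=0):
    """Draw hypothetical (C, L) pairs: one C per cause, one L per intervention."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_causes):
        c = rng.lognormvariate(0.0, sigma_cause)
        for _ in range(per_cause):
            pairs.append((c, rng.lognormvariate(0.0, sigma_leverage)))
    return pairs


def log_variance_shares(pairs):
    """Return the shares of var(log G) due to log C, log L, and their covariance."""
    log_c = [math.log(c) for c, _ in pairs]
    log_l = [math.log(l) for _, l in pairs]
    log_g = [a + b for a, b in zip(log_c, log_l)]
    var_g = statistics.pvariance(log_g)
    cause_share = statistics.pvariance(log_c) / var_g
    leverage_share = statistics.pvariance(log_l) / var_g
    # Whatever is left over is the 2 * cov(log C, log L) term.
    covariance_share = 1.0 - cause_share - leverage_share
    return cause_share, leverage_share, covariance_share


cause_share, leverage_share, covariance_share = log_variance_shares(
    simulate_interventions()
)
print(f"cause {cause_share:.2f}, leverage {leverage_share:.2f}, "
      f"covariance {covariance_share:.2f}")
```

With real estimates of C and L in place of the simulated pairs, the same function would indicate which component drives more of the spread in intervention effectiveness, and so where evaluation effort is likely to pay off more.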
There is also an asymmetry which pushes us towards doing more cause assessment first: it’s much easier to cut down the work of evaluating leverage ratios by restricting to a few causes than it is to cut down the work of evaluating cause effectiveness by first identifying opportunities with high leverage ratios. Similarly, if we identify a cause area which is valuable but see no good interventions available to fund, we can advertise this and hopefully create good interventions in the area.
Of course to support giving decisions today we need to compare leverage ratios as well as cause effectiveness. And in some cases studying the interventions may help us to evaluate the cause effectiveness. But I think it will usually be right to investigate leverage ratios only within cause areas that we think have, or might have, high effectiveness, and only after we’ve made an effort to assess that.
Acknowledgements: thanks to Toby Ord and Nick Beckstead for helpful conversations.
Crossposted from the Global Priorities Project.
[1] The leverage ratio is really a function of x together with X. AMF may have one leverage ratio with respect to the area of global health, and another with respect to malaria treatment.
[2] An extra advantage of breaking the comparisons into like-with-like is that it’s easier to track uncertainty so that it doesn’t blow up unnecessarily. I might be very uncertain about how good X is, so I think C(X) lies somewhere in (1, 100). I might also be very uncertain about how good Y is, so that I think C(Y) lies in (1, 100). But it doesn’t follow that C(X)/C(Y) could lie anywhere in (1/100, 100). If my uncertainty about X is related to my uncertainty about Y (say X is reducing carbon emissions and Y is helping communities adapt to climate change), then I might have a better idea of the ratio C(X)/C(Y) than I do about either individually. Of course this just means that my estimates for C(X) and C(Y) are strongly correlated. But I think it’s helpful to have an idea of practical ways to break up the calculation which help to keep the uncertainty under control. For more thoughts on tracking uncertainty through estimates, see here.
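The point in footnote [2] can be illustrated with a small simulation. In both scenarios below C(X) and C(Y) each lie in (1, 100), but in the second they share a common uncertain factor, so the ratio C(X)/C(Y) is far better constrained. The distributions and the split between shared and cause-specific uncertainty are assumptions chosen for illustration only.

```python
import random


def simulate_ratios(n=10_000, seed=0):
    """Sample C(X)/C(Y) under independent and under correlated uncertainty."""
    rng = random.Random(seed)
    independent, correlated = [], []
    for _ in range(n):
        # Independent case: each log10(C) is drawn separately from [0, 2],
        # so each C lies between 1 and 100.
        c_x = 10 ** rng.uniform(0.0, 2.0)
        c_y = 10 ** rng.uniform(0.0, 2.0)
        independent.append(c_x / c_y)

        # Correlated case: a shared factor carries most of the uncertainty,
        # with a small cause-specific term on top. Each C still lies
        # between 1 and 100, but the shared factor cancels in the ratio.
        shared = rng.uniform(0.25, 1.75)
        c_x = 10 ** (shared + rng.uniform(-0.25, 0.25))
        c_y = 10 ** (shared + rng.uniform(-0.25, 0.25))
        correlated.append(c_x / c_y)
    return independent, correlated


def spread(ratios):
    """Ratio of the 95th percentile to the 5th percentile of the samples."""
    ordered = sorted(ratios)
    low = ordered[int(0.05 * len(ordered))]
    high = ordered[int(0.95 * len(ordered))]
    return high / low


independent, correlated = simulate_ratios()
print(f"independent spread: {spread(independent):.1f}x")
print(f"correlated spread:  {spread(correlated):.1f}x")
```

In the correlated case the ratio can never leave the range from about 0.32 to about 3.2, even though each of C(X) and C(Y) is individually uncertain across two orders of magnitude.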
This seems like a good way to separate the more “subjective” parts of judging interventions (cause-effectiveness, which depends on what you care about) from the “objective” parts (leverage ratios, which people should reach a consensus on given enough data).
An organization like GiveWell could compile estimated leverage ratios for different charities, while leaving “cause effectiveness” up to individual donors to decide. They (or others) could then provide multiple estimated cause-effectiveness tables that vary based on common differences in donors’ values (e.g. how much they value the future compared to the present). Then donors could pick which “value system” they liked best and decide donations based on that.
Would you say the Global Priorities Project is primarily focused on estimating L or C? One obvious thing your factorization suggests is that projects estimating C tend to complement projects estimating L (like GiveWell?).
I agree with the thrust of what you are saying here. I’m not sure the “objective/subjective” division quite fits, though. Of course questions of values do come into cause-effectiveness, but I also think there are a lot of questions about matters which are in theory objective, although in practice very difficult to be sure about (for instance “will health or education lead to more economic growth?”).
In this domain the focus has been on improving our comparisons of C—exactly as you say, because it’s a better complement to things that are already being done. We’ve also been exploring some things (e.g. engaging with policy) which aren’t about estimating cost-effectiveness at all. Seb Farquhar will be joining our team in the new year, and we’ll spend some time after that thinking through what the best avenues to pursue going forward are.
Good point—if you just list the effectiveness of a bunch of causes (C1, C2, C3, etc.) the shared downstream effects (like your example of economic growth) probably make them non-independent, unless you have very unusual values.