I am worried that the cited data does not really inform this question: funders can always choose solutions that leverage “conjunctions of multipliers” (e.g. advocacy, changing trajectories), so the real variance in solutions should also be *much* larger than 10x for any funder working within a cause area.
To make this more concrete: when choosing what to fund in climate, one would not choose between different policies or lifestyle actions (the climate evidence presented here), but between fundable opportunities that stack different impact multipliers on top of each other, e.g. advocacy instead of direct action, or supporting policies with large expected long-term consequences (say, by accelerating technological change). As far as I can tell, the data from Gillingham and Stock displayed here is their static case, which they describe as focused on current technology and project cost, in contrast to their dynamic case studies, which seem much more likely to be attractive to fund.
So it seems to me the evidence presented significantly underplays the real variance in solution effectiveness that a funder faces: it uses data on the variance of single-variable direct interventions as a proxy for the variance in intervention effectiveness, even though the most effective interventions are usually not direct actions (certainly outside global health and development, most EA funding does not buy the x-risk equivalent of malaria nets), and are not actions whose impact differentials can be quantified with certainty (despite being very large in expectation).
This also seems liable to bias comparisons between solution-level variance and cause-level variance, given how strongly differences in cause-level variance are driven by expected-value calculations (the value of the future, etc.) that are far more extreme and speculative than the data available to people comparing individual interventions.
Hey, thanks for the comments. Here are some points that might help us get on the same page:
1) I agree this data is missing difficult-to-measure, hits-based interventions, like research and advocacy, which means it’ll understate the degree of spread.
I discuss that along with other ways it could understate the differences here:
https://80000hours.org/2023/02/how-much-do-solutions-differ-in-effectiveness/#ways-the-data-could-understate-differences-between-the-best-and-typical-interventions
2) Aside: I’m not sure a conjunction of multipliers is the best way to illustrate this point. Each time you add a multiplier, it increases the chance the whole thing doesn’t work at all. I doubt the optimal degree of leverage in all circumstances is “the most possible”, which is why Open Philanthropy supports interventions with a range of degrees of leverage (including none), rather than putting everything into the most multiplied thing possible (research into advocacy into research into malaria…). (Also, if adding multipliers is the right way to think about it, this data still seems relevant, since it tells you the variance of what you’re multiplying in the first place.)
3) My comparison is between the ex ante returns of top solutions and the mean of the space.
Even if you could pick the top 1% of solutions with certainty, and the other 99% achieved nothing, your selection would only be ~100x the mean. And I’m skeptical we can pick the top 1% in most cause areas, so that seems like an upper bound. E.g. in most cases (especially for things like advocacy) I think there’s more than a 1% chance of picking something net harmful, which would already take us out of the top 1% in expectation.
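The “top 1%” upper bound above is just arithmetic, and can be sketched in a few lines (a toy model, not data: assume the top 1% of solutions each deliver the same value and the rest deliver nothing):

```python
# Toy illustration of the "top 1%" upper bound: if the top 1% of N
# solutions each deliver value v and the other 99% deliver nothing,
# the mean of the whole space is 0.01 * v, so even perfect selection
# from the top 1% only beats the mean by a factor of 100.

N = 10_000
v = 1.0  # value of each top-1% solution (arbitrary units)

values = [v] * (N // 100) + [0.0] * (N - N // 100)
mean_of_space = sum(values) / N
ratio = v / mean_of_space

print(ratio)  # 100.0
```

Any uncertainty about hitting the top 1%, or any nonzero value in the remaining 99%, pulls this ratio below 100x rather than above it.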
4) There are also major ways the data overstates differences in spread, like regression to the mean.
The data shows the top are ~10x the mean. If you were optimistic about getting a big multiplier on those, that maybe could get you to 1,000x. But then when we take into account regression to the mean, that can easily reduce spread another 10x, getting us back to something like 100x.
That seems plausible but pretty optimistic to me. My overall estimate for top vs. mean is ~10x, but with a range of 3-100x.
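The chain of adjustments in point 4 can be written out explicitly. The numbers here are the illustrative ones from the comment (a ~10x measured spread, an implied ~100x optimistic multiplier to reach 1,000x, and a ~10x regression-to-the-mean discount), not precise estimates:

```python
# Back-of-the-envelope for point 4: stack an optimistic multiplier on the
# measured spread, then discount for regression to the mean. All three
# factors are the rough figures from the discussion, not fitted values.

measured_top_vs_mean = 10    # data: top direct solutions ~10x the mean
optimistic_multiplier = 100  # hoped-for leverage (advocacy, research, etc.)
regression_discount = 10     # shrinkage once estimation noise is accounted for

naive_spread = measured_top_vs_mean * optimistic_multiplier
adjusted_spread = naive_spread / regression_discount

print(naive_spread, adjusted_spread)  # 1000 100.0
```

The point is that the optimistic multiplier and the regression discount largely cancel, which is why the headline estimate stays near the measured ~10x–100x range rather than the naive 1,000x.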
5)
>This also seems to potentially lead to biased comparisons between solution variance and cause level variance given how strongly differences in cause level variance are driven by expected value calculations (value of the future, etc.) that are far more extreme / speculative to what people comparing interventions on single interventions would have data on.
I agree estimates of cause spread should be regressed more than solution spread. I’ve tried to take this into account, but could have underestimated it.
In general I think regression to the mean is a very interesting avenue for developing a critique of core EA ideas.
Hey Ben, thanks for the replies—adding some more to get closer to the same page 🙂
Re your 1), my criticism here is more one of emphasis and of the top-line messaging, as you indeed mention these cases of advocacy and research.
I just think these cases are fundamental and affect the conclusions very significantly, because we are almost never in a situation where all we can choose from are direct interventions. The solution space (and with it, the likely variance) will almost always look quite different from what is discussed as the primary evidence in the article. (That does not mean we will never choose direct interventions, to be sure, just that the variance of solutions will mostly be one that emerges from the conjunction of impact differentials.)
Re your 2), I think this is mostly a misunderstanding—my comment was also very quickly written, apologies.
I am not saying we should always choose the most leveraged option, but rather that the solution space will essentially always be structured by conjunctions of multipliers. There are reasons not to choose only the most leveraged solution, as you point out, but I don’t think this is enough to argue that the most effective actions will not usually be conjunctive ones.
I agree that the data in the article is useful for specifying the shape of a particular impact differential; I am mostly arguing that it understates the variance of the solution space.
(I worry that we are mixing expected and realized value here. I am mostly talking about conjunctive strategies affecting what the variance of the solution space looks like in expected value; this does not preclude the realized value sometimes being zero, or risk aversion and other considerations driving us to prefer less leveraged actions.)
Re your 3) & 4) I agree—my understanding was that these are the factors that lead you to only 10x and my comment was merely that I think direct intervention space variance is not that informative with regards to solution selection in most decision contexts.
Aside: I agree with you that I don’t think that advocacy by itself is a 100x multiplier in expectation.
I’d also add that it would be great if there were more work to empirically analyse ex ante and ex post spread among hits-based interventions with multiple outcomes. I could imagine it leading to a somewhat different picture, though I think the general thrust would still hold, and I still think looking at spread among measurable interventions can help to inform intuitions about the hits-based case.
One example of work in this area is this piece by OP, where they say they believe they found some 100x and a few 1,000x multipliers on cash transfers to US citizens, e.g. by supporting advocacy for land use reform. But this involves an element of cause selection as well as solution selection; cash transfers seem likely to be below the mean; and this was based on BOTECs that will contain a lot of model error and so should be regressed further. Overall I’d say this is consistent with within-cause differences of ~10x from top to mean, and doesn’t support >100x differences.
I agree it would be great for this to exist, though it is likely very hard, and the examples that will exist soon will not be the strongest ones (given that effects can become visible only over longer time-frames, e.g. how OP discusses the Green Revolution and other interventions that took many years to have the large effects we can now observe).
One small extra data point that might be useful: I made a rough estimate for smallpox eradication in the post, finding it fell in the top 0.1% of the distribution for global health, so it seemed consistent.