I agree with the general principle: non-robust estimates should be discounted, and thus areas where the evidence is less robust (e.g. animal welfare, and probably a fortiori far future) should be penalized compared to areas with more robust estimates (e.g. global poverty) in addition to any "face value" comparison.
I also agree it is plausible that global poverty interventions may be better than interventions in more speculative fields, because we are closer to selecting randomly from wide and unknown distributions in the latter case. So even if the "mean" EV of a global health charity is << the mean EV of (say) a far future cause, the EV of a top GiveWell charity may be higher than our best guess for the best animal welfare cause, even making generous assumptions about the ancillary issues required (e.g. inter-species welfare comparisons, population ethics, etc.).
However, I think your illustration is probably slanted too unfavourably towards the animal welfare cause, for two reasons.
Due to regression to the mean and bog-standard measurement error, it seems likely that estimates of the best global poverty interventions will be biased high, as will any subsequent evaluation which "selects from the top" (e.g. GiveWell, GWWC). So the actual value of the best-measured charity will be less than 70x the actual mean.
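To make the selection effect concrete, here is a minimal simulation sketch (all parameters are assumptions for illustration, not fitted to DCP or GiveWell data): draw true effectiveness from a log-normal, add log-scale measurement noise, and compare the charity that measures best against the true mean, both by its measured value and by its actual value.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 1000                                            # candidate interventions (illustrative)
true_log = rng.normal(loc=0.0, scale=1.5, size=N)   # true log cost-effectiveness
noise = rng.normal(loc=0.0, scale=1.0, size=N)      # log-scale measurement error
measured_log = true_log + noise

best = np.argmax(measured_log)                      # the charity that *measures* best
true_mean = np.exp(true_log).mean()
print(f"measured best / true mean: {np.exp(measured_log[best]) / true_mean:.0f}x")
print(f"actual best / true mean:   {np.exp(true_log[best]) / true_mean:.0f}x")
```

On typical runs the measured multiple exceeds the actual multiple severalfold: the charity that tops the league table got there partly by being good and partly by being lucky in measurement.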
I broadly agree with your remarks about the poverty of the evidence base for animal welfare interventions. However, it seems slightly too conservative to discard all information from (e.g.) ACE entirely.
The DCP data does give fair evidence that the distribution of global poverty interventions is approximately log-normal, and I'd guess its mean is fairly close to the "true" population value. It is unfortunate that there is no similar work giving an approximate distribution type or parameters for animal welfare, far future, or other causes. I would guess these are also log-normally distributed, or thereabouts (I agree with your remarks about plausibly negative values), although with an unclear mean.
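Where data do exist, the distributional claim is at least checkable. A minimal sketch of the sort of diagnostic one might run on a table of cost-effectiveness figures (the data here are synthetic stand-ins, purely for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# stand-in for a real column of cost-effectiveness figures (e.g. DALYs
# averted per $1000 from DCP2); synthetic values, for illustration only
estimates = np.exp(rng.normal(1.0, 1.5, size=100))

# if the estimates are roughly log-normal, their logs should look normal
print(stats.shapiro(np.log(estimates)))
```

In practice one would also eyeball a normal QQ plot of the logs rather than lean on a single test.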
I have been working on approaches for correcting regression to the mean. Although the results are mathematically immature and probably mistaken (my previous attempt was posted here, which I hope to return to in time; see particularly Cotton-Barratt's remarks), I think the two qualitative takeaways are important: 1) with heavy-tailed (e.g. log-normal) distributions, regression to the mean can easily knock orders of magnitude off the estimate for the "best performing" interventions; 2) regression to the mean can bite much more (namely, orders of magnitude more) off a less robust estimate than a more robust one.
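Both takeaways fall out of a small simulation (again, the distributional parameters are assumptions for illustration): for a "robust" and a "non-robust" level of measurement error, select the top-measured intervention from a log-normal population and record how far its measured value overstates its true value.

```python
import numpy as np

rng = np.random.default_rng(2)
N, TRIALS = 1000, 2000

for noise_sd in (0.5, 2.0):   # log-scale measurement error: robust vs non-robust
    overstatement = []
    for _ in range(TRIALS):
        true_log = rng.normal(0.0, 1.5, N)
        measured_log = true_log + rng.normal(0.0, noise_sd, N)
        best = np.argmax(measured_log)
        overstatement.append(measured_log[best] - true_log[best])
    factor = np.exp(np.mean(overstatement))   # geometric-mean overstatement
    print(f"noise sd {noise_sd}: top estimate overstates truth ~{factor:.0f}x")
```

Under these assumptions the robust estimates give back only a modest factor, while the noisy ones give back a couple of orders of magnitude, which is the asymmetry doing the work when comparing across causes.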
For these reasons, comparing (e.g.) top GiveWell charity estimates to ACE's effectiveness estimates is probably illegitimate, as the latter's estimates will probably have much greater expected error, partly because ACE is a smaller org with less human and capital resources to devote to the project, and (probably more importantly) because of the considerably worse evidence base it has to work with. For similar reasons, arguments of the form "I did a Fermi estimate with conservative assumptions, and it turns out X has a QALY yield a thousand/million/whatever times greater than GiveWell top charities, therefore X is better" warrant withering scepticism.
Going further than this likely requires distributional measures we are unlikely to get good access to, save for global poverty. There is some research one could perform to get a handle on regression to the mean, and potentially some analytic or simulation methods to estimate "true" effectiveness conditioned on the error-prone estimate; I hope to attack some of this work in due course. For other fields, similar to Dickens, one may conjecture different distributions and test for sensitivity, though my fear is the results of these analyses will prove wholly sensitive to recondite statistical considerations.
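As a gesture at the analytic route: under the (strong) assumption that true log-effectiveness across a field and the measurement error are both normal, the posterior mean conditioned on a noisy estimate has a closed form, a precision-weighted average of the estimate and the field mean. A sketch with made-up numbers:

```python
import math

def shrunk_log_estimate(x, prior_mean, prior_sd, noise_sd):
    """Posterior mean of log-effectiveness under a normal-normal model."""
    w = prior_sd**2 / (prior_sd**2 + noise_sd**2)   # weight on the noisy estimate
    return w * x + (1 - w) * prior_mean

# Illustrative numbers only: a Fermi estimate of "1000x the field mean",
# measured with large error (sd of 2 log units), shrinks to roughly 12x.
print(math.exp(shrunk_log_estimate(math.log(1000), prior_mean=0.0,
                                   prior_sd=1.5, noise_sd=2.0)))
```

This is exactly why the Fermi-estimate arguments above deserve scepticism: the larger the claimed multiple and the noisier the method, the more of it the correction takes back.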
I also agree with your remarks that global poverty causes may prove misleadingly robust, given the challenge of estimating flow-through effects and differential impact under varying normative assumptions. Thus the "true" EV of even the most robustly estimated global poverty cause likely has considerable error; these errors may not be easy to characterize, and plausibly in many cases "pinning down" the relevant variables may demand a greater sample size than can be obtained prospectively within the Earth's future light-cone. I leave as an exercise to the reader how this may undermine the practice within EA of relying on data, cost-effectiveness estimates, and so forth.