This looks pretty similar to a model I wrote with Nick Dunkley way back in 2012 (part 1, part 2). I still stand by that as a reasonable stab at the problem, so I also think your model is pretty reasonable :)
Charity population:
You’re assuming a fixed pool of charities, which makes sense given the evidence gathering strategy you’ve used (see below). But I think it’s better to model charities as an unbounded population following the given distribution, from which we can sample.
That’s because we do expect new opportunities to arise. And if we believe that the distribution is heavy-tailed, a large amount of our expected value may come from the possibility of eventually finding something way out in the tails. In your model we only ever get N opportunities to get a really exceptional charity—after that we are just reducing our uncertainty. I think we want to model the fact that we can keep looking for things out in the tails, even if they maybe don’t exist yet.
I do think that a lognormal is a sensible distribution for charity effectiveness. The real distribution may be broader, but that just makes your estimate more conservative, which is probably fine. I just did the boring thing and used the empirical distribution of the DCP intervention cost-effectiveness (note: interventions, not charities).
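To make the unbounded-population point concrete, here's a minimal sketch (all parameters are made up for illustration): instead of fixing a pool of N charities up front, we draw a fresh charity from a lognormal whenever we go looking, so the best-found effectiveness can keep climbing indefinitely.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters: median effectiveness 1, sigma controls tail heaviness.
MU, SIGMA = 0.0, 1.5

def sample_charity():
    """Draw a fresh charity from an unbounded lognormal population."""
    return rng.lognormal(MU, SIGMA)

# The best-so-far keeps improving as we keep sampling: part of the expected
# value comes from exceptional charities we haven't discovered yet.
best = 0.0
for _ in range(10_000):
    best = max(best, sample_charity())
print(best)
```

With a fixed pool you'd hit a ceiling at the best of the original N draws; here there is no ceiling, which is the property I think we want.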
Evidence gathering strategy:
You’re assuming that the evaluator does a lot of evaluating: they evaluate every charity in the pool in every round. In some sense I suppose this is true, in that charities which are not explicitly “investigated” by an evaluator can be considered to have failed the first test by not being notable enough to even be considered. However, I still think this is somewhat unrealistic and is going to drive diminishing returns very quickly, since we’re really just waiting for the errors for the various charities to settle down so that the best charity becomes apparent.
I modelled the process as the evaluator sequentially evaluating a single charity, chosen at random (with replacement). This is also unrealistic, because in fact an evaluator won’t waste their time with things that are obviously bad, but even with this fairly conservative strategy things turned out pretty well.
I think it’s interesting to think about what happens when we model the pool more explicitly, and consider strategies like investigating the top recommendation further to reduce error.
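Here's roughly what that sequential strategy looks like as code — a sketch, not my original implementation, with the pool size and noise level invented for the example. Each round the evaluator picks one charity at random (with replacement), gets one noisy log-space observation, and the recommendation is whichever charity has the best average observation so far.

```python
import numpy as np

rng = np.random.default_rng(1)

N = 50                             # pool size (assumption)
true = rng.lognormal(0, 1.5, N)    # true effectiveness, hidden from the evaluator
NOISE_SIGMA = 1.0                  # log-space observation noise (assumption)

obs_sum = np.zeros(N)              # running sum of log-observations per charity
obs_n = np.zeros(N)                # number of observations per charity

def evaluate_one():
    """One round: noisily evaluate a single randomly chosen charity."""
    i = rng.integers(N)
    obs_sum[i] += np.log(true[i]) + rng.normal(0, NOISE_SIGMA)
    obs_n[i] += 1

def top_pick():
    """Current recommendation: best mean log-observation; unseen charities rank last."""
    est = np.where(obs_n > 0, obs_sum / np.maximum(obs_n, 1), -np.inf)
    return int(np.argmax(est))

for _ in range(500):
    evaluate_one()
print(true[top_pick()], true.max())
```

Even this blind uniform strategy tends to find something decent; smarter strategies (e.g. re-investigating the current top pick to shrink its error bars) should only do better.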
Increasing scale with money moved:
Charity evaluators have the wonderful feature that their effectiveness scales more or less linearly with the amount of money they move (assuming that the money all goes to their top pick). This is a pretty great property, so worth mentioning.
The big caveat there is room for more funding, or saturation of opportunities. I’m not sure how best to model this. We could instead model charities as “deposits” of effectiveness that are of a fixed size when discovered, and can be exhausted. I don’t know how that would change things, but I’d be interested to see! In particular, I suspect it may be important how funding capacity co-varies with effectiveness. If we find a charity with a cost-effectiveness that’s 1000x higher than our best, but it can only take a single dollar, then that’s not so great.
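A toy version of the "deposits" idea, just to show the shape of it (effectiveness and capacity distributions are both invented, and here drawn independently — the co-variation question is exactly what one would want to vary): each charity has a fixed funding capacity, and a donor fills the best remaining deposits first.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical deposit model: each charity has an effectiveness (per dollar)
# and a fixed funding capacity. Here they're independent; in reality how
# capacity co-varies with effectiveness may matter a lot.
eff = rng.lognormal(0, 1.5, 100)
capacity = rng.lognormal(14, 1, 100)   # dollars each deposit can absorb

def spend(budget):
    """Greedily fill the most effective remaining deposits; return total impact."""
    impact = 0.0
    for i in np.argsort(-eff):
        take = min(budget, capacity[i])
        impact += take * eff[i]
        budget -= take
        if budget <= 0:
            break
    return impact

print(spend(1e5) / 1e5, spend(1e7) / 1e7)
```

The average effectiveness per dollar falls as the budget grows, because the best deposits get exhausted — that's the saturation effect, and it breaks the linear-scaling property above once money moved gets large.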
The fact that sometimes people’s estimates of impact are subsequently revised down by several orders of magnitude seems like strong evidence against evidence being normally distributed around the truth. I expect that if anything it is broader than lognormally distributed. I also think that extra pieces of evidence are likely to be somewhat correlated in their error, although it’s not obvious how best to model that.
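One way the correlated-error point could be modelled (a sketch with made-up sigmas, not a claim about the right values): give every piece of evidence about a charity a shared log-space error component plus an independent one. Then gathering more evidence cancels the independent noise but never the shared component, so estimates stop improving at a floor.

```python
import numpy as np

rng = np.random.default_rng(3)

TRUE_LOG_IMPACT = 0.0   # log of the true impact (normalized for illustration)

def noisy_estimates(k, sigma_shared=1.0, sigma_indep=0.5):
    """k pieces of evidence whose log-space errors share a common component."""
    shared = rng.normal(0, sigma_shared)       # correlated error, same for all k
    indep = rng.normal(0, sigma_indep, k)      # fresh error per estimate
    return TRUE_LOG_IMPACT + shared + indep

# Averaging 100 estimates barely helps: the spread of the mean stays near
# sigma_shared instead of shrinking like 1/sqrt(k).
means = np.array([noisy_estimates(100).mean() for _ in range(2000)])
print(means.std())
```

With independent errors the spread of the mean would be about sigma_indep / 10 here; the shared component keeps it an order of magnitude larger, which matches the intuition that piling on similar evidence has sharply diminishing returns.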
It might depend what we’re using the model for.
In general, it does seem reasonable that direct (expected) net impact of interventions should be broader than lognormal, as Carl argued in 2011. On the other hand, it seems like the expected net impact all things considered shouldn’t be broader than lognormal. For one argument, most charities probably funge against each other by at least 1/10^6. For another, you can imagine that funding global health improves the quality of research a bit, which does a bit of the work that you’d have wanted done by funding a research charity. These kinds of indirect effects are hard to map. Maybe people should think more about them.
AFAICT, the basic thing for a post like this one to get right is to compare apples with apples. Tom is trying to evaluate various charities, of which some are evaluators. If he’s evaluating the other charities on direct estimates, and is not smoothing the results over by assuming indirect effects, then he should use a broader than lognormal assumption for the evaluators too (and they will be competitive). If he’s taking into account that each of the other charities will indirectly support the cause of one another (or at least the best ones will), then he should assume the same for the charity evaluators.
I could be wrong about some of this. A couple of final remarks: it gets more confusing if you think lots of charities have negative value e.g. because of the value of technological progress. Also, all of this makes me think that if you’re so convinced that flow-through effects cause many charities to have astronomical benefits, perhaps you ought to be studying these effects intensely and directly, although that admittedly does seem counterintuitive to me, compared with working on problems of known astronomical importance directly.
I largely agree with these considerations about the distribution of net impact of interventions (although with some possible disagreements, e.g. I think negative funging is also possible).
However, I actually wasn’t trying to comment on this at all! I was talking about the distribution of people’s estimates of impact around the true impact for a given intervention. Sorry for not being clearer :/