I made a similar comment on Noah’s original submission: I think the optimizer’s curse will not be a serious problem if two conditions hold.

1. We don’t care about learning the actual cost-effectiveness of the interventions we select; we only care about ranking them correctly.
2. The distribution of true cost-effectiveness for interventions has a thicker tail than the distribution of errors in our estimates.
Condition 1 is a normative question, and it matches my understanding of GiveWell’s goals (since it ties most directly to the question of whether to fund an intervention). We want to select the best interventions; the cost-effectiveness number is just a means to that end.
Condition 2 is an empirical question. It’s well known that the distribution of true cost-effectiveness is fat-tailed in many domains, but the distribution of errors in our estimates is not well understood. Sometimes people slap a lognormal assumption on the error in their Monte Carlo simulations and run with it, but I don’t think that’s principled at all; the error is more likely to be a product of normal variables. GiveWell has the tools to estimate this, though.
What happens if both 1 and 2 are true? Then our decision problem is simply to identify the top N interventions. The main risk is that uncertainty causes us to label an intervention as being in the top N when it actually isn’t. However, since true cost-effectiveness has a fat tail, the top N interventions are miles better than the rest. For a lemon intervention to sneak into our top N, it would need a very large error. But condition 2 implies that the error variance is too small to generate errors of that size. So when you estimate an intervention as “top”, the prior probability of an error large enough to give a lemon that estimate is low, and therefore the probability that this really is a top intervention is high. Thus, the optimizer’s curse is not a problem.
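This intuition is easy to check numerically. Here is a minimal Monte Carlo sketch under my own illustrative assumptions (lognormal true cost-effectiveness, additive normal error, invented parameter values, nothing drawn from GiveWell’s actual data): it measures how often the estimated top N coincides with the true top N.

```python
# Sketch of the argument: true cost-effectiveness is fat-tailed (lognormal),
# estimation error is comparatively thin-tailed (normal). If condition 2 holds
# (small error scale relative to the tail), the estimated top N should almost
# always be the true top N. All parameter values here are illustrative.
import random

random.seed(0)

def top_n_overlap(n_interventions=1000, n_top=10, error_sd=0.5, trials=200):
    """Average fraction of the estimated top N that is also in the true top N."""
    overlaps = []
    for _ in range(trials):
        true = [random.lognormvariate(0.0, 1.5) for _ in range(n_interventions)]
        est = [t + random.gauss(0.0, error_sd) for t in true]
        true_top = set(sorted(range(n_interventions), key=lambda i: true[i])[-n_top:])
        est_top = set(sorted(range(n_interventions), key=lambda i: est[i])[-n_top:])
        overlaps.append(len(true_top & est_top) / n_top)
    return sum(overlaps) / trials

# Thin-tailed noise (condition 2 holds): overlap stays high.
print(top_n_overlap(error_sd=0.5))
# Noise large enough to swamp the tail (condition 2 fails): overlap degrades.
print(top_n_overlap(error_sd=20.0))
```

The interesting comparison is between the two calls: selection quality only falls apart once the error scale becomes comparable to the gaps in the fat tail.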
This is an argument sketch, not a proof by any means. Obviously, condition 2 could fall apart in reality. I have run simulations on some other cost-effectiveness data that suggest it holds there, but I haven’t written that up, and that data is not the same as GiveWell’s set of interventions. The point of this sketch is to suggest that GiveWell should look more into condition 2: it could shed light on whether the optimizer’s curse is really important in this setting.
If 2 holds, the risk of noise causing interventions to be re-ranked is small, because the noise distribution is more compressed than the true gap between interventions.
I agree with your point 2. To put it in Bayesian terms: if your prior is much more uncertain than your likelihood, the likelihood dominates the posterior.
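A toy normal-normal conjugate update makes that concrete (the numbers are my own illustration, not anyone’s actual estimates):

```python
# Normal prior + normal likelihood: the posterior mean is a precision-weighted
# average of the prior mean and the estimate. A diffuse prior means the weight
# on the data is near 1, so almost no shrinkage occurs.
def posterior_mean(prior_mean, prior_var, estimate, error_var):
    """Posterior mean for a conjugate normal prior and normal likelihood."""
    w = prior_var / (prior_var + error_var)  # weight placed on the data
    return w * estimate + (1 - w) * prior_mean

# Diffuse prior (var 100) vs tight likelihood (var 1): data dominates.
print(posterior_mean(0.0, 100.0, 10.0, 1.0))   # close to the estimate, 10
# Tight prior vs noisy estimate: heavy shrinkage back toward the prior.
print(posterior_mean(0.0, 1.0, 10.0, 100.0))   # close to the prior mean, 0
```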
Isn’t 1 addressed by Noah’s submission, i.e., that you will tend to rank noisily estimated interventions higher?