I support this idea and have mentioned it previously (e.g. here and here).
This doesn’t have to involve denying ‘treatment’ to people/places: presumably there are more applicants than there are places, so you introduce randomisation at the cutoff.
I’m not sure I understand your proposal correctly. To take a concrete example, say 80k gets 500 coaching requests per year and they only have the capacity to coach 250 people. Presumably they select the 250 people they think are most promising, whereas a randomized study would select 250 people randomly and use the remaining 250 as a control. In a sense, this does not involve denying treatment to anyone, since the same number of people (though not the same people) receive coaching, but it does involve a cost in expected impact, which is what matters in this case (and presumably in most other relevant cases—it would be surprising if EA orgs were not prioritizing when they are unable to allocate a resource or service to everyone who requests it). I think the cost is almost certainly justified, given that no randomized studies have been conducted so far and the existing methods of evaluation are often highly speculative, but this doesn’t mean that there are no costs. But as noted, I may be misunderstanding you.
If one is still concerned about the costs, or if randomization is infeasible for other reasons, an alternative is to use a quasi-experimental approach such as a regression discontinuity design. Another alternative is to have a series of Metaculus questions on what the results of the experiment would be if it were conducted, which can be informative even if no experiment is ever run.
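For readers unfamiliar with the design, here is a minimal sketch of what an RDD analysis could look like, using simulated data. Everything here is an illustrative assumption (the ranking score, the cutoff at 50, the bandwidth, and the outcome variable are all made up, not anything 80k actually measures); the point is just the shape of the estimator: fit a local linear regression on either side of the cutoff and read off the jump.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
score = rng.uniform(0, 100, n)            # hypothetical ranking score for each applicant
cutoff = 50
treated = (score >= cutoff).astype(int)   # coaching assigned to everyone above the cutoff

# Simulated outcome with a true treatment effect of 2.0 (purely for illustration)
outcome = 1.0 + 0.05 * score + 2.0 * treated + rng.normal(0, 1, n)

# Local linear regression within a bandwidth around the cutoff,
# allowing different slopes on each side
bw = 10
near = np.abs(score - cutoff) <= bw
centered = score[near] - cutoff
X = sm.add_constant(np.column_stack([treated[near], centered, treated[near] * centered]))
fit = sm.OLS(outcome[near], X).fit()

# The coefficient on the treatment dummy is the estimated jump at the cutoff
print(fit.params[1])
```

The key design choice is the bandwidth: a narrow window makes the "similar candidates on either side of the cutoff" assumption more credible but leaves fewer observations, which is one reason the randomisation-at-the-cutoff variants discussed below can be more efficient.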
I just want to add, on top of Haydn’s comment to your comment, that:
The treatment and control groups don’t need to be the same size, so you could, for instance, randomise among the top 300 candidates only.
In my experience, when there isn’t a clear metric for ordering candidates, it is extremely hard to make fine-grained judgements. So in practice, it is very likely that, say, places 100-200 in the ranking will look very similar.
I think these two factors, combined with Haydn’s suggestion to take the top candidates and exclude them from the study, make a randomised study very reasonable and very low-cost.
Very cool you’ve previously mentioned it—nice that we’ve both been thinking about it!
My proposal is a slight modification. To use your example, you could either (a) randomise all 250 places among the 500 applicants, or (b) rank the 500, give the ‘treatment’ to, say, the top 150, then randomise the remaining 100 ‘treatments’ among the 200 candidates around the cutoff (100 above it and 100 below it). I think either proposal, or an RDD, would be good, but I would defer to advice from actual EA experts on RCTs.
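To make proposal (b) concrete, here is a minimal sketch of the allocation step. The applicant IDs and the 150/100/200 split are placeholders mirroring the numbers above, not a recommended protocol; the list is assumed to be already ranked, best first.

```python
import random

random.seed(42)  # in a real study, preregister the seed/procedure
candidates = [f"applicant_{i}" for i in range(1, 501)]  # assumed ranked, best first

top = candidates[:150]        # ranks 1-150: always receive coaching
window = candidates[150:350]  # ranks 151-350: 100 above and 100 below the cutoff
rest = candidates[350:]       # ranks 351-500: never receive coaching

# Randomise the remaining 100 coaching slots within the window
randomised_in = random.sample(window, 100)
control = [c for c in window if c not in randomised_in]

treated = top + randomised_in  # 250 people receive coaching in total
print(len(treated), len(control))  # 250 100
```

The experimental comparison is then `randomised_in` vs `control` within the window, where candidates are plausibly similar, while the clear top picks are never denied coaching.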