It’s hard to tell without seeing the data, but do you think you might have faced a range restriction problem here? i.e. if you’re admitting only people with the highest scores, and then seeing whether the scores of those people correlate with outcomes, you will likely have relatively little variation in the predictor variable.
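For anyone unfamiliar with the term, here's a minimal sketch of the effect (made-up numbers, nothing from the post itself): if you simulate a pool of applicants with a known score–outcome correlation and then look only at the top scorers, the observed correlation in that admitted group shrinks even though the underlying relationship hasn't changed.

```python
# Hypothetical illustration of range restriction, not the authors' data or analysis.
import numpy as np

rng = np.random.default_rng(0)

n = 10_000
true_r = 0.5  # assumed true correlation between interview score and outcome

# Simulate interview scores and outcomes with that correlation.
scores = rng.normal(size=n)
outcomes = true_r * scores + np.sqrt(1 - true_r**2) * rng.normal(size=n)

# Correlation in the full applicant pool.
full_r = np.corrcoef(scores, outcomes)[0, 1]

# Correlation among only the top ~40% of scorers (the "admitted" group).
admitted = scores > np.quantile(scores, 0.6)
restricted_r = np.corrcoef(scores[admitted], outcomes[admitted])[0, 1]

print(f"correlation in full pool:   {full_r:.2f}")        # ~0.50
print(f"correlation among admitted: {restricted_r:.2f}")  # noticeably smaller
```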
Yes, this is definitely a concern, for some cohorts more than others. Here is the number of people we interviewed in each cohort:
Fall 2018: 37
Spring 2019: 22
Fall 2019: 30
Spring 2020: 22
Summer 2020: 40
So for Fall 2018 and Summer 2020, I think the case can be made that the range restriction effects might be high (given we admitted ~15 fellows). For the Spring fellowships, we admitted the majority of applicants, so there should be more differentiation in the predictor variable.
Thanks for the info! I guess that even if you aren’t applying such strong selection pressure yourselves in some of these years, it could still be that all your applicants are sufficiently high in whatever the relevant factor is (there may be selection effects prior to your selection) that the measure doesn’t make much difference. This might still suggest that you shouldn’t select based on this measure (at least while the applicant pool remains similar), but the same might not apply to other groups (who may have a less selective applicant pool).
There are definitely a lot of selection effects prior to us making our selection. I think what we are trying to say is that our selections based on interview scores were not very helpful. Perhaps they would be helpful if our system worked very differently (for instance, if we just interviewed anyone who put down their email). But it seems that with the selection effects we had (applicants have to make an effort to fill out the application, do a small amount of background reading, and schedule and show up to an interview), we arrived at a place where our interview scoring system didn’t do a good job of further narrowing down the applicants.
We definitely do not mean to say that other groups shouldn’t be selective, or even that they shouldn’t be selective using our criteria. We just don’t have the evidence to suggest that our criteria were particularly helpful in our case, so we can’t really recommend them to others.
Thanks for writing this up!
If the data is available, for each of these 5 iterations could you please list the following?
number of people who applied
number of people you interviewed (done)
number of people you admitted
number of people who completed the program