The simplest thing you could do to improve this would be to measure engagement for all the people who applied and then re-estimate the correlation on the full sample, rather than the selected subsample… However, a lot of the items in the engagement measure are explicitly linked to participation in the fellowship, which biases it towards fellows somewhat, so if you could construct an alternative engagement measure which doesn’t include these, that would likely be better.
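To illustrate why the full sample matters, here is a minimal simulation sketch of range restriction. Everything in it is an assumption for illustration only: the true correlation of 0.5, the 20% acceptance rate, the normally distributed scores, and the function names are all hypothetical and not taken from the fellowship data.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_range_restriction(n=20000, true_r=0.5, accept_fraction=0.2, seed=0):
    """Return (r_full, r_selected) for simulated applicants.

    Assumed model: predictor score and engagement are standard normal with
    correlation true_r, and only the top accept_fraction of scores is accepted.
    There is no treatment effect in this sketch.
    """
    rng = random.Random(seed)
    scores, engagement = [], []
    for _ in range(n):
        s = rng.gauss(0, 1)
        e = true_r * s + math.sqrt(1 - true_r ** 2) * rng.gauss(0, 1)
        scores.append(s)
        engagement.append(e)

    # Score cutoff that admits roughly the top accept_fraction of applicants.
    cutoff = sorted(scores)[int(n * (1 - accept_fraction))]
    selected = [(s, e) for s, e in zip(scores, engagement) if s >= cutoff]

    r_full = pearson(scores, engagement)
    r_selected = pearson([s for s, _ in selected], [e for _, e in selected])
    return r_full, r_selected


if __name__ == "__main__":
    r_full, r_selected = simulate_range_restriction()
    print(f"correlation, all applicants: {r_full:.2f}")
    print(f"correlation, accepted only:  {r_selected:.2f}")
```

Under these assumptions, the correlation among accepted applicants should come out well below the correlation among all applicants, even though the underlying relationship is identical. That is the sense in which estimating on the selected subsample understates how predictive the scores are.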
The other big issue with this approach is that it would likely be confounded by the treatment effect of being selected for and undertaking the fellowship. That is, we would hope that going through the fellowship actually makes people more engaged, which would lead to the people with higher scores (who get accepted to the fellowship) also having higher engagement scores.
But perhaps what you had in mind was combining the simple approach with a more complex approach, like randomly selecting people for the fellowship across the range of predictor scores and evaluating the effects of the fellowship as well as the effect of the initial scores?
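As a rough sketch of how the two designs differ, the simulation below compares selection on score with random selection. The model, the effect sizes (`beta_score`, `tau`), the cutoff, and the 20% acceptance probability are all hypothetical choices made for illustration, not estimates from any real data.

```python
import random


def mean(values):
    return sum(values) / len(values)


def slope(xs, ys):
    """Slope from a simple least-squares regression of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def select_by_score(score, rng):
    # Assumed rule: accept applicants above a fixed cutoff (roughly top 20%).
    return 1 if score >= 0.84 else 0


def select_at_random(score, rng):
    # Assumed rule: accept 20% of applicants regardless of score.
    return 1 if rng.random() < 0.2 else 0


def simulate(assign, n=20000, beta_score=0.5, tau=1.0, seed=1):
    """Assumed model: engagement = beta_score * score + tau * treated + noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        score = rng.gauss(0, 1)
        treated = assign(score, rng)
        engagement = beta_score * score + tau * treated + rng.gauss(0, 1)
        rows.append((score, treated, engagement))
    return rows


def naive_slope_under_selection():
    """Score slope over all applicants when acceptance depends on score."""
    rows = simulate(select_by_score)
    return slope([r[0] for r in rows], [r[2] for r in rows])


def estimates_under_randomisation():
    """Return (treatment effect estimate, score slope among controls)."""
    rows = simulate(select_at_random)
    treated = [r for r in rows if r[1] == 1]
    control = [r for r in rows if r[1] == 0]
    tau_hat = mean([r[2] for r in treated]) - mean([r[2] for r in control])
    beta_hat = slope([r[0] for r in control], [r[2] for r in control])
    return tau_hat, beta_hat


if __name__ == "__main__":
    print(f"naive score slope (selected on score): {naive_slope_under_selection():.2f}")
    tau_hat, beta_hat = estimates_under_randomisation()
    print(f"randomised: treatment effect {tau_hat:.2f}, score slope {beta_hat:.2f}")
```

In this toy setup, the naive slope under score-based selection should come out above the assumed `beta_score`, because it absorbs part of the fellowship's effect. With random selection, the difference in mean engagement between fellows and non-fellows recovers something close to `tau`, and the slope among non-fellows recovers something close to `beta_score`, so the two effects can be evaluated separately.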