I just ran the numbers. These are the correlations between GMA and an equally weighted combination of all the other instruments from the first three stages (form, CV, work test(s), and two interviews). Note that this makes the sample size very small:
Research Analyst: 0.19 (N=6)
Operations Analyst: 0.79 (N=4)
First two stages only (CV, form, work test(s)):
Research Analyst: 0.13 (N=9)
Operations Analyst: 0.70 (N=7)
I think the strongest case for them is their cost-effectiveness in terms of the time invested on both sides.
Reference checks can mimic a longer trial, which allows you to learn much more about somebody’s behavior and performance in a regular work context. This depends on references being honest and willing to share candidates’ potential weaknesses as well. We found the EA community to be exemplary in this regard.
No reference check was decisive. I’d imagine one would only be decisive in the case of major red flags. Still, they informed our understanding of candidates’ relative strengths and weaknesses.
We think they’re great because they’re very cost-effective, and they can highlight potential areas of improvement and issues to investigate further in a trial.