“think the problem is that it’s hard to establish expert “baselines” via which to measure uplift”
If you could find enough experts (say 100), then randomisation is probably enough to solve this problem, even if they have a wide range of capabilities. I agree, though, that a category such as “2-5 years post-doc” would be even nicer. Maybe you could find a couple of large PhD or post-doc cohorts.
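
To make the randomisation point concrete, here is a rough simulation sketch. Everything in it is an assumption made up for illustration: the function name `simulate_uplift_estimates`, the normally distributed skill levels, the noise level, and the additive "true uplift" of 0.5 are not taken from any real study. It randomly splits 100 experts of widely varying skill into two equal arms and checks how well the difference in mean scores recovers the uplift.

```python
import random
import statistics


def simulate_uplift_estimates(n_experts=100, true_uplift=0.5, skill_sd=1.0,
                              noise_sd=0.5, n_trials=2000, seed=0):
    """Simulate many randomised trials; return (mean, sd) of the uplift estimates.

    All parameters are hypothetical illustration values, not empirical ones.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_trials):
        # Experts differ widely in baseline capability.
        skills = [rng.gauss(0.0, skill_sd) for _ in range(n_experts)]

        # Random assignment: shuffle the indices, first half gets the treatment.
        order = list(range(n_experts))
        rng.shuffle(order)
        treated = set(order[: n_experts // 2])

        treated_scores, control_scores = [], []
        for i, skill in enumerate(skills):
            score = skill + rng.gauss(0.0, noise_sd)  # task-level noise
            if i in treated:
                treated_scores.append(score + true_uplift)
            else:
                control_scores.append(score)

        # Simple difference-in-means estimate of uplift for this trial.
        estimates.append(statistics.mean(treated_scores)
                         - statistics.mean(control_scores))
    return statistics.mean(estimates), statistics.stdev(estimates)


mean_est, sd_est = simulate_uplift_estimates()
print(f"average estimated uplift: {mean_est:.3f}")
print(f"spread (sd) of estimates across trials: {sd_est:.3f}")
```

Under these assumptions the estimate is unbiased despite the wide skill range, and the heterogeneity only shows up as extra variance in the estimate. Restricting to a narrower category such as “2-5 years post-doc” would correspond to a smaller `skill_sd`, which tightens the spread rather than fixing a bias.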