Thanks for publishing this + your code, I found this approach interesting :) and in general I am excited at people trying different approaches to impact measurement within field building.
I had some model qs (fair if you don’t get around to these given that it’s been a while since publication):
> We define a QARY as:
>
> 1. A year of research labor (40 hours * 50 weeks),
> 2. Conducted by a research scientist (other researcher types will be inflated or deflated),
> 3. Of average ability relative to the ML research community (other cohorts will be inflated or deflated),
> 4. Working on a research avenue as relevant as adversarial robustness (alternative research avenues will be inflated or deflated),
> [...]
I'm confused by the mechanics of adjustments 2-4 in particular:
On 2: I think you're estimating these adjustments based on researcher type. What are the specific multipliers based on?

> Here, scientists, professors, engineers, and PhD students are assigned ‘scientist-equivalence’ of 1, 10, 0.1, and 0.1 respectively.
On 3: I'm a bit lost on how you're estimating average ability differences. How did you arrive at these numbers?

> Given the number of pre-PhD participants each program enrolls, Atlas participants have a mean ability of ~1.1x, Student Group and Undergraduate Stipends ~1x, and MLSS ~0.9x. Student Group PhD students have mean ability ~1.5x.
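For concreteness, here's how I'm imagining adjustments 2-4 composing: a minimal multiplicative sketch of my reading of the QARY definition, not your actual code (the function name and structure are my own invention):

```python
# Hypothetical sketch of my reading of the QARY definition: one year of
# research labor, scaled multiplicatively by the three adjustments.
# (My own toy code, not CAIS's model.)

def qarys_per_year(scientist_equivalence: float,
                   ability: float,
                   relevance: float) -> float:
    """QARYs produced by one year of labor under the three adjustments."""
    baseline_years = 1.0  # a full research year (40 hours * 50 weeks)
    return baseline_years * scientist_equivalence * ability * relevance

# A research scientist of average ability on an avenue as relevant as
# adversarial robustness is the unit case:
assert qarys_per_year(1.0, 1.0, 1.0) == 1.0

# e.g. a Student Group PhD student (0.1x scientist-equivalence, ~1.5x ability)
# on a fully relevant avenue:
print(qarys_per_year(0.1, 1.5, 1.0))
```

Is that roughly the composition, or do the adjustments interact in some non-multiplicative way?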
On 4:
Am I right that this is where you adjust for improvements in research agendas (i.e. maybe some people shift from less to more useful agendas, per CAIS's judgment, while CAIS still considers their former agenda useful)?
Is that why Atlas gets such a big boost here, because you think it’s more likely that people who go on to do useful AI work via Atlas wouldn’t have done any useful AI work but for Atlas?
Concretely, I'm not sure how to parse this passage about which programs lead to the biggest improvements in research agendas, and why:

> The shaded area indicates research avenue relevance for the average participant with (solid line) and without (dashed line) the program. Note that, after finishing their PhD, some pre-PhD students shift away from high-relevance research avenues, represented as vertical drops in the plot.
In general, I'd find it easier to work with this model if I understood better, for each of your core results, which critical inputs were based on CAIS's inside views vs. evidence gathered by the program (feedback surveys, etc.) vs. something else :)
I’d be interested to know whether CAIS has changed its field building portfolio based on these results / still relies on this approach!