Pretty ambitious, thanks for attempting to quantify this!
Having only quickly skimmed this and not looked into your code (so this could be my fault), I find myself a bit confused about the baselines: funding a single research scientist (I’m assuming this means at a lab?) or Ph.D. student for even 5 years doesn’t seem clearly equivalent to 87 or 8 adjusted counterfactual years of research—I’d imagine it’s much less than that. Could you provide some intuition for how the baseline figures are calculated (maybe you are assuming second-order effects, like funded individuals getting interested in safety and doing more of it, or mentoring others under them)?
I was initially confused about this too. But I think I understood some of what was going on on a second skim (I only have the post itself to go off of):
Recall that their metric for “quality-adjusted research year” assumes a year of research “of average ability relative to the ML research community,” and “Working on a research avenue as relevant as adversarial robustness.”
I think their baselines are assuming a) much higher competency than the ML research community average and b) that the research avenues in question are considerably more impactful than the standard unit of analysis.
“Scientist Trojans” and “PhD Trojans” are hypothetical programs, wherein a research scientist or a PhD student is funded for 1 or 5 years, respectively. This funding causes the scientist or PhD student to work on trojans research (a research avenue that CAIS believes is 10x the relevance of adversarial robustness) rather than a research avenue that CAIS considers to have limited relevance to AI safety (0x). Unlike participants considered previously in this post, the scientist or PhD student has ability 10x the ML research community average — akin to assuming that the program reliably selects unusually productive researchers. The benefits of these programs cease after the funding period.
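For concreteness, here is how I’d summarize the two hypothetical programs’ parameters (the dictionary below is just my restatement of the paragraph above, not the repo’s actual configuration format):

```python
# My summary of the two hypothetical programs, as described above.
programs = {
    "Scientist Trojans": {
        "funded_years": 1,
        "relevance_multiplier": 10,   # trojans research vs. adversarial robustness
        "ability_multiplier": 10,     # vs. the ML research community average
    },
    "PhD Trojans": {
        "funded_years": 5,
        "relevance_multiplier": 10,
        "ability_multiplier": 10,
    },
}
```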
This will naively get you to 100x. Presumably adjusting for counterfactuals means you go a little lower than that. That said, I’m still not sure how they ended up with 84x and 8.7x, or why the two numbers are so different from each other.
This is broadly correct! Regarding

> I’m still not sure how they ended up with 84x and 8.7x

the answer is discounting for time and productivity. Consider the 84x for research scientists. With a 20% annual research discount rate, the average value of otherwise-identical research relative to the present is a bit less than 0.9. And productivity relative to peak is very slightly less than 1. These forces move the 100 to 84.
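A minimal back-of-the-envelope sketch of that arithmetic (the continuous exponential discounting and the exact productivity factor below are my assumptions, not necessarily the model’s functional forms):

```python
import numpy as np

# Rough reproduction of the ~84x research-scientist figure.
relevance_multiplier = 10     # trojans research vs. adversarial robustness
ability_multiplier = 10       # 10x the ML research community average
annual_discount = 0.20        # 20% annual research discount rate

# Average discounted value of research spread evenly over the single funded year,
# assuming continuous exponential discounting (an assumption on my part).
t = np.linspace(0.0, 1.0, 1001)
avg_time_value = np.mean((1.0 - annual_discount) ** t)   # ~0.90

productivity_vs_peak = 0.94   # "very slightly less than 1" (illustrative value)

qarys = relevance_multiplier * ability_multiplier * avg_time_value * productivity_vs_peak
print(round(qarys))           # ~84
```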
Regarding
> why the two numbers [84x and 8.7x] are so different from each other
the answer is mainly differences in productivity relative to peak and scientist-equivalence. As in the plots in this section, PhD students midway through their PhD are ~0.5x as productive as they will be at their career peak. And, as in this section, we value PhD student research labor at 0.1x that of research scientists. The other important force is the length of a PhD—the research scientist is assumed to be working for 1 year whilst the PhD student is funded for 5 years, which increases the duration of the treatment effect and decreases the average time value of research.
Very roughly: 100x baseline you identified * ~0.5x productivity * 0.1x scientist-equivalence * 5 years * ~0.5 average research discount rate = 12.5. (Correcting errors in these rough numbers takes us to 8.4.)
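The same rough arithmetic, written out (these factors just restate the approximations above, not the model’s actual parameter values):

```python
# Rough reproduction of the PhD-student figure from the factors above.
baseline = 100                # 10x relevance * 10x ability
productivity_vs_peak = 0.5    # a mid-PhD student is ~0.5x as productive as at career peak
scientist_equivalence = 0.1   # PhD research labor valued at 0.1x a research scientist's
funded_years = 5              # length of the funded PhD
avg_time_value = 0.5          # rough average research discount over those 5 years

estimate = baseline * productivity_vs_peak * scientist_equivalence * funded_years * avg_time_value
print(estimate)               # 12.5; correcting the rough factors brings this down to ~8.4
```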
That makes sense, thanks for the explanation! Yeah, I’m still a bit confused about why they chose different numbers of years for the scientist and the PhD, how those particular numbers arise, and why they’re so different (I’m assuming it’s 1 year of scientist funding vs. 5 years of PhD funding).
Yup, wanted to confirm here that the ~100x in efficacy comes from getting 10x in relevance and 10x in ability (from selecting someone 10x better than the average research scientist).
Regarding the relative value of PhD vs scientist: the model currently values the average scientist at ~10x the average PhD at graduation (which seems broadly consistent with the selectivity of becoming a scientist, and likely underrepresents the research impact as measured by citations—the average scientist likely has more than 10x the citation count of the average PhD). Then, the 5 years includes the PhD growing significantly as they gain more research experience, so the earlier years will not be as productive as their final year.
I’m confused where these assumptions are stored. All of the parameter files I see in GitHub have all of the `ability_at_*` variables set equal to one. And when I print out the average of `qa.mean_ability_piecewise` for all the models that also appears to be one. Where is the 10x coming from?