In case it’s useful to anyone: that 100k number is ~4-5x the actual cost of increasing the size of a MATS cohort by 1.
edit for more fleshed out thoughts and some questions....
and now edited again to replace those questions with answers, since the doc is available...
It’s super hard for me to reason about how exceptional that exceptional technical researcher is, because even very sharp people in the space have highly varied impact (like maybe 4+ OOM between the bottom person I’d describe with the language you used and the top person I’d describe in the same language, e.g. Christiano).
Would have been interested to see a more apples-to-apples comparison with technical researchers on the policy side. Most technical researchers have at least some research and/or work experience (usually ~5 years of the two combined). One of the policy categories is massively underqualified in comparison, and the other is massively overqualified. I’d guess this is downstream of where the community has set the bar for policy people, but I’d take “has worked long enough to actually Know How Government Works, but has no special connections or string-pulling power” at like >10:1 against the kind of gov researcher listed (although I’d also take that kind of gov researcher at less than half the median exchange rate above).
Surprised a UN AI think tank (a literal first, afaik, and likely a necessary precursor for international coordination or avoiding an arms race) would be rated so low, whereas a US think tank (when many US think tanks, including the most important one, have already pivoted to spending a lot of time thinking about AI) was rated so highly.
Of course, the marginal graduate is worse than the median graduate, and in order for someone to end up participating in MATS many more things need to happen than for MATS to accept them (most MATS students have spent dozens to hundreds of hours reading existing content, or already extensively engaged with existing community institutions, which someone has to pay for).
As such, this at least does not straightforwardly imply that people think MATS should get more funding (I do think MATS probably should get more funding, but I care about the local validity of this argument here).
Ah, really just meant it as a data point and not an argument! I think if I were reading this I’d want to know the above (maybe that’s just because I already knew it?).
But to carry on the thread: It’s not clear to me from what we know about the questions in the survey if ‘creating’ meant ‘courting, retraining’, or ‘sum of all development that made them a good candidate in the first place, plus courting, retraining.’ I’d hope it’s the former, since the latter feels much harder to reason about commutatively. Maybe this ambiguity is part of the ‘roughness’ brought up in the OP.
I’m also not sure if ‘the marginal graduate is worse than the median graduate’ is strongly true. Logically it seems inevitable, but also it’s very hard to know ex ante how good a scholar’s work will be, and I don’t think it’s exactly right to say there’s a bar that gets lowered when the cohort increases in size. We’ve been surprised repeatedly (in both directions) by the contributions of scholars even after we feel we’ve gotten a bead on their abilities (reviewed their research plans, etc).
Often the marginal scholar allows us to support a mentor we otherwise wouldn’t have supported, who may have a very different set of selection criteria than other mentors.
If the marginal scholar is better than the median scholar, why wouldn’t you simply not admit the worst scholars and admit the better scholars instead? Clearly the marginal scholar would usually be the worst scholar? Are you saying that if you had half the money, the average quality of the cohort would go down instead of up, and that you would be unable to prioritize admitting only the more competent people?
I think your claim is directionally correct all-else-equal; I just don’t think the effect is big enough in context, with high enough confidence, that it changes the top-line calculation you’re responding to (that 4-5x) at the resolution it was offered (whole numbers).
The naive assumption that scholars can be arranged linearly according to their abilities and admitted one-by-one in accordance with the budget is flawed. If it were true, we could probably say that the marginal MATS scholar at selection was worth maybe <80 percent of the central scholar (the threshold at which I would have written 3-4x above rather than 4-5x). But it’s not true.
Mentors pick scholars based on their own criteria (MATS ~doesn’t mess with this, although we do offer support in the process). Criteria vary significantly between mentors. It’s not the case, for instance, that all of the mentors put together their ordered list of accepted and waitlisted scholars and end up competing for the same top picks. This happens some, but quite rarely relative to the size of the cohort. If what you’ve assumed actually had a strong effect, we’d expect every mentor to have the same (or even very similar) top picks. They simply don’t.
MATS 6 is both bigger and (based on feedback from mentors) more skill-dense than any previous MATS cohort, because it turns out all else does not hold equal as you scale and you can’t treat a talent pipeline like a pressure calculation.
You might believe that there are network effects, or that the “best” people are only willing to come along if there’s a sufficiently large intellectual scene. (Not saying either is likely, just illustrating that the implied underlying model is not a tautology).
I predict the opposite effect: average intellectual scene quality is a much bigger draw than total number of people (MATS is already large). I expect a larger program is actively detrimental to drawing top people.
I’m thinking less of total number of people and more of the probability of having specific collaborators who work in your exact area or are otherwise useful to have around.
Ah, fair. Yes, I agree that’s a plausible factor, especially for more niche areas.
Yeah, I think those are not implausible, but very unlikely.
My ill-informed impression of the RAND situation was that there’s a new group inside RAND thinking about AI, and it’s small in personnel and resources compared to RAND at large. Is that not so?
Small but expanding (like everything in the space) is my understanding; there are also a lot of non-RAND government and government-adjacent groups devoted to AI safety and nat sec.
I didn’t mean to imply that the org had retooled to become entirely AI-focused or something; sorry if that’s how it read!