Thanks for publishing this, Arb! I have some thoughts, mostly pertaining to MATS:
MATS believes a large part of our impact comes via accelerating researchers who might still enter AI safety, but would otherwise take significantly longer to spin up as competent researchers, rather than converting people into AIS researchers. MATS highly recommends that applicants have already completed AI Safety Fundamentals, and most of our applicants come from personal recommendations or AISF alumni (though we are considering better targeted advertising to professional engineers and established academics). Here is a simplified model of the AI safety technical research pipeline as we see it.
Why do we emphasize acceleration over conversion? Because we think that producing a researcher takes a long time (with a high drop-out rate), often requires apprenticeship (including illegible knowledge transfer) with a scarce group of mentors (with high barrier to entry), and benefits substantially from factors such as community support and curriculum. Additionally, MATS’ acceptance rate is ~15% and many rejected applicants are very proficient researchers or engineers, including some with AI safety research experience, who can’t find better options (e.g., independent research is worse for them). MATS scholars with prior AI safety research experience generally believe the program was significantly better than their counterfactual options, or was critical for finding collaborators or co-founders (alumni impact analysis forthcoming). So, the appropriate counterfactual for MATS and similar programs seems to be, “Junior researchers apply for funding and move to a research hub, hoping that a mentor responds to their emails, while orgs still struggle to scale even with extra cash.”
The “push vs. pull” model seems to neglect that e.g. many MATS scholars had highly paid roles in industry (or de facto offers given their qualifications) and chose to accept stipends at $30-50/h because working on AI safety is intrinsically a “pull” for a subset of talent and there were no better options. Additionally, MATS stipends are basically equivalent to LTFF funding; scholars are effectively self-employed as independent researchers, albeit with mentorship, operations, research management, and community support. Also, 63% of past MATS scholars have applied for funding immediately post-program as independent researchers for 4+ months as part of our extension program (many others go back to finish their PhDs or are hired) and 85% of those have been funded. I would guess that the median MATS scholar is slightly above the level of the median LTFF grantee from 2022 in terms of research impact, particularly given the boost they give to a mentor’s research.
Comparing the cost of funding marginal good independent researchers ($80k/year) to the cost of producing a good new researcher ($40k) seems like a false equivalence if you can’t have one without the other. I believe the most taut constraint on producing more AIS researchers is generally training/mentorship, not money. Even wizard software engineers generally need an on-ramp for a field as pre-paradigmatic and illegible as AI safety. If all MATS’ money instead went to the LTFF to support further independent researchers, I believe that substantially less impact would be generated. Many LTFF-funded researchers have enrolled in MATS! Caveat: you could probably hire e.g. Terry Tao for some amount of money, but this would likely be very large. Side note: independent researchers are likely cheaper than scholars in managed research programs or employees at AIS orgs because the latter two have overhead costs that benefit researcher output.
Some of the researchers who passed through AISC later did MATS. Similarly, several researchers who did MLAB or REMIX later did MATS. It’s often hard to appropriately attribute Shapley value to elements of the pipeline, so I recommend assessing orgs addressing different components of the pipeline by how well they achieve their role, and distributing funds between elements of the pipeline based on how much each is constraining the flow of new talent to later sections (anchored by elasticity to funding). For example, I believe that MATS and AISC should be assessed by their effectiveness (including cost, speedup, and mentor time) at converting “informed talent” (i.e., understands the scope of the problem) into “empowered talent” (i.e., can iterate on solutions and attract funding/get hired). This said, MATS aims to improve our advertising towards established academics and software engineers, which might bypass the pipeline in the diagram above. Side note: I believe that converting “unknown talent” into “informed talent” is generally much cheaper than converting “informed talent” into “empowered talent.”
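To make the Shapley point concrete, here is a toy sketch of how one could attribute credit between two pipeline programs that complement each other. The coalition values below are invented purely for illustration, not MATS or AISC data:

```python
from itertools import permutations
from math import factorial

# Purely illustrative coalition values: expected number of "empowered"
# researchers produced when only the listed programs exist.
# These numbers are invented for the example, not real data.
value = {
    frozenset(): 0.0,
    frozenset({"AISC"}): 2.0,           # AISC alone
    frozenset({"MATS"}): 5.0,           # MATS alone
    frozenset({"AISC", "MATS"}): 9.0,   # complements: more than 2 + 5
}

def shapley(players, value):
    """Shapley value: average marginal contribution over all join orders."""
    result = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = set()
        for p in order:
            before = value[frozenset(coalition)]
            coalition.add(p)
            result[p] += value[frozenset(coalition)] - before
    return {p: v / factorial(len(players)) for p, v in result.items()}

print(shapley(["AISC", "MATS"], value))
# {'AISC': 3.0, 'MATS': 6.0}: each program's standalone value plus half the synergy
```

With these made-up numbers, each program gets its standalone value plus half of the synergy, which is exactly why attribution depends so heavily on the assumed counterfactuals.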
Several MATS mentors (e.g., Neel Nanda) credit the program for helping them develop as research leads. Similarly, several MATS alumni have credited AISC (and SPAR) for helping them develop as research leads, similar to the way some Postdocs or PhDs take on supervisory roles on the way to Professorship. I believe the “carrying capacity” of the AI safety research field is largely bottlenecked on good research leads (i.e., who can scope and lead useful AIS research projects), especially given how many competent software engineers are flooding into AIS. It seems a mistake not to account for this source of impact in this review.
Thanks for writing this; it's great to hear your thoughts on talent pipelines in AIS.
I agree with your model of AISC and MATS and with your diagram of talent pipelines. I generally see MATS as a “next step” after AISC for many participants. Because of that, it's true that we can't cleanly compare the cost-per-researcher-produced between programs at different points in the pipeline, since they are complements rather than substitutes.
A funder would have to consider how to distribute funding between these options (e.g. conversion vs. acceleration) and that’s something I’m hoping to model mathematically at some point.
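As a very rough sketch of what such a model could look like, assuming each option has diminishing returns to funding (the square-root form and the coefficients are placeholders, not estimates of anything):

```python
import numpy as np

# Toy budget split between "conversion" (creating new AIS researchers) and
# "acceleration" (speeding up people who would enter anyway), each with
# diminishing returns. The sqrt form and coefficients a, b are placeholders.
B = 1_000_000      # total budget in dollars (illustrative)
a, b = 3.0, 5.0    # marginal-returns coefficients for conversion / acceleration

def impact(x_conversion):
    """Total impact in arbitrary units for a given split of the budget."""
    return a * np.sqrt(x_conversion) + b * np.sqrt(B - x_conversion)

# Grid search over the split; with sqrt returns the optimum has a closed
# form: x_conversion / B = a**2 / (a**2 + b**2), about 26% here.
grid = np.linspace(0, B, 10_001)
best = grid[np.argmax([impact(x) for x in grid])]
print(f"optimal share for conversion: {best / B:.1%}")
```

Any real version would also need to capture that the two stages feed into each other rather than being independent options, which is where the complements-not-substitutes point above comes back in.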
I believe the “carrying capacity” of the AI safety research field is largely bottlenecked on good research leads (i.e., who can scope and lead useful AIS research projects), especially given how many competent software engineers are flooding into AIS. It seems a mistake not to account for this source of impact in this review.
Good idea, this could be a valuable follow-up analysis. To give this a proper treatment, we would need a model for how students and mentors interact to (say) produce more research, and to estimate how much they complement each other.
In general, to keep the estimate conservative, we assumed impacts were negligible if we couldn't model or measure them well. But hopefully we can build the capacity to consider these things!
Thanks for this comment. To me this highlights how AISC is very much not like MATS. We’re very different programs doing very different things. MATS and AISC are both AI safety upskilling programs, but we are using different resources to help different people with different aspects of their journey.
I can’t say where AISC falls in the talent pipeline model, because that’s not how the world actually works.
AISC participants have obviously heard about AI safety, since they would not have found us otherwise. But other than that, people are all over the place in where they are on their journey, and that’s ok. This is actually more a help than a hindrance for AISC projects. Some people have participated in more than one AISC. One of last year’s research leads is a participant in one of this year’s projects. This doesn’t mean they are moving backwards in their journey; it’s them lending their expertise to a project that could use it.
So, the appropriate counterfactual for MATS and similar programs seems to be, “Junior researchers apply for funding and move to a research hub, hoping that a mentor responds to their emails, while orgs still struggle to scale even with extra cash.”
This seems correct to me for MATS, and even if I disagreed, you should trust Ryan over me. However, this is very much not a correct counterfactual for AISC.
If all MATS’ money instead went to the LTFF to support further independent researchers, I believe that substantially less impact would be generated.
This seems correct. I don’t know exactly what MATS costs, but assuming the majority of the cost is stipends, giving this money to MATS scholars along with all the MATS support seems just straight up better, even with some overhead cost for the organisers.
I’m less sure about how MATS compares to funding researchers in lower-cost locations than the SF Bay Area and London.
I believe the most taut constraint on producing more AIS researchers is generally training/mentorship, not money.
I’m not so sure about this, but if it’s true, then it’s an argument for funnelling more money to MATS, AISC, and other upskilling programs.
Some of the researchers who passed through AISC later did MATS. Similarly, several researchers who did MLAB or REMIX later did MATS. It’s often hard to appropriately attribute Shapley value to elements of the pipeline, so I recommend assessing orgs addressing different components of the pipeline by how well they achieve their role, and distributing funds between elements of the pipeline based on how much each is constraining the flow of new talent to later sections (anchored by elasticity to funding). For example, I believe that MATS and AISC should be assessed by their effectiveness (including cost, speedup, and mentor time) at converting “informed talent” (i.e., understands the scope of the problem) into “empowered talent” (i.e., can iterate on solutions and attract funding/get hired).
I agree that it’s hard to attribute value when someone has done more than one program. The way we asked Arb to address this is by just asking people. This will be in their second report, and I don’t know the results yet either.
I don’t think programs should be evaluated based on how well they achieve their role in the pipeline, since I reject this framework.
This said, MATS aims to advertise better towards established academics and software engineers, which might bypass the pipeline in the diagram above. Side note: I believe that converting “unknown talent” into “informed talent” is generally much cheaper than converting “informed talent” into “empowered talent.”
We already have some established academics and software engineers joining AISC. Being a part-time online program is very helpful for including people who have jobs but would like to try out some AI safety research on the side. This is one of several ways AISC is complementary to MATS, not a competitor.
Several MATS mentors (e.g., Neel Nanda) credit the program for helping them develop as research leads. Similarly, several MATS alumni have credited AISC (and SPAR) for helping them develop as research leads, similar to the way some Postdocs or PhDs take on supervisory roles on the way to Professorship. I believe the “carrying capacity” of the AI safety research field is largely bottlenecked on good research leads (i.e., who can scope and lead useful AIS research projects), especially given how many competent software engineers are flooding into AIS. It seems a mistake not to account for this source of impact in this review.
Thanks. This is something I’m very proud of as an organiser. Although I was not an organiser the year Neel Nanda was a mentor, I’ve heard this type of feedback from several of the research leads from the last cohort.
This is another way AISC is not like MATS. AISC has a much lower bar for research leads than MATS has for its mentors, which has several downstream effects on how we organise our programs.
MATS has a small number of well-known, top-talent mentors. This means that for them, mentor time is a very limited resource, and everything else is organised around this constraint.
AISC has a lower bar for our research leads, which means we have many more of them, letting us run a much bigger program. This is how AISC is so scalable. On the other hand, we have some research leads learning by doing along with everyone else, which creates some potential problems. AISC is structured around addressing this, and it seems to be working.
I don’t like this funnel model, or any other funnel model I’ve seen. It’s not wrong exactly, but it misses so much that it’s often more harmful than helpful.
For example:
If you actually talk to people, their stories are not this linear, and that is important.
The picture makes it look like AISC, MATS, etc. are interchangeable, or just different-quality versions of the same thing. This is very far from the truth.
I don’t have a nice-looking replacement for the funnel. If I had a nice clean model like this one, it would probably be just as bad. The real world is just very messy.
This is insightful, thanks!