Thanks for sharing. – I love the spirit of aiming to come up with a strategy that, if successful, would have a shot at significantly moving the needle on prospects for AI alignment. I think this is an important but (perhaps surprisingly) hard challenge, and that a lot of work labeled as AI governance or AI safety is not as impactful as it could be in virtue of not being tied into an overall strategy that aims to attack the full problem.
That being said, I feel fairly strongly that this strategy as stated is not viable, and that you aiming to implement the strategy would come with sufficiently severe risks of harming our prospects for achieving aligned AI that I would strongly advise against moving forward with this.
I know that you have emailed several people (including me) asking for their opinion on your plan. I want to share my advice and reasoning publicly so other people you may be talking to can save time by referring to my comment and indicating where they agree or disagree.
--
Here is why I’m very skeptical that you moving forward with your plan is a good idea:
I think you massively understate how unlikely your strategy is to succeed:
There are a lot of AI researchers. More than 10,000 people attended the machine learning conference NeurIPS (in 2019), and if we include engineers the number is in the hundreds of thousands. Having one-on-one conversations with all of them would require at least hundreds of thousands to millions of person-hours from people who could otherwise do AI safety research or do movement building aimed at potentially more receptive audiences.
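As a rough back-of-envelope check on this claim (the field size and hours-per-conversation figures below are my own illustrative assumptions, not from the post):

```python
# Back-of-envelope estimate of the person-hour cost of one-on-one
# conversations with the broader AI research/engineering field.
# All numbers are illustrative assumptions.
researchers = 200_000          # rough field size, including engineers
hours_per_conversation = 3     # assume a few hours of real engagement each
total_hours = researchers * hours_per_conversation

full_time_hours_per_year = 2_000
person_years = total_hours / full_time_hours_per_year

print(f"total: {total_hours:,} person-hours ≈ {person_years:,.0f} person-years")
# → total: 600,000 person-hours ≈ 300 person-years
```

Even under these fairly conservative assumptions, that is hundreds of full-time person-years of outreach, before counting follow-up or coordination overhead.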
Several prominent AI researchers have engaged with AI risk arguments for likely dozens of hours if not more (example), and yet they remain unconvinced. So there would very likely be significant resistance against your scheme by leaders in the field, which makes it seem dubious whether you could convince a significant fraction of the field to change gears.
There are significant financial, status, and ‘fun’ incentives for most AI researchers to keep doing what they are doing. You allude to this, but seem to fail to grasp the magnitude of the problem and how hard it would be to solve. Have you ever seen “marketing specialists” convince hundreds of thousands of people to leave high-paying and intellectually rewarding jobs to work on something else (let alone a field that is likely pretty frustrating, if not impossible, to enter)? (Not even mentioning that any such effort would be competing against the ‘marketing’ of trillion-dollar companies like Google, which have strong incentives to portray themselves as, and actually become, good places to work.)
AI safety arguably isn’t a field that can absorb many people right now. Your post sort of acknowledges this when briefly mentioning mentoring bottlenecks, but again in my view fails to grapple with the size and implications of this problem. (And it’s not just about mentoring bottlenecks: there is also a lack of strategic clarity, much of the required research being ‘preparadigmatic’, etc.)
Your plan comes with significant risks, which you do not acknowledge at all. Together with the other flaws and gaps I perceive in your reasoning, I consider this a red flag for your fit for executing any project in the vicinity of what you outline here.
Poorly implemented versions of your plan can easily backfire: AI researchers might either be substantively unconvinced and become more confident in dismissing AI risk, or – and this seems like a rather significant risk to me – might perceive an organized effort to seek one-on-one conversations with them in order to convince them of a particular position as weird or even ethically objectionable.
Explaining to a lot of people why AI might be a big deal comes with the risk of putting the idea that they should race toward AGI into the heads of malign or reckless actors.
There are many possible strategic priorities for how to advance AI alignment. For instance, an alternative to your strategy would be: ‘Find talented and EA-aligned students who are able to contribute to AI alignment or AI governance despite both of these being ill-defined fields marred with wicked problems.’ (I.e., roughly speaking, precisely the strategy that is being executed by significant parts of the longtermist EA movement.) And there are significant flaws and gaps in the reasoning that makes you pick out ‘move AI capabilities researchers to AI safety’ as the preferred strategy.
You make it sound like a majority or even “monopoly” of AI researchers needs to work on safety rather than capabilities. However, in fact (and simplifying a bit), we only need as many researchers to work on AI safety as is required to solve the AI alignment problem in time. We don’t know how hard this is. It could be that we need twice as much research effort as on AI capabilities, or that we only need one millionth of that.
There are some reasons to think that expected returns to both safety and capabilities progress toward AGI are logarithmic. That is, each doubling of total research effort produces the same amount of expected returns. Given that AI capabilities research is a field several orders of magnitude larger than AI safety, this means that the marginal returns of moving people from capabilities to safety research are almost all due to increasing AI safety effort, while the effect from slowing down capabilities is very small. This suggests that the overall best strategy is to scale up AI safety research by targeting whichever audiences lead to the most quality-adjusted expected AI safety research hours.
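A toy calculation may help make this concrete (this is my own sketch; the field sizes are hypothetical placeholders, chosen only to be several orders of magnitude apart):

```python
import math

# Toy model (my illustration, not from the comment): research output in each
# field grows with log2 of total effort, so contributions are measured in
# "doublings" of that field's effort.
def doublings_gained(field_size: float, delta: float) -> float:
    """Fraction of a doubling gained (or lost) by adding `delta` researchers."""
    return math.log2((field_size + delta) / field_size)

# Hypothetical field sizes, chosen only to be ~3 orders of magnitude apart.
capabilities, safety = 300_000, 300

moved = 30  # move 30 researchers from capabilities to safety
safety_gain = doublings_gained(safety, moved)                    # ≈ 0.14 doublings
capabilities_slowdown = -doublings_gained(capabilities, -moved)  # ≈ 0.00014 doublings

print(f"safety gain:           {safety_gain:.5f} doublings")
print(f"capabilities slowdown: {capabilities_slowdown:.5f} doublings")
# Under these assumptions the safety gain outweighs the capabilities slowdown
# by roughly the ratio of the two field sizes (~1000x), so almost all the
# marginal value comes from growing safety, not from slowing capabilities.
```

The real returns curves are of course unknown; this only illustrates why, under a logarithmic model, the ‘slowing capabilities’ term is negligible.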
(I also think that, in fact, there is not a clean division between “AI capabilities” and “AI safety” research. For instance, work on interpretability or learning from human feedback arguably significantly contributes to both capabilities and safety. I have bracketed this point because I don’t think it is that relevant for the viability of your plan, except perhaps indirectly by providing evidence about your understanding of the AI field.)
--
To be clear, I think that some of the specific ideas you mention are very good if implemented well. For instance, I do think that better AI safety curricula are very valuable.
However, these viable ideas are usually things that are already happening. There are AI alignment curricula, there are events aimed at scaling up the field, there are efforts to make AI safety seem prestigious to mainstream AI talent, and there even are efforts that are partly aimed at increasing the credibility of AI risk ideas to AI researchers, such as TED talks or books by reputable AI professors or high-profile conferences at which AI researchers can mingle with people more concerned about AI risk.
If you wanted to figure out to which of these efforts you are best placed to contribute, or whether there might be any gaps among current activities, then I’m all for it! I just don’t think that trying to tie them into a grand strategy that to me seems flawed in all those places at which it is new and specific, and not new in all those places where it makes sense to me, will be a productive approach.
The most valuable part of this project that I’m personally interested in is a document with the best arguments for alignment and how to effectively go about these conversations (i.e., finding cruxes).
You made a claim that returns to improving capabilities are logarithmic, but my model is that 80% of progress is made by a few companies and top universities. Fewer than 1,000 people are pushing general capabilities, so convincing these people (or the people in charge of their research direction) to pivot is high impact.
You linked the debate between AI researchers, and I remember being extremely disappointed in the way the debate was handled (e.g., why is Stuart using metaphors? Though I did appreciate Yoshua’s responses). The ideal product I’m thinking of says obvious things like “don’t use metaphors as arguments”, “don’t have a 10-person debate”, and “be kind”, along with the actual arguments to present and the most common counterarguments.
This could have negative effects if done wrong, so the next step is to practice on lower-stakes people while building the argument doc. Then, higher-stakes people can be approached.
Additionally, a list of why certain “obvious solutions to alignment” fail is useful for pointing out dead ends in research. For example, any project that relies on the orthogonality thesis being wrong is doomed to fail, imo.
This is a tangent: the current efforts for scaling alignment are very inadequate (though I’m very glad they exist!). MLAB accepted what, 30 of 500 applicants? AISC accepted 40 of 200 (and I talked to one rejected applicant who was very high quality!). Richard’s course is scaling much faster, though, and I’m excited about that. I do believe that none of the courses teach “how to do great research” unless you do a mentorship, but I think we can work around that.
The most valuable part of this project that I’m personally interested in is a document with the best arguments for alignment and how to effectively go about these conversations (i.e., finding cruxes).
I agree that this would be valuable, and I’d be excited about empirically informed work on this.
You are most likely aware of this, but in case not I highly recommend reaching out to Vael Gates who has done some highly relevant research on this topic.
I do think it is important to keep in mind that (at least according to my model) what matters is not just the content of the arguments themselves but also the context in which they are made and even the identity of the person making them.
(I also expect significant variation in which arguments will be most convincing to whom.)
You made a claim that returns to improving capabilities are logarithmic, but my model is that 80% of progress is made by a few companies and top universities. Fewer than 1,000 people are pushing general capabilities, so convincing these people (or the people in charge of their research direction) to pivot is high impact.
Yes, I agree that the number of people who are making significant progress toward AGI/TAI specifically is much smaller, and that this makes the project of convincing all of those more feasible.
For the reasons mentioned in my original comment (incentives, failed past attempts, etc.) I suspect I’m still much more pessimistic than you might be that it is possible to convince them if only we found the right arguments, but for all I know it could be worth a shot. I certainly agree that we have not tried to do this as hard as possible (at least not that I’m aware of), and that it’s at least possible that a more deliberate strategy could succeed where past efforts have failed.
(This is less directly relevant, but fwiw I don’t think that this point counts against expected research returns being logarithmic per se. I think instead it is a point about what counts as research inputs – we should look at doublings of ‘AGI-weighted AI research hours’, whatever that is exactly.)
That being said, my guess is that, in addition to ‘trying harder’ and empirically testing which arguments work in which contexts, it would be critical for any new strategy to be informed by an analysis of why past efforts have not been successful (I expect there are useful lessons here), and by close coordination with those in the AI alignment and governance communities who have experience interacting with AI researchers and who care about their relationships with them and about how they and AI safety/governance are perceived as fields by mainstream AI researchers - both to learn from their experience trying to engage AI researchers and to mitigate risks.
FWIW my intuition is that the best version of a persuasion strategy would also include a significant component of preparing to exploit windows of opportunity – i.e., capitalizing on people being more receptive to AI risk arguments after certain external events like intuitively impressive capability advances, accident ‘warning shots’, high-profile endorsements of AI risk worries, etc.
I am talking to Vael currently thanks to a recommendation from someone else. If there are other people you know of, or sources on failed attempts in the past, I’d appreciate those as well!
I also agree that a set of really good arguments is great to have but not always sufficient.
Although convincing the top few researchers is important, convincing the bottom 10,000s is also important for movement building. The counterargument of “we can’t handle that many people switching careers” is addressed by scaling our programs.
Another response is just trusting them to figure it out themselves (I want to compare this with COVID research, but I’m not sure how well that research went or what incentives there were to make it better or worse) - though this isn’t my argument but another person’s intuition. I think an additional structure of “we can give quick feedback on your alignment proposal” would help with this.
Thank you very much for the constructive criticisms, Max! I appreciate your honest response, and agree with many of your points.
I am in the process of preparing a (hopefully) well-thought-out response to your comment.
I really like the window of opportunity idea.