The part of this project I'm personally most interested in is a document with the best arguments for alignment and how to effectively go about these conversations (i.e. finding cruxes).
You claimed that returns to improving capabilities are logarithmic, but my model is that ~80% of progress is made by a few companies and top universities. Fewer than 1,000 people are pushing general capabilities forward, so convincing these people to pivot (or the people who set their research directions) is high impact.
You linked the debate between AI researchers, and I remember being extremely disappointed in how the debate was handled (e.g. why is Stuart using metaphors? Though I did appreciate Yoshua's responses). The ideal product I'm thinking of says obvious things like "don't use metaphors as arguments", "don't have a 10-person debate", and "be kind", along with the actual arguments to present and the most common counterarguments.
This could have negative effects if done wrong, so the next step is to practice on lower-stakes people while building the argument doc. Then higher-stakes people can be approached.
Additionally, a list of why certain "obvious solutions to alignment" fail is useful for pointing out dead ends in research. For example, any project that relies on the orthogonality thesis being wrong is doomed to fail, imo.
This is a tangent: the linked options for scaling up alignment are very inadequate (though I'm very glad they exist!). MLAB accepted what, 30 of 500 applicants? AISC accepted 40 of 200 (and I talked to one rejected applicant who was very high quality!). Richard's course is scaling much faster, though, and I'm excited about that. I do believe that none of the courses teach "how to do great research" unless you do a mentorship, but I think we can work around that.
The part of this project I'm personally most interested in is a document with the best arguments for alignment and how to effectively go about these conversations (i.e. finding cruxes).
I agree that this would be valuable, and I’d be excited about empirically informed work on this.
You are most likely aware of this, but in case not, I highly recommend reaching out to Vael Gates, who has done some highly relevant research on this topic.
I do think it is important to keep in mind that (at least according to my model) what matters is not just the content of the arguments themselves but also the context in which they are made and even the identity of the person making them.
(I also expect significant variation in which arguments will be most convincing to whom.)
You claimed that returns to improving capabilities are logarithmic, but my model is that ~80% of progress is made by a few companies and top universities. Fewer than 1,000 people are pushing general capabilities forward, so convincing these people to pivot (or the people who set their research directions) is high impact.
Yes, I agree that the number of people making significant progress toward AGI/TAI specifically is much smaller, and that this makes the project of convincing all of them more feasible.
For the reasons mentioned in my original comment (incentives, failed past attempts, etc.), I suspect I'm still much more pessimistic than you might be that it is possible to convince them if only we found the right arguments, but for all I know it could be worth a shot. I certainly agree that we have not tried to do this as hard as possible (at least not that I'm aware of), and that it's at least possible that a more deliberate strategy could succeed where past efforts have failed.
(This is less directly relevant, but fwiw I don’t think that this point counts against expected research returns being logarithmic per se. I think instead it is a point about what counts as research inputs – we should look at doublings of ‘AGI-weighted AI research hours’, whatever that is exactly.)
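To spell out the arithmetic behind "doublings" (my own gloss with made-up symbols, not something stated in the thread): if expected progress grows with the log of cumulative AGI-weighted research hours, then each doubling of those hours buys roughly the same increment of progress, which is why doublings are the natural unit for inputs. A minimal sketch under that assumption:

```latex
% Illustrative only: P = expected progress, H = cumulative 'AGI-weighted'
% AI research hours, k = some constant. All symbols are assumptions.
P(H) \approx k \log_2 H
\quad\Longrightarrow\quad
P(2H) - P(H) \approx k \log_2\!\left(\tfrac{2H}{H}\right) = k .
% Each doubling of AGI-weighted research hours adds the same increment k
% of expected progress, regardless of the current level H.
```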
That being said, my guess is that in addition to 'trying harder' and empirically testing which arguments work in which contexts, it would be critical for any new strategy to be informed by (a) an analysis of why past efforts have not been successful (I expect there are useful lessons here), and (b) close coordination with those in the AI alignment and governance communities who have experience interacting with AI researchers, who care about their relationships with them, and who care about how they, and AI safety/governance as fields, are perceived by mainstream AI researchers - both to learn from their experience engaging AI researchers and to mitigate risks.
FWIW my intuition is that the best version of a persuasion strategy would also include a significant component of preparing to exploit windows of opportunity – i.e., capitalizing on people being more receptive to AI risk arguments after certain external events like intuitively impressive capability advances, accident ‘warning shots’, high-profile endorsements of AI risk worries, etc.
Thanks for the in-depth response!
I really like the window of opportunity idea.
I am currently talking to Vael thanks to a recommendation from someone else. If there are other people you'd recommend, or sources on past failed attempts, I'd appreciate those too!
I also agree that a set of really good arguments is great to have but not always sufficient.
Although convincing the top few researchers is important, convincing the next 10,000 is also important for movement building. The answer to the counterargument of "we can't handle that many people switching careers" is to scale our programs.
Another answer is just trusting them to figure it out themselves (I want to compare this with COVID research, but I'm not sure how well that research went or what incentives there were to make it better or worse); this isn't my argument, though, but someone else's intuition. I think an additional structure of "we can give quick feedback on your alignment proposal" would help with this.