It sounds quite innocuous to ‘build an AI that does what its designer wants’—as long as we ignore the true diversity of what its designers (and users) might actually want.
If an AI designer or user is a misanthropic nihilist who wants humanity to go extinct, or is a religious or political terrorist, or is an authoritarian censor who wants to suppress free speech, then we shouldn’t want the AI to do what they want.
Is this problem ‘horribly intractable’? Maybe it is. But if we ignore the truly, horribly intractable problems in AI alignment, then we increase X risk.
I increasingly get the sense that AI alignment as a field is defining itself so narrowly, and limiting the alignment problems it considers ‘legitimate’ so severely, that we could end up in a situation where alignment looks ‘solved’ at a narrow technical level—giving reassurance to corporate AI development teams that they can go full steam ahead towards AGI—but where alignment is very, very far from solved at the actual real-world level of billions of diverse people with seriously conflicting interests.
Totally agree that intent alignment does basically nothing to solve misuse risks. To weigh the importance of misuse risks, we should consider (a) how quickly the transition from AI to AGI happens, (b) whether the first group to deploy AGI will use it to prevent other groups from developing AGI, (c) how quickly the transition from AGI to superintelligence happens, (d) how widely accessible AI will be to the public as it develops, (e) the destructive power of AI misuse at various stages of AI capability, etc.
I increasingly get the sense that AI alignment as a field is defining itself so narrowly...
Paul Christiano’s 2019 EAG-SF talk highlights how there are so many other important subproblems within “make AI go well” besides intent alignment. Of course, Paul doesn’t speak for “AI alignment as a field.”