Intent alignment seeks to build an AI that does what its designer wants. You seem to want an alternative: build an AI that does what is best for all sentient life (or at least for humanity). Some reasons that we (maybe) shouldn’t focus on this problem:
it seems horribly intractable at both a technical and philosophical level (but I’d love to hear your ideas for solutions!)—this is my biggest qualm
with an AGI that does exactly what Facebook engineer no. 13,882 wants, we “only” need that engineer to want things that are good for all sentient life
(maybe) scenarios with advanced AI killing all sentient life are substantially more likely than scenarios with animal suffering
There are definitely counterarguments to these. E.g. maybe animal suffering scenarios are still higher expected value to work on because of their severity (imagine factory farms continuing to exist for billions of years).
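To make that expected-value comparison a bit more concrete, here is one rough way to frame it (this decomposition is just an illustrative sketch, not a canonical model):

$$
\mathbb{E}[\text{value of work on scenario } S] \;\approx\; P(S) \times \text{Severity}(S) \times \text{Tractability}(S)
$$

On that framing, an animal-suffering scenario can dominate an extinction scenario even at a much lower probability, if its severity term (e.g. factory farms persisting for billions of years) is large enough and work on it is not hopelessly intractable.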
It sounds quite innocuous to ‘build an AI that does what its designer wants’—as long as we ignore the true diversity of what its designers (and users) might actually want.
If an AI designer or user is a misanthropic nihilist who wants humanity to go extinct, or is a religious or political terrorist, or is an authoritarian censor who wants to suppress free speech, then we shouldn’t want the AI to do what they want.
Is this problem ‘horribly intractable’? Maybe it is. But if we ignore the truly, horribly intractable problems in AI alignment, then we increase X risk.
I increasingly get the sense that AI alignment as a field is defining itself so narrowly, and limiting the alignment problems it considers ‘legitimate’ so narrowly, that we could end up in a situation where alignment looks ‘solved’ at a narrow technical level, and this gives reassurance to corporate AI development teams that they can go full steam ahead towards AGI—but where alignment is very, very far from solved at the actual real-world level of billions of diverse people with seriously conflicting interests.
Totally agree that intent alignment does basically nothing to solve misuse risks. To weigh the importance of misuse risks, we should consider (a) how quickly the jump from AI to AGI happens, (b) whether the first group to deploy AGI will use it to prevent other groups from developing AGI, (c) how quickly the jump from AGI to superintelligence happens, (d) how widely accessible AI will be to the public as it develops, (e) the destructive power of AI misuse at various stages of AI capability, etc.
I increasingly get the sense that AI alignment as a field is defining itself so narrowly...
Paul Christiano’s 2019 EAG-SF talk highlights how there are so many other important subproblems within “make AI go well” besides intent alignment. Of course, Paul doesn’t speak for “AI alignment as a field.”