I have the feeling that there is a tendency in the AI safety community to think that if we solve the alignment problem, we’re done and the future will necessarily be flourishing (I observe that some EAs say that either we go extinct or it’s heaven on earth depending on the alignment problem, in a very binary way). However, it seems to me that post-aligned-AGI scenarios merit attention as well: game theory gives us sufficient rationale to say that even rational agents (in this case, multiple AGIs) can make sub-optimal decisions (including ones leading to catastrophic scenarios) when faced with a social dilemma. Any thoughts on this, please?
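To make the game-theoretic point concrete, here is a minimal sketch using a one-shot Prisoner's Dilemma between two agents. The payoff numbers are illustrative assumptions, not taken from any real scenario, and two agents are used only for simplicity. It checks which outcomes are Nash equilibria and whether they are Pareto-dominated, i.e. whether both agents could have done better.

```python
from itertools import product

# Hypothetical payoffs for a one-shot Prisoner's Dilemma (assumed values).
# PAYOFFS[(a, b)] = (payoff to agent 1, payoff to agent 2); higher is better.
C, D = "cooperate", "defect"
ACTIONS = (C, D)
PAYOFFS = {
    (C, C): (3, 3),
    (C, D): (0, 5),
    (D, C): (5, 0),
    (D, D): (1, 1),
}


def is_nash(profile):
    """True if neither agent can gain by unilaterally changing its action."""
    a, b = profile
    u1, u2 = PAYOFFS[profile]
    agent1_ok = all(PAYOFFS[(alt, b)][0] <= u1 for alt in ACTIONS)
    agent2_ok = all(PAYOFFS[(a, alt)][1] <= u2 for alt in ACTIONS)
    return agent1_ok and agent2_ok


def pareto_dominated(profile):
    """True if another outcome is at least as good for both agents and not identical."""
    u = PAYOFFS[profile]
    return any(
        v[0] >= u[0] and v[1] >= u[1] and v != u
        for v in PAYOFFS.values()
    )


equilibria = [p for p in product(ACTIONS, ACTIONS) if is_nash(p)]
for p in equilibria:
    print(p, PAYOFFS[p], "Pareto-dominated:", pareto_dominated(p))
```

Under these assumed payoffs, the only equilibrium is mutual defection, even though mutual cooperation would leave both agents better off. That is the sense in which individually rational agents can end up with a collectively sub-optimal outcome.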
I think that, to the extent there would be post-AGI sub-optimal decision-making (or catastrophe), that would basically be a failure of alignment (i.e. the alignment problem would not in fact have been solved!). More concretely, there are many things that need aligning beyond single human : single AGI, the most difficult being multi-human : multi-AGI, but alignment is also needed at every relevant step in the human decision-making chain.