No Plans for Misaligned AI:
This talk by Jade Leung got me thinking: I’ve never seen a plan for what we do if AGI turns out to be misaligned.
The default assumption seems to be something like “well, there’s no point planning for that, because we’ll all be powerless and screwed”. This seems mistaken to me. It’s not clear we’ll be so powerless that we have no ability at all to encourage a trajectory change, particularly in a slow takeoff scenario. Given that most people weight alleviating suffering more heavily than promoting pleasure, this work is especially valuable in expectation: it might help us shift outcomes from a ‘very, very bad’ world to a ‘slightly negative’ one. It also seems fairly tractable; I’d expect ~10 hours of thinking could produce a very barebones playbook.
Why isn’t this being done? I think there are a few reasons:
- Like suffering-focused ethics, it’s depressing.
- It seems particularly speculative: most ‘humanity becomes disempowered by AGI’ scenarios look pretty sci-fi, so serious academics don’t want to consider it.
- People assume, mistakenly IMO, that we’re just totally screwed if AI is misaligned.