S-risk is probably just 1/10th of that – wild guess
This feels high to me – I acknowledge that you are caveating this as just a guess, but I would be interested to hear more of your reasoning.
One specific thing I’m confused about: you described alignment as “an adversarial game that we’re almost sure to lose.” But conflict between misaligned AIs is not likely to constitute an s-risk, right? You can’t really blackmail a paperclip maximizer by threatening to simulate torture, because the paperclip maximizer doesn’t care about torture, just paperclips.
Maybe you think that multipolar scenarios are likely to result in AIs that are almost but not completely aligned?
Exactly! Even GPT-4 sounds pretty aligned to me, maybe dangerously so. And even if that apparent alignment has nothing to do with whatever goals it might actually have deep down (if it’s a mesa-optimizer), the appearance could still lead to trouble in adversarial games with less seemingly aligned agents.