There have been countless discussions of takeoff speeds. The slower the takeoff and the closer the arms race, the greater the risk of a multipolar takeoff. Most of you probably have some intuition of how likely a multipolar takeoff is. S-risk is probably just 1/10th of that – wild guess. So I’m afraid the s-risk is quite macroscopic.
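To make the structure of that guess explicit (this is just the estimate above restated as a formula, nothing new added):

$$
P(\text{s-risk}) \approx \frac{1}{10} \times P(\text{multipolar takeoff})
$$

so whatever probability you personally assign to a multipolar takeoff, the guessed s-risk probability is only one order of magnitude smaller.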
The second version ignores expected value. I acknowledge that expected-value reasoning has its limitations, but if we use it at all (and we clearly do, a lot), then there’s no reason to ignore its implications specifically for s-risks. With all ITN factors (importance, tractability, neglectedness) taken together but ignoring probabilities, s-risk work beats other x-risk work by a factor of 10^12 for me (your mileage may vary), so even if s-risks are 10x less likely, that’s not decisive for me.
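Spelling out the arithmetic behind that trade-off (a rough sketch using only the numbers above: the 10^12 is my subjective all-ITN-factors estimate, and the 1/10 is the assumed relative likelihood of s-risks):

$$
\frac{\mathrm{EV}(\text{s-risk work})}{\mathrm{EV}(\text{other x-risk work})} \approx 10^{12} \times \frac{1}{10} = 10^{11}
$$

so even after the 10x probability discount, the comparison still favors s-risk work by eleven orders of magnitude on these admittedly very rough numbers.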
I don’t have a response to the third version.
“S-risk is probably just 1/10th of that – wild guess.”

This feels high to me – I acknowledge that you are caveating this as just a guess, but I would be interested to hear more of your reasoning.
One specific thing I’m confused about: you described alignment as “an adversarial game that we’re almost sure to lose.” But conflict between misaligned AIs is not likely to constitute an s-risk, right? You can’t really blackmail a paperclip maximizer by threatening to simulate torture, because the paperclip maximizer doesn’t care about torture, just paperclips.
Maybe you think that multipolar scenarios are likely to result in AIs that are almost but not completely aligned?
Exactly! Even GPT-4 sounds pretty aligned to me, maybe dangerously so. Even if that surface-level alignment has nothing to do with whatever goals it might really have deep down (if it’s a mesa-optimizer), the appearance could still lead to trouble in adversarial games with agents that seem less aligned.