The framing here seems really strange to me. You seem to have a strong prior that doom happens, while, to me, most arguments for doom require quite a few hypotheses to be true, and hence their conjunction is a priori unlikely. I guess I don’t find the inside-view arguments persuasive enough for a major update, which puts me roughly in line with the median AI expert, whose estimate is around 2%.
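To make the conjunction point concrete (the number of hypotheses and the per-step credences below are purely illustrative, not anyone’s actual estimates): if a doom scenario needs hypotheses $H_1, \dots, H_5$ to all hold, then

$$P(\text{doom}) \le P\left(H_1 \cap \dots \cap H_5\right) = \prod_{i=1}^{5} P\left(H_i \mid H_1, \dots, H_{i-1}\right),$$

so even a fairly generous 0.7 credence per step gives about $0.7^5 \approx 0.17$, and 0.5 per step gives about $0.03$.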
To go into your questions specifically.
What mechanisms are at play?
AGI is closer to a very intelligent human than to a naive optimiser.
How is alignment solved so that there are 0 failure modes?
I don’t see why this is required; I’m not arguing p(doom) is 0.
Can we survive despite imperfect alignment? How? Is alignment moot? Will physical limits be reached before there is too much danger?
AGI either can’t or “chooses” not to cause an x-risk.
What makes you think this? To me it just sounds like ungrounded anthropomorphism.
Sure, it’s uncertain. But we’re 100% of the reference class of general intelligences. Most AI scenarios seem to lean very heavily on the “naive optimiser”, though, which means low-ish credence in them a priori.
In reality, I guess both these views are wrong or it’s a spectrum with AGI somewhere along it.
We are a product of billions of years of biological evolution. AI shares none of that history in its architecture. Are all planets habitable like the Earth? Apply the Copernican Revolution to mindspace.
Yes, high uncertainty here. The problem is that your credence in AI being a strong optimiser is a ceiling on p(doom | AGI) under every scenario I’ve read.
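Spelling that out, with $SO$ standing for “the AI is a strong optimiser”, and assuming (as those scenarios do) that doom without a strong optimiser is negligible:

$$P(\text{doom} \mid \text{AGI}) = P(\text{doom} \mid \text{AGI}, SO)\,P(SO \mid \text{AGI}) + P(\text{doom} \mid \text{AGI}, \neg SO)\,P(\neg SO \mid \text{AGI}) \lesssim P(SO \mid \text{AGI}),$$

since the first term is at most $P(SO \mid \text{AGI})$ and the second is taken to be roughly zero.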
What makes you think it’s unlikely that strong optimisers will come about?
Prior: most specific hypotheses are wrong. Update: we don’t have strong evidence in any direction. Conclusion: more likely than not this is wrong.
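As a toy version of that update (the prior here is an illustrative number, not an estimate from this thread): if the prior on the specific hypothesis is, say, $P(H) = 0.2$ and the evidence is weak, i.e. $P(E \mid H) \approx P(E \mid \neg H)$, then

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)} \approx P(H) = 0.2,$$

so the hypothesis stays more likely wrong than right.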
This seems like a case of different prior distributions. I think it’s a specific hypothesis to say that strong optimisers won’t happen (i.e. there would have to be a specific reason for that; otherwise strong optimisation is the default, for convergent instrumental reasons).