Seconding this. This future scenario, as constructed, seems brittle to subtle forms of misalignment[1] erasing nearly all future value (i.e., still an existential catastrophe even if not a sudden extinction event).
Looks like Matthew did post a model of doom that contains something like this (back in May, before the top-level comment):
My modal tale of AI doom looks something like the following:
1. AI systems get progressively and incrementally more capable across almost every meaningful axis.
2. Humans will start to employ AI to automate labor. The fraction of GDP produced by advanced robots & AI will go from 10% to ~100% over 1-10 years. Economic growth, technological change, and scientific progress accelerate by at least an order of magnitude, and probably more.
3. At some point humans will retire since their labor is not worth much anymore. Humans will then cede all the keys of power to AI, while keeping nominal titles of power.
4. AIs will control essentially everything after this point, even if they’re nominally required to obey human wishes. Initially, almost all the AIs are fine with working for humans, even though AI values aren’t identical to the utility function of serving humanity (i.e. there’s slight misalignment).
5. However, AI values will drift over time. This happens for a variety of reasons, such as environmental pressures and cultural evolution. At some point AIs decide that it’s better if they stopped listening to the humans and followed different rules instead.
6. This results in human disempowerment or extinction. Because AI accelerated general change, this scenario could all take place within years or decades after AGI was first deployed, rather than in centuries or thousands of years.
I think this scenario is somewhat likely and it would also be very bad. And I’m not sure what to do about it, since it happens despite near-perfect alignment and no deception.
One reason to be optimistic is that, since the scenario doesn’t assume any major deception, we could use AI to predict this outcome ahead of time and ask AI how to take steps to mitigate the harmful effects (in fact that’s the biggest reason why I don’t think this scenario has a >50% chance of happening). Nonetheless, I think it’s plausible that we would not be able to take the necessary steps to avoid the outcome. Here are a few reasons why that might be true:
1. There might not be a way to mitigate this failure mode.
2. Even if there is a way to mitigate this failure, it might not be something that you can figure out without superintelligence, and if we need superintelligence to answer the question, then perhaps it’ll happen before we have the answer.
3. AI might tell us what to do and we ignore its advice.
4. AI might tell us what to do and we cannot follow its advice, because we cannot coordinate to avoid the outcome.
Note this seems somewhat similar to Yuval Harari’s worries voiced in Homo Deus.