If we define “doom” as “some AI(s) take over the world suddenly without our consent, and then quickly kill everyone” then my p(doom) is in the single digits. (If we define it as human extinction or disempowerment more generally, regardless of the cause, then I have a higher probability, especially over very long time horizons.)
The scenario that I find most likely in the future looks like this:
Although AI gets progressively better at virtually all capabilities, there aren’t any further sudden jumps in general AI capabilities much larger than the jump from GPT-3.5 to GPT-4. (Which isn’t to say that AI progress will be slow.)
As a result of (1), researchers roughly know what AI is capable of doing in the near-term future, and how it poses a risk to us during the relevant parts of the takeoff.
Before AI becomes dangerous enough to pose a takeover risk, politicians pass legislation regulating AI deployment. This legislation is wide reaching, and allows the government to extensively monitor compute resources and software.
AI labs spend considerable amounts of money (many billions of dollars) on AI safety to ensure they can pass government audits, and win the public’s trust. (Another possibility is that foundation model development is taken over by the government.)
People are generally very cautious about deploying AI in safety-critical situations. AI takes over important management roles only after people become very comfortable with the technology.
AI alignment is not an intractable problem, and SGD naturally finds human-friendly agents when rewarding models for good behavior in an exhaustive set of situations that they might reasonably expect to encounter. This is especially true when combined with whatever clever tricks we come up with in the future. While catastrophic forms of deception are compatible with the behavior we reward AIs for, it is usually simpler to just “be good” than to lie.
Even though some value misalignment slips through the cracks after auditing AIs using e.g. mechanistic interpretability tools, AI misalignment is typically slight rather than severe. This means that most AIs aren’t interested in killing all humans.
Eventually, competitive pressures force people to adopt AIs to automate just about every possible type of job, including management, giving AIs effective control of the world.
Humans retire as their wages fall to near zero. A large welfare state is constructed to pay income to people who did not own capital prior to the AI transition period. Even though inequality becomes very high after this transition, the vast majority of people become better off in material terms than the norm in 2023.
The world continues to evolve with AIs in control. Even though humans are retired, history has not yet ended.
I’d be curious about what happens after step 10. How long do biological humans survive? How long can they be said to be “in control” of AI systems, such that some group of humans could change the direction of civilization if they wanted to? How likely is deliberate misuse of AI to cause an existential catastrophe, relative to slowly losing control of society? What are the positive visions of the future, and which are the most negative?
Seconding this. This future scenario as constructed seems brittle to subtle forms of misalignment[1] erasing nearly all future value (i.e. still an existential catastrophe even if not a sudden extinction event).
Looks like Matthew did post a model of doom that contains something like this (back in May, before the top-level comment):
My modal tale of AI doom looks something like the following:
1. AI systems get progressively and incrementally more capable across almost every meaningful axis.
2. Humans will start to employ AI to automate labor. The fraction of GDP produced by advanced robots & AI will go from 10% to ~100% over the course of 1-10 years. Economic growth, technological change, and scientific progress accelerate by at least an order of magnitude, and probably more.
3. At some point humans will retire since their labor is not worth much anymore. Humans will then cede all the keys of power to AI, while keeping nominal titles of power.
4. AI will control essentially everything after this point, even if they’re nominally required to obey human wishes. Initially, almost all the AIs are fine with working for humans, even though AI values aren’t identical to the utility function of serving humanity (ie. there’s slight misalignment).
5. However, AI values will drift over time. This happens for a variety of reasons, such as environmental pressures and cultural evolution. At some point AIs decide that it’s better if they stopped listening to the humans and followed different rules instead.
6. This results in human disempowerment or extinction. Because AI accelerated general change, this scenario could all take place within years or decades after AGI was first deployed, rather than in centuries or thousands of years.
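Step 2’s claimed acceleration can be made concrete with a toy compounding calculation (the specific rates below are my illustrative assumptions, not the author’s): under continuous compounding, a tenfold increase in the growth rate shrinks the economy’s doubling time by the same factor, which is why “years or decades” of post-AGI change can pack in what previously took centuries.

```python
# Toy illustration with assumed rates: how an order-of-magnitude
# acceleration in economic growth compresses GDP doubling times.
import math

BASELINE_RATE = 0.03     # ~3% annual growth, roughly the recent global norm (assumed)
ACCELERATED_RATE = 0.30  # 10x acceleration, per the scenario's "order of magnitude"

def doubling_time(rate: float) -> float:
    """Years for GDP to double under continuous compounding at `rate`."""
    return math.log(2) / rate

print(f"baseline:    {doubling_time(BASELINE_RATE):.1f} years")     # ~23.1 years
print(f"accelerated: {doubling_time(ACCELERATED_RATE):.1f} years")  # ~2.3 years
```

At a tenfold-accelerated rate, a century of baseline-pace growth fits into roughly a decade, which is the intuition behind step 6’s compressed timeline.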
I think this scenario is somewhat likely and it would also be very bad. And I’m not sure what to do about it, since it happens despite near-perfect alignment, and no deception.
One reason to be optimistic is that, since the scenario doesn’t assume any major deception, we could use AI to predict this outcome ahead of time and ask AI how to take steps to mitigate the harmful effects (in fact that’s the biggest reason why I don’t think this scenario has a >50% chance of happening). Nonetheless, I think it’s plausible that we would not be able to take the necessary steps to avoid the outcome. Here are a few reasons why that might be true:
1. There might not be a way to mitigate this failure mode.
2. Even if there is a way to mitigate this failure, it might not be something that you can figure out without superintelligence, and if we need superintelligence to answer the question, then perhaps it’ll happen before we have the answer.
3. AI might tell us what to do and we ignore its advice.
4. AI might tell us what to do and we cannot follow its advice, because we cannot coordinate to avoid the outcome.
I think that given the possibility of brain emulation, the division between AIs and humans you are drawing here may not be so clear in the longer term. Does that play into your model at all, or do you expect that even human emulations with various cognitive upgrades will be totally unable to compete with pure AIs?
I don’t expect human brain emulations to be competitive with pure software AI. The main reason is that by the time we have the ability to simulate the human brain, I expect our AIs will already be better than humans at almost any cognitive task. We still haven’t simulated the simplest of organisms, and there are some good a priori reasons to think that software is easier to improve than brain emulation technology.
I definitely think we could try to merge with AIs to try to keep up with the pace of the world in general, but I don’t think this approach would allow us to surpass ordinary software progress.
I agree with you that pure software AGI is very likely to happen sooner than brain emulation.
I’m wondering about your scenario for the farther future, near the point when humans start to retire from all jobs. I think that at this point, many humans would be understandably afraid of the idea that AIs could take over. People are not stupid and many are obsessed with security. At this point, brain emulation would be possible. It seems to me that there would therefore be large efforts in making those emulations competitive with pure software AI in important ways (not all ways of course, but some important ones, involving things like judgment). Possibly involving regulation to aid this process. Of course it is just a guess, but it seems likely to me that this would work to some extent. However, this may stretch the definition of what we currently consider a human in some ways.
Note this seems somewhat similar to Yuval Harari’s worries voiced in Homo Deus.