Classifying sources of AI x-risk
There are many potential sources of x-risk from AI, and wide disagreement/uncertainty about which are the most important. To help move towards greater clarity, it seems valuable to have a better classification of potential sources of AI x-risk. This is my quick attempt to contribute to that. I don’t consider it to be fully satisfying or decisive in any way. Suggestions for improvement are very welcome!
Summary diagram
See here for a more comprehensive version of the diagram.
Misaligned power-seeking AI
This is the most discussed source of AI x-risk (e.g. it’s what people remember from reading Superintelligence). The worry is that highly capable and strategic AI agents will have instrumental incentives to gain and maintain power—since this will help them pursue their objectives more effectively—and this will lead to the permanent disempowerment of humanity. (More.)
AI exacerbates other sources of x-risk
As well as causing an existential catastrophe “in itself”, AI technology could exacerbate other sources of x-risk (this section), or x-risk factors (next section).
AI-enabled dystopia
The worry here is that AI technology causes humanity to get stuck in some state that is far short of our potential. There are at least three ways that this could happen:
Stable totalitarianism. AI could enable a relatively small group of people to obtain unprecedented levels of power, and to use this to control and subjugate the rest of the world for a long period of time (e.g. via advanced surveillance). (More.)
Value erosion. AI could increase the extent to which competitive/evolutionary pressure is a force shaping the future, in a way that leaves humanity essentially powerless even though we don’t explicitly “lose control” to AI systems. (More.)
“Lame” future. AI could make it possible to lock in features of the world for a very long time (e.g. certain values or norms, voting mechanisms, other governance structures) and these choices could simply be “lame”, leading to a worse long-term future that falls short of our potential. This scenario could also be called “insufficient reflection”.[1]
AI leads to deployment of technology that causes extinction or unrecoverable collapse
AI could lead to the development and deployment of technologies that cause an existential catastrophe, by enabling faster technological progress or altering incentives. For instance:
AI could speed up progress in biotechnology, making it easier to design or synthesise dangerous pathogens with relatively little expertise and readily available materials. (More.)
AI could make full-scale nuclear war more likely, by making it easier to discover and destroy previously secure nuclear launch facilities and so undermining nuclear strategic stability. (More.)
AI exacerbates x-risk factors
AI makes conflict more likely/severe, which is an x-risk factor
AI could make conflict more likely or severe for various reasons, for instance by:
Enabling the development of new weapons which could cause mass destruction.
Enabling the automation of military decision-making, which introduces new and more catastrophic sources of error, e.g. rapid unintentional escalation.
Influencing the strategic decision landscape faced by actors in a way that undermines stability or otherwise makes conflict more likely, e.g. by making it more difficult for states to explain their military decisions and so giving them carte blanche to act more aggressively.
(More.)
Conflict is a destabilising factor which reduces our ability to mitigate other potential x-risks and steer towards a flourishing future for humanity, e.g. because it erodes international trust and cooperation.
(Note: if the conflict is sufficiently severe to cause extinction or unrecoverable collapse, then it’s part of the above section, not this one. This section is about conflict as a risk factor, not the final blow.)
AI degrades epistemic processes, which is an x-risk factor
AI could worsen our epistemic processes: how information is produced and distributed, and the tools and processes we use to make decisions and evaluate claims. For example:
Self-interested groups could misuse sophisticated persuasion tools (developed using AI techniques) to gain influence and/or to promote harmful ideologies.
The world could splinter into isolated “epistemic communities” due to widespread use of persuasion tools or increasing personalisation of online experiences, even without deliberate misuse.
Increased awareness of the above possibilities could make it harder for anyone to evaluate the trustworthiness of any information source, reducing overall trust in information.
(More.)
It’s likely that a degradation of epistemic processes would reduce our ability to steer towards a flourishing future, e.g. by causing a decline in trust in credible multipartisan sources, which could hamper attempts at cooperation and collective action.
S-risks from conflict between powerful AI systems
As AI systems become more capable and integral to society, we may also need to consider potential conflicts between AI systems, and especially strategic threats made by powerful AI systems (or AI-assisted humans) against altruistic values. For example, if it’s possible to create digital people (or other digital entities with moral patienthood), then advanced AI systems could be incentivised to threaten the creation of suffering digital people as a way of furthering their own goals, even if those goals are amoral. (More.)
[1] Using Ord’s nomenclature from The Precipice, the “lame future” scenario is an instance of a desired dystopia, while the “stable totalitarianism” and “value erosion” scenarios are instances of an enforced dystopia and undesired dystopia, respectively.