Even if power-seeking APS systems are deployed, it’s not obvious that they would permanently disempower humanity. We may be able to stop the system in its tracks (by either literally or metaphorically “pulling the plug”). First, we need to consider the mechanisms by which AI systems might attempt to take over (i.e. disempower) humanity. Second, we need to consider various risk factors for a successful takeover attempt.
Hacking computer systems…
Persuading, manipulating or coercing humans…
Gaining broad social influence… For instance, AI systems might be able to engage in electoral manipulation, steering voters towards policymakers less willing or able to prevent AI systems from being integrated into other key places of power.
Gaining access to money… If misaligned systems are rolled out into financial markets, they may be able to siphon off money without human detection.
Developing advanced technologies… An AI system adept at the science, engineering and manufacturing of nanotechnology, along with access to the physical world, might be able to rapidly construct and deploy dangerous nanosystems, leading to a “gray goo” scenario described by Drexler (1986).
I think the key weakness in this part of the argument is that it overlooks lawful, non-predatory strategies for satisfying goals. As a result, you give the impression that any AI that has non-human goals will, by default, take anti-social actions that harm others in pursuit of its goals. I believe this idea is false.
The concept of instrumental convergence, even if true[1], does not generally imply that almost all power-seeking agents will achieve their goals through nefarious means. Ordinary trade, compromise, and acting through the legal system (rather than outside of it) are usually rational means of achieving your goals.
Certainly among humans, a desire for resources (e.g. food, housing, material goods) does not automatically imply that humans will universally converge on unlawful or predatory behavior to achieve their goals. That’s because there are typically more benign ways of accomplishing these goals than theft or social manipulation. In other words, we can generally get what we want in a way that is not negative-sum and does not hurt other people as a side effect.
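To make the positive-sum point concrete, here is a toy comparative-advantage calculation. All the numbers are hypothetical and chosen purely for illustration: two self-interested agents, each caring only about its own consumption, both end up better off specializing and trading than going it alone.

```python
# A toy comparative-advantage calculation (all numbers hypothetical),
# illustrating why resource-seeking agents can do better through trade
# than through autarky or predation.

def utility(bundle):
    """Each agent cares only about the goods it ends up holding."""
    return bundle["food"] + bundle["housing"]

# Each agent has one unit of effort. Agent A is twice as productive at
# food (20 food per unit of effort vs 10 housing); agent B is the
# mirror image. Under autarky, each splits its effort evenly.
autarky_a = {"food": 10, "housing": 5}
autarky_b = {"food": 5, "housing": 10}

# Under specialization, A makes only food (20) and B only housing (20);
# they then swap half their output with each other.
trade_a = {"food": 10, "housing": 10}
trade_b = {"food": 10, "housing": 10}

for name, before, after in [("A", autarky_a, trade_a), ("B", autarky_b, trade_b)]:
    print(f"Agent {name}: autarky utility = {utility(before)}, "
          f"trade utility = {utility(after)}")
# Agent A: autarky utility = 15, trade utility = 20
# Agent B: autarky utility = 15, trade utility = 20
# Both agents gain, with no requirement that their goals be aligned.
```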
To the extent you think power-seeking behavior among humans is usually positive-sum, but will become negative-sum when it manifests in AIs, this premise needs to be justified. One cannot explain the positive-sum nature of the existing human world by positing that humans are aligned with each other and have pro-social values, as this appears to be a poor explanation for why humans obey the law.
Indeed, the legal system itself can be seen as a way for power-seeking misaligned agents to compromise on a framework that allows agents within it to achieve their goals efficiently, without hurting others. In a state of full mutual inter-alignment with other agents, criminal law would largely be unnecessary. Yet it is necessary, because humans in fact do not share all their goals with each other.
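The same point can be put as a back-of-the-envelope expected-value comparison. The sketch below uses made-up parameters (the gains, detection probability, and penalty are all illustrative assumptions, not figures from any source): once enforcement is credible, the lawful route dominates for a purely self-interested agent.

```python
# A toy deterrence calculation (all parameters hypothetical): even a
# misaligned, purely self-interested agent prefers lawful trade once
# enforcement makes predation a bad bet in expectation.

def expected_payoff(gain, p_caught, penalty):
    """Expected value of an action yielding `gain`, which additionally
    incurs `penalty` with probability `p_caught`."""
    return gain - p_caught * penalty

lawful_trade = expected_payoff(gain=10, p_caught=0.0, penalty=0)
predation = expected_payoff(gain=15, p_caught=0.8, penalty=20)

print(f"lawful trade: {lawful_trade}")  # 10.0
print(f"predation:    {predation}")     # -1.0

# The legal system acts as a shared commitment device here: by raising
# p_caught and the penalty, it makes compliance the rational strategy
# even for agents whose goals are not aligned with anyone else's.
```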
It is likely, of course, that AIs will exceed human intelligence. But this fact alone does not imply that AIs will take unlawful actions to pursue their goals, since the legal system could become better at coping with more intelligent agents at the same time as AIs are incorporated into it.
We could imagine an analogous case in which genetically engineered humans are introduced into the legal system. As these modified humans get smarter over time, and begin taking on roles within the legal system itself, our institutions would adapt, and likely become more capable of policing increasingly sophisticated behavior. In this scenario, as in the case of AI, “smarter” does not imply a proclivity towards predatory and unlawful behavior in pursuit of one’s goals.
I personally doubt that the instrumental convergence thesis is true as it pertains to “sufficiently intelligent” AIs which were not purposely trained to have open-ended goals. I do not expect, for example, that GPT-5 or GPT-6 will spontaneously develop a desire to acquire resources or preserve their own existence, unless they are subject to specific fine-tuning that would reinforce those impulses.