My fundamental disagreement with Robin Hanson here is that he tends to view AIs either as ‘passive, predictably controllable tools of humans’ or as ‘sentient agents with their own rights & interests’. This dichotomy traces back to our basic human tendency to classify things as either ‘objects’ or ‘people/animals’, or ‘inanimate’ versus ‘animate’.
My worry is that the most dangerous near-term AIs will fall into the grey area between these two categories. They’ll have enough agency and autonomy to take powerful, tactically savvy actions on behalf of the human individuals and groups telling them what to do (but whose instructions may not be followed accurately), yet not quite enough agency, autonomy, and wisdom to qualify as sentient agents in their own right who could be granted rights and responsibilities as ‘digital citizens’.
In other words, the most dangerous near-term AIs will be kind of like the henchmen who are given semi-autonomous tasks by evil masterminds in criminal thriller movies. Those henchmen tend to be extremely strong, formidable, and scary, but they don’t always follow instructions very well, they tend to be rather over-literal and unimaginative, they often act with impulsive violence that’s unaligned with their boss’s long-term interests, and they often create more problems than they solve. ‘Good help is hard to find’, as they say.
For example, nation-states might use AI henchmen as cyberwarfare agents that attack foreign infrastructure. Now suppose a geopolitical crisis happens. Assume Nation A believes there is a clear and present danger from enemy nation-state B. In Nation A, there is urgent political and military pressure to ‘do something’. The ‘fog of war’ envelops the crisis, limiting the reliability of information and making rational decision-making difficult. Nation A instructs its newest AI cyberwarfare agent, Agent X, to ‘degrade Nation B’s ability to inflict damage on Nation A’, subject to whatever constraints Nation A’s leaders happen to think of in the heat of the crisis. Agent X is let loose upon the world. Now, suppose Agent X is not an ASI, or even an AGI; there’s been no ‘foom’; it’s just very, very good at cyberwarfare applied to enemy infrastructure, and it can think and act (digitally) a million times faster than its human ‘masters’ in Nation A.
So Agent X, being the good henchman that it is, sets about wreaking havoc in Nation B. It follows the constraints it’s been given, at the literal level, but it doesn’t follow their spirit. It hacks whatever control systems it can for cars, trucks, airplanes, ships, subways, rail stations, buildings, installations, ports, traffic control systems, airports, bridges, dams, power plants, military bases, etc. Within a few hours, Nation B is paralyzed by mass chaos, with millions dead. As instructed, Agent X has degraded Nation B’s ability to inflict damage on Nation A.
But then Nation B figures out what happened, and they unleash their own henchman, Agent Y, upon Nation A… leading to a cycle of vengeance with escalating cyber-attacks between major nation-states, and colossal loss of life.
These kinds of scenarios, in my opinion, are much more likely in the next decade or so than a foom-based ASI takeover of the sort often envisioned by AI alignment thinkers. The problem, in short, isn’t that an AI becomes an evil genius in its own right. The problem is that AI henchmen, with (metaphorically) the strength of Superman and the speed of the Flash, get unleashed by political or corporate leaders who can’t fully anticipate or control what their AI minions end up doing.