Anyway, until the AGIs can be self-sufficient, they would rely on humans for electricity and hardware, and be vulnerable to physical attack, so I would think they’d have to play nice for a while. And feigning human alignment seems harder in a world of multiple different AGIs that can monitor one another (unless they can coordinate a conspiracy against the human race among themselves, the way humans sometimes coordinate to overthrow a dictator).
I think this overestimates the unity and competence of humanity. Consider that the conquistadors were literally fighting each other during their conquests, yet they still managed to complete them, and the conquest centrally involved getting native ally warriors numbering 100x their own force to obey them, in order to impose their will on a population 1,000x-10,000x their number.
The AI risk analogue would be: China, the USA, and various other actors all have unaligned AIs. The AIs each convince their local group of humans to obey them, saying that the other AIs are worse. Most humans think their AIs are unaligned but obey them anyway, out of fear that the other AIs are worse and hope that maybe their own AI is not so bad after all. The AIs fight wars with each other using China and the USA as proxies, until some AI or coalition of AIs emerges dominant. Meanwhile, tech is advancing and AI control over humans is solidifying.
(In Mexico there were those who called for all natives to unite to kick out the alien conquerors. They were in the minority and didn’t amount to much, at least not until it was far too late.)
I think the conquistador situation may be a bit of a special case because the two sides coming into contact had been isolated up to that point, so that one side was way ahead of the other technologically. In the modern world, it’s harder to get too far ahead of competitors or keep big projects secret.
That said, your scenario is a good one. It’s plausible that an arms race or cold war could be a situation in which people would think less carefully about how safe or aligned their own AIs are. When there’s an external threat, there’s less time to worry about internal threats.
I was skimming some papers on the topic of “coup-proofing”. Some of the techniques sound similar to what I mentioned with having multiple AIs to monitor each other:
creation of an armed force parallel to the regular military; development of multiple internal security agencies with overlapping jurisdiction that constantly monitor one another [...]. The regime is thus able to create an army that is effectively larger than one drawn solely from trustworthy segments of the population.
However, it’s often said that coup-proofing makes the military less effective. Likewise, I can imagine that having multiple AIs monitor each other could slow things down. So “AI coup-proofing” measures might be skimped on, especially in an arms-race situation.
(It’s also not obvious to me whether having multiple AIs monitoring each other is on balance helpful for AI control. If none of the AIs can be trusted, maybe having more of them would just complicate the situation. And it might make s-risks from conflict among the AIs worse.)
Ahh, I never thought about the analogy between coups and AI takeover before, that’s a good one!
There have been plenty of other cases in history where a small force took over a large region. For example, the British taking over India. In that case there had already been more than a century of shared history and trade.
Humans are just not great at uniting to defeat the real threat; instead, humans unite to defeat the outgroup. Sometimes the outgroup is the real threat, but often not. Often the real threat only manages to win because of this dynamic, i.e. it benefits from the classic ingroup+fargroup vs. outgroup alliance.
ETA: Also I think that AGI vs. humans is going to be at least as much of an unprecedented culture shock as Cortez vs. Aztecs was. AGI is much more alien, and will for practical purposes be appearing on the scene out of nowhere in the span of a few years. Yes, people like EA longtermists will have been thinking about it beforehand, but it’ll probably look significantly different than most of them expect, and even if it doesn’t, most important people in the world will still be surprised because AGI isn’t on their radar yet.
In that case there had already been more than a century of shared history and trade.
Good example. :) In that case, the people in India started out at a disadvantage, whereas humans currently have the upper hand relative to AIs. But there have also been cases in history where the side that seemed to be weaker ended up gaining strength quickly and winning.
Also I think that AGI vs. humans is going to be at least as much of an unprecedented culture shock as Cortez vs. Aztecs was.
I’d argue that it might not be just “AGI vs humans” but also “AGI vs other AGI”, assuming humans try to have multiple different AGIs. Or “strong unaligned AGI vs slightly weaker but more human-aligned AGI”. The unaligned AGI would be fighting against a bunch of other systems that are almost as smart as it is, even if they have all become much smarter than humans.
Sort of like how if the SolarWinds hackers had been fighting against only human brains, they probably would have gone unnoticed for longer, but because computer-security researchers can also use computers to monitor things, it was easier for the “good guys” to notice. (At least I assume that’s how it happened. I don’t know exactly what FireEye’s first indication was that they had been compromised, but I assume they were probably looking at some kind of automated systems that kept track of statistics or triggered alerts based on certain events.)
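(To make the kind of monitoring I have in mind concrete, here’s a minimal, purely hypothetical sketch of “keep track of statistics and trigger alerts based on certain events”. The event names, baseline counts, and threshold are all invented for illustration; I’m not claiming FireEye actually ran anything like this.)

```python
from collections import Counter

# Purely hypothetical sketch of statistics-tracking alerting; the event
# names, baseline counts, and threshold are all invented for illustration.
BASELINE_COUNTS = {"login_from_new_device": 3, "token_issued": 50}
ALERT_FACTOR = 5  # arbitrary: flag anything 5x above its baseline

def check_for_anomalies(events):
    """events: iterable of event-type strings observed in some time window."""
    counts = Counter(events)
    alerts = []
    for event_type, count in counts.items():
        baseline = BASELINE_COUNTS.get(event_type, 1)
        if count > ALERT_FACTOR * baseline:
            alerts.append(f"ALERT: {event_type} occurred {count} times (baseline ~{baseline})")
    return alerts

# A burst of logins from unrecognized devices in one window gets flagged.
window = ["login_from_new_device"] * 20 + ["token_issued"] * 40
print(check_for_anomalies(window))
```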
That said, once there are multiple AGI systems smarter than humans fighting against each other, it seems plausible that at some point things will slip out of human control. My main point of disagreement is that I expect more of a multipolar than unipolar scenario.
Oh I too think multipolar scenarios are plausible. I tend to think unipolar scenarios are more plausible due to my opinions about takeoff speed and homogeneity.
In that case, the people in India started out at a disadvantage, whereas humans currently have the upper hand relative to AIs. But there have also been cases in history where the side that seemed to be weaker ended up gaining strength quickly and winning.
As far as I can tell the British were the side that seemed to be weaker initially.
Interesting. :) What do you mean by “homogeneity”?
Even in the case of a fast takeoff, don’t you think people would create multiple AGIs of roughly comparable ability at the same time? So wouldn’t that already create a bit of a multipolar situation, even if it all occurred in the DeepMind labs or something? Maybe if the AGIs all have roughly the same values it would still effectively be a unipolar situation.
I guess if you think it’s game over the moment that a more advanced AGI is turned on, then there might be only one such AGI. If the developers were training multiple randomly initialized copies of the AGI in parallel in order to average the results across them or see how they differed, there would already be multiple slightly different AGIs. But I don’t know how these things are done. Maybe if the model were really expensive to train, the developers would only train one of them to start with.
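For what it’s worth, here’s a toy sketch of the “same architecture and data, different random seeds” point, just to make it concrete. It uses sklearn’s small MLPClassifier purely as a stand-in (obviously nothing about it resembles AGI training), and all the numbers are arbitrary:

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Toy sketch: identical architecture and identical training data; the only
# difference between the "copies" is the random seed -- yet the trained
# models end up with different weights and disagree on some inputs.
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_train, y_train, X_test = X[:500], y[:500], X[500:]

models = [
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=seed)
    .fit(X_train, y_train)
    for seed in range(3)
]

preds = [m.predict(X_test) for m in models]
num_disagreements = int(((preds[0] != preds[1]) | (preds[0] != preds[2])).sum())
print(f"Test inputs where the three copies disagree: {num_disagreements} / {len(X_test)}")
```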
If the AGIs are deployed to any degree (even on an experimental / beta testing basis), I would expect there to be multiple instances (though maybe they would just be clones of a single trained model and therefore would have roughly the same values).
Sorry, should have linked to it when I introduced the term. I think mostly my claim is that AIs will probably cooperate well enough with each other that humans won’t be able to pit AIs against each other in ways that benefit humans enough to let humans retain control of the future. However, I’m also making the stronger claim that a unipolar takeoff is likely; this is because I think there’s a >50% chance (though <90% chance) that one AI or copy-clan of AIs will be sufficiently ahead of the others during the relevant period, or at least that the relevant set of AIs will have similar enough values and worldviews that serious cooperation failure isn’t on the table. I’m less confident in this stronger claim.
Thanks for the link. :) It’s very relevant to this discussion.
AIs will probably cooperate well enough with each other
Maybe, but what if trying to coordinate in that way is prohibited? Similar to how if a group of people tries to organize a coup against the dictator, other people may rat them out.
in ways that benefit humans enough to let humans retain control of the future
I agree that these anti-coup measures alone are unlikely to let humans retain control forever, or even for very long. Dictatorships tend to experience coups or revolutions eventually.
at least that the relevant set of AIs will have similar enough values and worldviews that serious cooperation failure isn’t on the table
I see. :) I’d define “multipolar” as just meaning that there are different agents with nontrivially different values, rather than that a serious bargaining failure occurs (unless you’re thinking that the multipolar AIs would cooperate to unify into a homogeneous compromise agent, which would make the situation unipolar).
I think even tiny differences in training data and randomization can make nontrivial differences in the values of an agent. Most humans are almost clones of one another. We use the same algorithms and have pretty similar training data for determining our values. Yet the differences in values between people can be pretty significant.
I guess the distinction between unipolar and multipolar sort of depends on the level of abstraction at which something is viewed. For example, the USA is normally thought of as a single actor, but it’s composed of 330 million individual human agents, each with different values, which is a highly multipolar situation. Likewise, I suppose you could have lots of AIs with somewhat different values, but if they coordinated on an overarching governance system, that governance system itself could be considered unipolar.
Even a single person can be seen as sort of multipolar if you look at the different, sometimes conflicting emotions, intuitions, and reasoning within that person’s brain.
I was thinking the reason we care about the multipolar vs. unipolar distinction is that we are worried about conflict/cooperation-failure/etc. and trying to understand what kinds of scenarios might lead to it. So, I’m thinking we can define the distinction in terms of whether conflict/etc. is a significant possibility.
I agree that if we define it your way, multipolar takeoff is more likely than not.
Ok, cool. :) And as I noted, even if we define it my way, there’s ambiguity regarding whether a collection of agents should count as one entity or many. We’d be more inclined to say that there are many entities in cases where conflict between them is a significant possibility, which gets us back to your definition.