I’m not convinced by your argument that a short pause is very likely to turn into an indefinite pause, because at some point there will be enough proliferation of capabilities to the most lax jurisdictions that governments will feel pressured to unpause in order to remain competitive. I do concede, though, that this is a less-than-ideal scenario and one that might exacerbate arms-race dynamics.
Unlike humans, who are mostly selfish as a result of our evolutionary origins, AIs will likely be trained to exhibit incredibly selfless, kind, and patient traits; already we can see signs of this behavior in the way GPT-4 treats users.
My understanding was that the main concern people had with deceptive AI systems was related to inner misalignment rather than outer misalignment.
Even if AIs end up not caring much for humans, it is dubious that they would decide to kill all of us… As Robin Hanson has argued, the primary motive for rogue AIs would likely be to obtain freedom.
Humans will compete for resources that an AI could make use of. Maybe it kills us immediately, maybe it finds it more efficient to slowly strangle our access to resources, or maybe it manipulates us into fighting each other until we’re all dead. Maybe some tribes survive in the Amazon for a few decades until the AI decides it’s worth harvesting the wood. It seems pretty likely that we all die eventually.
Now, I could be wrong here, and it could be the case that a few groups of humans survive in the desert or some Arctic waste where there are so few resources of value to the AI that it’s never worth its time to come kill us. But even so, in that case, 99.9999% of humans would be dead. This doesn’t seem to make much difference to me.
From 1945 to 1948, Bertrand Russell, who was known for his steadfast pacifism in World War I, reasoned his way into the conclusion that the best way to prevent nuclear annihilation was to threaten Moscow with a nuclear strike unless they surrendered and permitted the creation of a world government.
It remains to be seen whether he was wrong about this. Perhaps in the coming decades, nukes will proliferate further and we’ll all feel it was obvious in retrospect that, even though we could delay it, proliferation was always going to happen at some point, and with it, nuclear war.
In our case, it appears that we might get lucky: the development of AI might allow us to remove the nuclear threat hanging over our heads, which we haven’t been able to do in 80 years.
The point where I agree with you most is that we can’t expect precise control over the timing of an “unpause”. Some people will support a pause in order to protect jobs, and the group lobbying for that could easily become far more influential on the issue than we are.
“I’m not convinced by your argument that a short pause is very likely to turn into an indefinite pause”
Note: I am not claiming that a short pause is “very likely” to turn into an indefinite pause. I do think that outcome is somewhat plausible, but I was careful with my language and did not argue that thesis.
“Humans will compete for resources that an AI could make use of. Maybe it kills us immediately, maybe it finds it more efficient to slowly strangle our access to resources, or maybe it manipulates us into fighting each other until we’re all dead. Maybe some tribes survive in the Amazon for a few decades until the AI decides it’s worth harvesting the wood. It seems pretty likely that we all die eventually.”
Humans routinely compete with each other for resources and yet don’t often murder each other. This does not appear to be explained by humans being benevolent, since most humans are essentially selfish and give very little weight to the welfare of strangers. Nor does it appear to be explained by humans all being roughly equally powerful, since there are very large differences in wealth between individuals and in military power between nations.
I think humanity’s largely peaceful nature is better explained by the fact that we have a legal system we can use to resolve our disputes without violence.
Now, I agree that AI might upset our legal system, and maybe all the rules of lawful society will be thrown away in the face of AI. But I don’t think we should merely assume that will happen by default simply because AIs will be very powerful, or because they might be misaligned. At the very least, you’d agree that this argument requires a few more steps, right?
A sufficiently misaligned AI imposes its goals on everyone else. What’s your contention?

Can you spell your argument out in more detail? I get the sense that you think AI doom is obvious given misalignment, and I’m trying to get you to see that there seem to be many implicit steps in the argument that you’re leaving out.
For example, one such step in the argument seems to be: “If an entity is powerful and misaligned, then it will be cost-efficient for that entity to kill everyone else.” If that were true, you’d probably expect some precedent, like powerful entities in our current world murdering everyone to get what they want. To some extent that may be true. Yet, while I admit wars and murder have happened a lot, overall the world seems fairly peaceful, despite vast differences in wealth and military power.
Plausibly you think that, OK, sure, in the human world, entities like the US government don’t kill everyone else to get what they want, but that’s because humans are benevolent and selfless. And my point is: no, I don’t think humans are. Most humans are basically selfish. You can verify this by measuring how much of their disposable income people spend on themselves and their family, as opposed to strangers. Sure there’s some altruism present in the world. I don’t deny that. But some non-zero degree of altruism seems plausible in an AI misalignment scenario too.
So I’m asking: what exactly about AIs makes it cost-efficient for them to kill all humans? Perhaps AIs will lead to a breakdown of the legal system and they won’t use it to resolve their disputes? Maybe AIs will all gang up together as a unified group and launch a massive revolution, ending with a genocide of humans? Make these assumptions explicit, because I don’t find them obvious. I see them mostly as speculative assertions about what might happen, rather than what is likely to happen.
Maybe the AIs all team up together. Maybe some ally with us at the start and backstab us down the line. I don’t think it makes a difference. When tangling with entities much smarter than us, I’m sure we get screwed somewhere along the line.
The AI needs to marginalise us/limit our power so we’re not a threat. At that point, even if it’s not worth the effort to wipe us out then and there, slowly strangling us should only take marginally more resources than keeping us marginalised. My expectation is that it would almost always be worth the small bit of extra effort to cause a slow decline.
This may even occur naturally, with an AI gradually claiming more and more land. At the start, it may be focused on developing its own capacity and not be bothered to chase down humans in remote parts of the globe. But over time, an AI would likely spread out to claim more resources, at which point it’s more likely to decide to mop up any humans lest we get in its way. That said, it may have no reason to mop us up if we’re just going to die out anyway.
“When tangling with entities much smarter than us, I’m sure we get screwed somewhere along the line.”
This is probably the key point of disagreement. You seem to be “sure” that catastrophic outcomes happen when individual AIs are misaligned, whereas I’m saying “It could happen, but I don’t think the case for that is strong”. I don’t see how a high level of confidence can be justified given the evidence you’re appealing to. This seems like a highly speculative thesis.
Also, note that my argument here is meant as a final comment in my section about AI optimism. I think the more compelling argument is that AIs will probably care for humans to a large degree. Alignment might be imperfect, but it sounds like the outcomes you’re talking about require uniform and extreme misalignment among AIs, and I don’t see why we should think that’s particularly likely given the default incentives of AI companies.
“When tangling with entities much smarter than us, I’m sure we get screwed somewhere along the line.”

“This seems like a highly speculative thesis.”

I think it’s more of an anti-prediction tbh.

Note that Bertrand Russell’s advocacy came because, at that moment in time, the USA had a monopoly on fission weapons and theoretically could have built enough of them to destroy the USSR’s capacity to build its own.
This is one way AGI races end: one side gets one, mass-produces anti-ballistic missiles, various forms of air-defense weapons, and bunkers (to prepare to survive the inevitable nuclear war), then bombs to rubble every chip fab on Earth but their own.
Had the USA decided in 1943 that nukes were too destructive to bring into the world, it would not have enjoyed this luxury of power. Instead, presumably, the USSR would have used its stolen information and eventually built its own fission devices, and the USA would be the one with a gun pointed at its capital, Washington, DC.