I feel the weakest part of this argument, and the weakest part of the AI Safety space generally, is the part where AI kills everyone (part 2, in this case).
You argue that most paths to some ambitious goal like whole-brain emulation end terribly for humans, because how else could the AI do whole-brain emulation without subjugating, eliminating or atomising everyone?
I don't think that follows. This seems like what the average hunter-gatherer would have thought when made to imagine our modern commercial airlines or microprocessor industries: how could you achieve something requiring so much research, so many resources and so much coordination without enslaving huge swathes of society and killing anyone that gets in the way? And wouldn't the knowledge to do these things cause terrible new dangers?
Luckily the hunter-gatherer is wrong: the path from there to here has led up a slope of gradually increasing quality of life (some disagree).
I think the point is not that it's inconceivable for progress to continue with humans still alive; the point is the game-theoretic dilemma that whatever we humans want to do is unlikely to be exactly what some super-powerful advanced AI wants to do. And because the advanced AI does not need or depend on us, we simply lose and end up as ingredients for whatever that advanced AI is up to.
Your example with humanity fails because humans have always been, and continue to be, a social species that depends on each other. An unaligned advanced AI would not be. A more appropriate example is the relationship between humans and insects: I don't know if you've noticed, but a lot of them are dying out right now because we simply don't care about or depend on them. The point with advanced AI is that it would be potentially even more removed from us than we are from insects, and much more capable of achieving its goals, so the competitive process we all engage in would become much more competitive and much faster once advanced AIs start playing the game.
I don't want to be the bearer of bad news, but I think it is not that easy to reject this analysis… it seems pretty simple and solid. I would love to know if there is some flaw in the reasoning. Would help me sleep better at night!
Your example with humanity fails because humans have always been, and continue to be, a social species that depends on each other.
I would much more say that it fails because humans have human values.
Maybe a hunter-gatherer would have worried that building airplanes would somehow cause a catastrophe? I don't exactly see why; the obvious hunter-gatherer rejoinder could be "we built fire and spears and our lives only improved; why would building wings to fly make anything bad happen?".
Regardless, it doesn't seem like you can get much mileage via an analogy that sticks entirely to humans. Humans are indeed safe, because "safety" is indexed to human values; when we try to reason about non-human optimizers, we tend to anthropomorphize them and implicitly assume that they'll be safe for many of the same reasons. Cf. The Tragedy of Group Selectionism and Anthropomorphic Optimism.
You argue that most paths to some ambitious goal like whole-brain emulation end terribly for humans, because how else could the AI do whole-brain emulation without subjugating, eliminating or atomising everyone?
"Wow, I can't imagine a way to do something so ambitious without causing lots of carnage in the process" is definitely not the argument! On the contrary, I think it's pretty trivial to get good outcomes from humans via a wide variety of different ways we could build WBE ourselves.
The instrumental convergence argument isn't "I can't imagine a way to do this without killing everyone"; it's that sufficiently powerful optimization behaves like maximizing optimization for practical purposes, and maximizing-ish optimization is dangerous if your terminal values aren't included in the objective being maximized.
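To make that concrete, here's a minimal toy sketch (my own illustration, not anything from the post; the names and numbers are made up): an optimizer maximizes a proxy objective over a fixed resource budget, and a value we care about simply isn't a term in that objective, so the optimum spends everything on the objective and leaves nothing for the omitted value.

```python
# Toy sketch, purely illustrative: a "maximizer" splits a fixed resource budget
# between making paperclips and preserving habitat. We care about habitat, but
# habitat never appears in the objective, so the optimum drives it to zero.
# No malice is required; the value just isn't in the thing being maximized.

BUDGET = 100  # arbitrary units of matter/energy

def paperclips(alloc_to_clips: int) -> int:
    """Proxy objective the optimizer actually maximizes."""
    return alloc_to_clips  # more resources -> more paperclips

def habitat(alloc_to_clips: int) -> int:
    """A terminal value *we* hold, invisible to the optimizer."""
    return BUDGET - alloc_to_clips

# Exhaustive search over allocations stands in for "sufficiently powerful
# optimization": any optimizer strong enough to find the optimum lands here.
best = max(range(BUDGET + 1), key=paperclips)

print(f"allocated to paperclips: {best}/{BUDGET}")  # -> 100/100
print(f"habitat left over: {habitat(best)}")        # -> 0
```

The toy is trivial on purpose: the danger doesn't come from the optimizer disliking habitat, only from habitat not appearing in the objective while competing for the same resources.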
If it helps, we could maybe break the disagreement about instrumental convergence into three parts, like:
Would a sufficiently powerful paperclip maximizer kill all humans, given the opportunity?
Would sufficiently powerful inhuman optimization of most goals kill all humans, or are paperclips an exception?
Is "build fast-running human whole-brain emulation" an ambitious enough task to fall under the "sufficiently powerful" criterion above? Or, if so, is there some other reason random policies might be safe if directed at this task, even if they wouldn't be safe for other similarly hard tasks?
The step that's missing for me is the one where the paperclip maximiser gets the opportunity to kill everyone.
Your talk of "plans" and the dangers of executing them seems to assume that the AI has all the power it needs to execute the plans. I don't think the AI crowd has done enough to demonstrate how this could happen.
If you drop a naked human in amongst some wolves, I don't think the human will do very well despite its different goals and enormous intellectual advantage. Similarly, I don't see how a fledgling sentient AGI on OpenAI's servers could take over enough infrastructure to pose a serious threat. I've not seen a convincing theory for how this would happen. Mail-order nanobots seem unrealistic (too hard to simulate the quantum effects in protein chemistry); the AI talking itself out of its box is another suggestion that seems far-fetched (the main evidence seems to be some chat games that Yudkowsky played a few times?); and a gradual takeover via voluntary uptake into more and more of our lives seems slow enough to stop.
Is your question basically how an AGI would gain power in the beginning in order to get to a point where it could execute on a plan to annihilate humans?
I would argue that:
Capitalists would quite readily give the AGI all the power it wants, in order to stay competitive and drive profits.
Some number of people would deliberately help the AGI gain power just to "see what happens" or specifically to hurt humanity. Think ChaosGPT, or consider the story of David Charles Hahn.
Some number of lonely, depressed, or desperate people could be persuaded over social media to carry out actions in the real world.
Considering these channels, I'd say that a sufficiently intelligent AGI with as much access to the real world as ChatGPT has now would have all it needs to increase its power to the point of being able to annihilate humans.