Your example with humanity fails because humans have always been, and continue to be, a social species whose members depend on one another.
I would much more say that it fails because humans have human values.
Maybe a hunter-gatherer would have worried that building airplanes would somehow cause a catastrophe? I don’t exactly see why; the obvious hunter-gatherer rejoinder could be ‘we built fire and spears and our lives only improved; why would building wings to fly make anything bad happen?’.
Regardless, it doesn’t seem like you can get much mileage via an analogy that sticks entirely to humans. Humans are indeed safe, because “safety” is indexed to human values; when we try to reason about non-human optimizers, we tend to anthropomorphize them and implicitly assume that they’ll be safe for many of the same reasons. Cf. The Tragedy of Group Selectionism and Anthropomorphic Optimism.
You argue that most paths to some ambitious goal like whole-brain emulation end terribly for humans; after all, how else could the AI achieve whole-brain emulation without subjugating, eliminating, or atomising everyone?
‘Wow, I can’t imagine a way to do something so ambitious without causing lots of carnage in the process’ is definitely not the argument! On the contrary, I think it’s pretty trivial for humans to get good outcomes via a wide variety of different ways we could build WBE ourselves.
The instrumental convergence argument isn’t ‘I can’t imagine a way to do this without killing everyone’; it’s that sufficiently powerful optimization behaves like maximizing optimization for practical purposes, and maximizing-ish optimization is dangerous if your terminal values aren’t included in the objective being maximized.
If it helps, we could maybe break the disagreement about instrumental convergence into three parts, like:
Would a sufficiently powerful paperclip maximizer kill all humans, given the opportunity?
Would sufficiently powerful inhuman optimization of most goals kill all humans, or are paperclips an exception?
Is ‘build fast-running human whole-brain emulation’ an ambitious enough task to fall under the ‘sufficiently powerful’ criterion above? Or, if so, is there some other reason random policies might be safe when directed at this task, even if they wouldn’t be safe for other similarly hard tasks?
The step that’s missing for me is the one where the paperclip maximiser gets the opportunity to kill everyone.
Your talk of “plans” and the dangers of executing them seems to assume that the AI has all the power it needs to execute the plans. I don’t think the AI crowd has done enough to demonstrate how this could happen.
If you drop a naked human in amongst some wolves, I don’t think the human will do very well despite their different goals and enormous intellectual advantage. Similarly, I don’t see how a fledgling sentient AGI on OpenAI servers can take over enough infrastructure to pose a serious threat, and I’ve not seen a convincing theory for how this would happen. Mail-order nanobots seem unrealistic (too hard to simulate the quantum effects in protein chemistry); the AI talking itself out of its box is another suggestion that seems far-fetched (the main evidence seems to be some chat games that Yudkowsky played a few times?); and a gradual takeover via its voluntary uptake into more and more of our lives seems slow enough to stop.
Is your question basically how an AGI would gain power in the beginning in order to get to a point where it could execute on a plan to annihilate humans?
I would argue that:
Capitalists would quite readily give the AGI all the power it wants, in order to stay competitive and drive profits.
Some number of people would deliberately help the AGI gain power just to “see what happens” or specifically to hurt humanity. Think ChaosGPT, or consider the story of David Charles Hahn.
Some number of lonely, depressed, or desperate people could be persuaded over social media to carry out actions in the real world.
Considering these channels, I’d say that a sufficiently intelligent AGI with as much access to the real world as ChatGPT has now would have all the power needed to increase its power to the point of being able to annihilate humans.