This is great, thank you. Honestly, it feels a little telling that this has barely been explored? Despite being THE x-risk? I get that the intervention point comes before things ever reach this stage, but understanding the problem is pretty core to prevention.
A force smarter/more powerful than us is scary, no matter what form it takes. But we (EA) feel a little swept up in one particular vision of AI timelines that doesn’t feel terribly grounded. I understand it’s important to assume the worst, but it’s also important to imagine what would be realistic and then intermingle the two. Maybe this is why the EA approach to AI risk feels blinkered to me. So much focus is on the worst possible outcome and far less on the most plausible outcome?
(or maybe I’m just outside the relevant circles and all this ground is already well trodden; I’m just not privy to it)
I agree that I’d love to see more work on this! (And I agree that the last story I talk about, of a very fast takeoff AI system with particularly advanced capabilities, seems unlikely to me—although others disagree, and think this “worst case” is also the most likely outcome.)
It’s worth noting again, though, that any particular story is unlikely to be correct. We’re trying to forecast the future, and good forecasting should feel uncertain at the end, because we don’t know what the future will hold. Also, good work on this will (in my opinion) give us ideas about what many possible scenarios will look like. This sort of work (e.g. the first half of this article, rather than the second) often feels less concrete, but is, I think, more likely to be correct—and can inform actions that target many possible scenarios rather than one single unlikely event.
All that said, I’m excited to see work like OpenPhil’s nearcasting project, which I find particularly clarifying and which will, I hope, improve our ability to prevent a catastrophe.
This profile by 80k is pretty bad on this front: it glosses over all the intermediate steps and reduces everything to “But one day, every single person in the world suddenly dies.”
Universal Paperclips is slightly better about this, showing the process of the AI gaining our trust before betraying us, but the key power-grab step is still reduced to just “release the hypnodrones”.
There are other places that have fleshed out the details of how misaligned power-seeking might play out, such as Holden Karnofsky’s post AI Could Defeat All Of Us Combined.
That particular story, in which I write “one day, every single person in the world suddenly dies”, is about a fast takeoff self-improvement scenario. In such scenarios, a sudden takeover is exactly what we should expect to occur, and the intermediate steps set out by Holden and others don’t apply. Any guess about what sort of advanced technology would accomplish this necessarily makes the scenario less likely, and I think such guesses (e.g. “hypnodrones”) are extremely likely to be false and aren’t useful or informative.
For what it’s worth, I personally agree that slow takeoff scenarios like those described by Holden (or indeed those I discuss in the rest of this article) are far more likely. That’s why I focus on the many different ways in which an AI could take over—rather than on any particular failure story. And, as I discuss, any particular combination of steps is necessarily less likely than the claim that any or all of these capabilities could be used.
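(To spell out that last probabilistic point, here is a minimal sketch of the standard conjunction rule, with $A$ and $B$ as illustrative labels of my own rather than anything from the discussion above. For any two events,

\[
P(A \cap B) = P(A)\,P(B \mid A) \le P(A) \le P(A \cup B),
\]

so a story that requires both a specific capability $A$ and a specific deployment path $B$ can never be more probable than the bare claim that at least one such capability gets used.)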
But a significant fraction of people working on AI existential safety disagree with both of us, and think that a story which literally claims that a sufficiently advanced system will suddenly kill all humans is the most likely way for this catastrophe to play out! That’s why I also included a story which doesn’t explain these intermediate steps, even though my inside view is that this is less likely to occur.
I’m one of the AI researchers worried about fast takeoff. Yes, it’s probably incorrect to pick any particular sudden-death scenario and say that’s how it’ll happen, but you can provide some guesses and a better illustration of one or more possibilities. For example, have you read Valuable Humans In Transit? https://qntm.org/transit