The paper is an interesting read, but I think it unfortunately isn’t of much practical value, due to the omission of a crucial consideration:
The paper rests on the assumption that alignment/control of artificial superintelligence (ASI) is possible. This has not been theoretically established, let alone assessed to be practically likely in the time we have before an intelligence explosion. As far as I know, there aren’t any sound supporting arguments for the assumption (and you don’t reference any), and in fact there are good arguments on the other side for why aligning or controlling ASI is fundamentally impossible.
AI Takeover is listed first in the Grand Challenges section, but it trumps all the others because it is the default outcome. You even say “we should expect AIs that can outsmart humans”, and “There are reasonable arguments for expecting misalignment, and subsequent takeover, as the ‘default’ outcome (without concerted efforts to prevent it).”, and “There is currently no widely agreed-upon solution to the problems of aligning and controlling advanced AI systems, and so leading experts currently see the risk of AI takeover as substantial.” I still don’t understand where the ~10% estimates are coming from, though; [fn 93:] “just over 50% of respondents assigned a subjective probability of 10% or more to the possibility that, “human inability to control future advanced AI systems causing human extinction or similarly permanent and severe disempowerment of the human species” (Grace et al., ‘Thousands of AI Authors on the Future of AI’.)”]. They seem logically unfounded. What is happening in the other ~90%? I didn’t get any satisfactory answers when asking here a while back.
You say “In this paper, we won’t discuss AI takeover risk in depth, but that’s because it is already well-discussed elsewhere.” It’s fine that you want to talk about other stuff in the paper, but that doesn’t make it any less of a crucial consideration that overwhelms concern for all of the other issues!
You conclude by saying that “Many are admirably focused on preparing for a single challenge, like misaligned AI takeover… But focusing on one challenge is not the same as ignoring all others: if you are a single-issue voter on AI, you are probably making a mistake.” I disagree, because alignment of ASI hasn’t been shown to even be solvable in principle! It is the single most important issue by far. The others don’t materialise because they assume humans will be in control of ASI for the most part (which is very unlikely to happen). The only practical solution (which also dissolves nearly all the other issues identified in the paper) is to prevent ASI from being built[1]. We need a well-enforced global moratorium on ASI as soon as possible.
At least until either it can be built safely, or the world collectively decides to take whatever risk remains after a consensus on an alignment/control solution is reached. At which point the other issues identified in the paper become relevant.
Thanks for the comment. I agree that if you think AI takeover is the overwhelmingly most likely outcome from developing ASI, then preventing takeover (including by preventing ASI) should be your strong focus. Some comments, though —
Just because failing at alignment undermines ~every other issue doesn’t mean that working on alignment is the only or overwhelmingly most important thing.[1] Tractability and likelihood also matter.
I’m not sure I buy that things are as stark as “there are no arguments against AI takeover”, see e.g. Katja Grace’s post here. I also think there are cases where someone presents you with an argument that superficially drives toward a conclusion that sounds unlikely, and it’s legitimate to be skeptical of the conclusion even if you can’t spell out exactly where the argument is going wrong (e.g. the two-envelope “paradox”). That’s not to say you can justify not engaging with the theoretical arguments whenever you’re uncomfortable with where they point, just that humility about deducing bold claims about the future on theoretical grounds cuts both ways.
Relatedly, I don’t think you need to be able to describe alternative outcomes in detail to reject a prediction about how the world goes. If I tell someone the world will be run by dolphins in the year 2050, and they disagree, I can reply, “oh yeah, well you tell me what the world looks like in 2050”, and their failure to describe their median world in detail doesn’t strongly support the dolphin hypothesis.[2]
“Default” doesn’t necessarily mean “unconditionally likely” IMO. Here I take it to mean something more like “conditioning on no specific response and/or targeted countermeasures”. Though I guess it’s baked into the meaning of “default” that it’s unconditionally plausible (like, ⩾5%?) — it would be misleading to say “the default outcome from this road trip is that we all die (if we don’t steer out of oncoming traffic)”.
In theory, one could work on making outcomes from AI takeover less bad, as well as making them less likely (though less clear what this looks like).
Altogether, I think you’re coming from a reasonable but different position, that takeover risk from ASI is very high (sounds like 60–99% given ASI?). I agree that kinds of preparedness not focused on avoiding takeover look less important on this view (largely because they matter in fewer worlds). I do think this axis of disagreement might not be as sharp as it seems, though — suppose person A has 60% p(takeover) and person B is on 1%. Assuming the same marginal tractability and neglectedness between takeover and non-takeover work, person A thinks that takeover-focused work is 60× more important; but non-takeover work is 40/99≈0.4 times as important, compared to person B.
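Spelling out that back-of-the-envelope comparison (a minimal sketch, writing p_A = 0.6 and p_B = 0.01 for the two people’s p(takeover), and assuming each kind of work only pays off in the worlds it targets, with the same tractability and neglectedness):

\[
\frac{p_A}{p_B} = \frac{0.60}{0.01} = 60 \quad \text{(takeover-focused work)},
\qquad
\frac{1 - p_A}{1 - p_B} = \frac{0.40}{0.99} \approx 0.4 \quad \text{(non-takeover work)}.
\]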
By (stupid) analogy, all the preparations for a wedding would be undermined if the couple got into a traffic accident on the way to the ceremony; this does not justify spending ~all the wedding budget on car safety.
Again by analogy, there were some superficially plausible arguments in the 1970s or thereabouts that population growth would exceed the world’s carrying capacity, and we’d run out of many basic materials, and there would be a kind of system collapse by 2000. The opponents of these arguments were not able to describe the ways that the world could avoid these dire fates in detail (they could not describe the specific tech advances which could raise agricultural productivity, or keep materials prices relatively level, for instance).
Thanks for the reply.

By (stupid) analogy, all the preparations for a wedding would be undermined if the couple got into a traffic accident on the way to the ceremony; this does not justify spending ~all the wedding budget on car safety.
This is a stupid analogy! (Traffic accidents aren’t very likely.) A better analogy would be “all the preparations for a wedding would be undermined if the couple weren’t able to be together because one was stranded on Mars with no hope of escape. This justifies spending all the wedding budget on trying to rescue them.” Or perhaps even better: “all the preparations for a wedding would be undermined if the couple probably won’t be able to be together, because one is taking part in a mission to Mars that half the engineers and scientists on the guest list are convinced will be a death trap (for detailed technical reasons). This justifies spending all the wedding budget on trying to stop the mission from going ahead.”
I think Wei Dai’s reply articulates my position well:

If not, what are the main obstacles to reaching existential security from here? […] and collected the obstacles, you might assemble a list like this one, which might update you toward AI x-risk being “overwhelmingly likely”. (Personally, if I had to put a number on it, I’d say 80%.)
Your next point seems somewhat of a straw man?
If I tell someone the world will be run by dolphins in the year 2050, and they disagree, I can reply, “oh yeah, well you tell me what the world looks like in 2050”
No, the correct reply is that dolphins won’t run the world because they can’t develop technology, due to their physical form (no opposable thumbs etc), and they won’t be able to evolve their physical form in such a short time (even with help from human collaborators)[1]. I.e., an object-level rebuttal.
The opponents of these arguments were not able to describe the ways that the world could avoid these dire fates in detail
No, but they had sound theoretical arguments. I’m saying these are lacking when it comes to why it’s possible to align/control/not go extinct from ASI.
Altogether, I think you’re coming from a reasonable but different position, that takeover risk from ASI is very high (sounds like 60–99% given ASI?)
I’d say ~90% (and the remaining 10% is mostly exotic factors beyond our control [footnote 10 of linked post]).

I do think this axis of disagreement might not be as sharp as it seems, though — suppose person A has [9]0% p(takeover) and person B is on 1%. Assuming the same marginal tractability and neglectedness between takeover and non-takeover work, person A thinks that takeover-focused work is [9]0× more important; but non-takeover work is 10/99≈0.[1] times as important, compared to person B.
But it’s worse than this, because the only viable solution to avoid takeover is to stop building ASI, in which case the non-takeover work is redundant (we can mostly just hope to luck out with one of the exotic factors).
And they won’t be able to be helped by ASIs either, because the control/alignment problem will remain unsolved (and probably unsolvable, for reasons x, y, z...)
This is a stupid analogy! (Traffic accidents aren’t very likely.)
Oh, I didn’t mean to imply that I think AI takeover risk is on par with traffic accident-risk. I was just illustrating the abstract point that the mere presence of a mission-ending risk doesn’t imply spending everything to prevent it. I am guessing you agree with this abstract point (but furthermore think that AI takeover risk is extremely high, and as such we should ~entirely focus on preventing it).
I think Wei Dai’s reply articulates my position well:
Maybe I’m splitting hairs, but “x-risk could be high this century as a result of AI” is not the same claim as “x-risk from AI takeover is high this century”, and I read you as making the latter claim (obviously I can’t speak for Wei Dai).
No, the correct reply is that dolphins won’t run the world because they can’t develop technology
That’s right, and I do think the dolphin example was too misleading and straw-man-ish. The point I was trying to illustrate, though, is not that there is no way to refute the dolphin theory, but that failing to adequately describe the alternative outcome(s) doesn’t especially support the dolphin theory, because trying to accurately describe the future is just generally extremely hard.
No, but they had sound theoretical arguments. I’m saying these are lacking when it comes to why it’s possible to align/control/not go extinct from ASI.
Got it. I guess I see things as messier than this — I see people with very high estimates of AI takeover risk advancing arguments, and I see others advancing skeptical counter-arguments (example), and before engaging with these arguments a lot and forming one’s own views, I think it’s not obvious which sets of arguments are fundamentally unsound.
But it’s worse than this, because the only viable solution to avoid takeover is to stop building ASI, in which case the non-takeover work is redundant (we can mostly just hope to luck out with one of the exotic factors).
Makes sense.