From my perspective, the world just looks like the kind of world where “existential catastrophe from misaligned, power-seeking AI by 2070” is true.
Could you clarify what you mean by this? I think I don’t understand what the word “true”, italicized, is supposed to mean here. Are you just reporting the impression (i.e. a belief not adjusted to account for other people’s beliefs) that you are ~100% certain an existential catastrophe from misaligned, power-seeking AI will (by default) occur by 2070? Or are you saying that this is what prima facie seems to you to be the case, when you extrapolate naively from current trends? The former seems very overconfident (even conditional on an existential catastrophe occurring by that date, it is far from certain that it will be caused by misaligned AI), whereas the latter looks pretty uninformative, given that it leaves open the possibility that the estimate will be substantially revised downward after additional considerations are incorporated (and you do note that you think “there’s a decent chance of exciting surprises”). Or perhaps you meant neither of these things?
I guess the most helpful thing (at least to someone like me who’s trying to make sense of this apparent disagreement between you and Joe) would be for you to state explicitly what probability assignment you think the totality of the evidence warrants (excluding evidence derived from the fact that other reasonable people have beliefs about this), so that one can then judge whether the discrepancy between your estimate and Joe’s is so significant that it suggests “some mistake in methodology” on your part or his, rather than a more mundane mistake.
Could you clarify what you mean by this? I don’t understand what the word “true”, italicized, is supposed to mean here.
A pattern I think I’ve seen with a fair number of EAs is that they’ll start with a pretty well-calibrated impression of how serious AGI risk is; but then they’ll worry that if they go around quoting a P(doom) like “25%” or “70%” (especially if the cause is something as far-fetched as AI), they’ll look like a crackpot. So the hypothetical EA tries to find a way to justify a probability more like 1-10%, so they can say the moderate-sounding “AI disaster is unlikely, but the EV is high”, rather than the more crazy-sounding “AI disaster is likely”.
This obviously isn’t the only reason people assign low probabilities to AI x-catastrophe, and I don’t at all know whether that pattern applies here (and I haven’t read Joe’s replies here yet); and it’s rude to open a conversation by psychologizing. Still, I wanted to articulate some perspectives from which there’s less background pressure to try to give small probabilities to crazy-sounding scenarios, on the off chance that Joe or some third party found it helpful:
A 5% probability of disaster isn’t any more or less confident/extreme/radical than a 95% probability of disaster; in both cases you’re sticking your neck out to make a very confident prediction.
If AGI doom were likely, what additional evidence would we expect to see? If we wouldn’t necessarily expect additional evidence, then why are we confident of low P(doom) in the first place?
AGI doom is just a proposition like any other, and we should think about it in the ‘ordinary way’ (for lack of a better way of summarizing this point).
The latter two points especially are what I was trying (and probably failing) to communicate with “‘existential catastrophe from misaligned, power-seeking AI by 2070’ is true.”
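To put the first point in sharper terms: on a log-odds scale, where 50% marks total uncertainty, a 5% forecast and a 95% forecast are exactly equidistant claims. A minimal sketch of the arithmetic (generic numbers, nobody's actual forecast):

```python
import math

def log_odds(p):
    """Log-odds (logit) of a probability: 0 at p=0.5, symmetric around it."""
    return math.log(p / (1 - p))

for p in (0.05, 0.50, 0.95):
    print(f"P = {p:.2f}  ->  log-odds = {log_odds(p):+.2f}")

# Output:
# P = 0.05  ->  log-odds = -2.94
# P = 0.50  ->  log-odds = +0.00
# P = 0.95  ->  log-odds = +2.94
# A 5% forecast and a 95% forecast sit equally far from 50%: each claims
# ~2.94 nats (~4.25 bits) of evidence, just in opposite directions.
```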
I guess the most helpful thing (at least to someone like me who’s trying to make sense of this apparent disagreement between you and Joe) would be for you to state explicitly what probability assignment you think the totality of the evidence warrants
Define a ‘science AGI’ system as one that can match top human thinkers in at least two big ~unrelated hard-science fields (e.g., particle physics and organic chemistry).
If the first such systems are roughly as opaque as 2020's state-of-the-art ML systems (e.g., GPT-3) and the world order hasn't already been upended in some crazy way (e.g., there isn't a singleton), then I expect an AI-mediated existential catastrophe with >95% probability.
I don’t have an unconditional probability that feels similarly confident/stable to me, but I think those two premises have high probability, both individually and jointly. This isn’t the same proposition Joe was evaluating, but it maybe illustrates why I have a very different high-level take on “probability of existential catastrophe from misaligned, power-seeking AI”.
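For concreteness, here is how the conditional claim combines with probabilities on the two premises to bound the unconditional number. The figures below are placeholders of my own, not estimates given in the text:

```python
# Placeholder numbers, purely illustrative -- not estimates given in the text.
p_joint_premises = 0.8        # assumed joint probability of the two premises above
p_cat_given_premises = 0.95   # the stated conditional estimate (">95%")

# A lower bound on the unconditional probability: this ignores catastrophe
# paths where the premises fail, which could only push the number higher.
p_lower_bound = p_joint_premises * p_cat_given_premises
print(f"P(existential catastrophe) >= {p_lower_bound:.2f}")  # >= 0.76
```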
A 5% probability of disaster isn’t any more or less confident/extreme/radical than a 95% probability of disaster; in both cases you’re sticking your neck out to make a very confident prediction.
“X happens” and “X doesn’t happen” are not symmetrical once I know that X is a specific event. Most things at the level of specificity of “humans build an AI that outmaneuvers humans to permanently disempower them” just don’t happen.
The reason we are even entertaining this scenario is a special argument that makes it seem very plausible. If that's all you've got—if there's no other source of evidence than the argument—then you've just got to start talking about the probability that the argument is right.
And the argument actually is a brittle and conjunctive thing. (Humans do need to be able to build such an AI by the relevant date; they do need to decide to do so; and the AI they build does need to decide to disempower humans, notwithstanding a prima facie incentive for humans to avoid that outcome.)
That doesn’t mean this is the argument or that the argument is brittle in this way—there might be a different argument that explains in one stroke why several of these things will happen. In that case, it’s going to be more productive to talk about that.
(For example, in the context of the multi-stage argument undershooting success probabilities, the point is that people will be competently trying to achieve X, and most of the uncertainty is in estimating how hard and how effectively people are trying, which is correlated across steps. So you would do better by going for the throat and reasoning about the common cause of each success, and you will always lose if you don't see that structure.)
And of course some of those steps may really just be quite likely, and one shouldn't be deterred from putting high probabilities on highly probable things. E.g. it does seem like people have a very strong incentive to build powerful AI systems (and moreover the extrapolation suggesting that we will be able to build powerful AI systems is actually about the systems we observe in practice, and already goes much of the way to suggesting that we will do so). Though I do think that the median MIRI staff member's view is overconfident on many of these points.
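A toy simulation of that correlated-steps structure (stage count, success model, and numbers all invented for illustration): when one shared "competence" variable drives every stage, multiplying the marginal per-stage probabilities understates the true joint probability.

```python
import random

random.seed(0)
N, STAGES = 100_000, 5

joint_successes = 0
stage_successes = [0] * STAGES
for _ in range(N):
    competence = random.random()  # shared common cause behind every stage
    outcomes = [random.random() < 0.5 + 0.5 * competence for _ in range(STAGES)]
    stage_successes = [s + o for s, o in zip(stage_successes, outcomes)]
    joint_successes += all(outcomes)

marginals = [s / N for s in stage_successes]
naive_product = 1.0
for m in marginals:
    naive_product *= m

print(f"naive product of marginal stage probabilities: {naive_product:.3f}")  # ~0.24
print(f"actual joint success probability:              {joint_successes / N:.3f}")  # ~0.33
# The joint probability exceeds the naive product because the same
# 'competence' draw drives every stage; treating stages as independent
# systematically undershoots.
```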
If AGI doom were likely, what additional evidence would we expect to see?
Humans are pursuing convergent instrumental subgoals much more. (Related question: will AGIs want to take over the world?)
A lot more anti-aging research is going on.
Children’s inheritances are ~always conditional on the child following some sort of rule imposed by the parent, intended to further the parent’s goals after their death.
Holidays and vacations are rare; when they are taken, it is explicitly a form of rejuvenation before getting back to earning tons of money.
Humans look like they are automatically strategic.
Humans are way worse at coordination. (Related question: can humans coordinate to prevent AI risk?)
Nuclear war happened some time after WW2.
Airplanes crash a lot more.
Unions never worked.
Economic incentives point strongly towards generality rather than specialization. (Related question: how general will AI systems be? Will they be capable of taking over the world?)
Universities don’t have “majors”; instead they just teach you how to be more generally intelligent.
(Really the entire world would look hugely different if this were the case; I struggle to imagine it.)
There’s probably more; I haven’t thought very long about it.
(Before responses of the form “what about e.g. the botched COVID response?”, let me note that this is about additional evidence; I’m not denying that there is existing evidence.)
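One way to frame the question above is in terms of likelihood ratios: the absence of this "additional evidence" only pushes P(doom) down insofar as the evidence was genuinely expected under doom. A small Bayes-rule sketch with made-up likelihoods, not derived from the list above:

```python
def posterior(prior, p_e_given_h, p_e_given_not_h, observed):
    """P(H | E observed or absent), by Bayes' rule."""
    if not observed:  # condition on the evidence being absent instead
        p_e_given_h, p_e_given_not_h = 1 - p_e_given_h, 1 - p_e_given_not_h
    numerator = prior * p_e_given_h
    return numerator / (numerator + (1 - prior) * p_e_given_not_h)

prior = 0.5  # neutral starting point, purely illustrative

# If the evidence would be common given doom but rare otherwise,
# its absence is a substantial update downward:
print(posterior(prior, p_e_given_h=0.9, p_e_given_not_h=0.3, observed=False))  # ~0.125

# If the evidence would be rare either way, its absence tells us little:
print(posterior(prior, p_e_given_h=0.1, p_e_given_not_h=0.05, observed=False))  # ~0.486
```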
My basic perspective here is pretty well-captured by Being Half-Rational About Pascal’s Wager is Even Worse. In particular:

[...] Where the heck did Fermi get that 10% figure for his ‘remote possibility’ [that neutrons may be emitted in the fission of uranium], especially considering that fission chain reactions did in fact turn out to be possible? [...] So far as I know, there was no physical reason whatsoever to think a fission chain reaction was only a ten percent probability. They had not been demonstrated experimentally, to be sure; but they were still the default projection from what was already known. If you’d been told in the 1930s that fission chain reactions were impossible, you would’ve been told something that implied new physical facts unknown to current science (and indeed, no such facts existed).
[...]
I mention all this because it is dangerous to be half a rationalist, and only stop making one of the two mistakes. If you are going to reject impractical ‘clever arguments’ that would never work in real life, and henceforth not try to multiply tiny probabilities by huge payoffs, then you had also better reject all the clever arguments that would’ve led Fermi or Szilard to assign probabilities much smaller than ten percent. (Listing out a group of conjunctive probabilities leading up to taking an important action, and not listing any disjunctive probabilities, is one widely popular way of driving down the apparent probability of just about anything.)
[...]
I don’t believe in multiplying tiny probabilities by huge impacts. But I also believe that Fermi could have done better than saying ten percent, and that it wasn’t just random luck mixed with overconfidence that led Szilard and Rabi to assign higher probabilities than that. Or to name a modern issue which is still open, Michael Shermer should not have dismissed the possibility of molecular nanotechnology, and Eric Drexler will not have been randomly lucky when it turns out to work: taking current physical models at face value implies that molecular nanotechnology ought to work, and if it doesn’t work we’ve learned some new fact unknown to present physics, etcetera. Taking the physical logic at face value is fine, and there’s no need to adjust it downward for any particular reason; if you say that Eric Drexler should ‘adjust’ this probability downward for whatever reason, then I think you’re giving him rules that predictably give him the wrong answer. Sometimes surface appearances are misleading, but most of the time they’re not.
A key test I apply to any supposed rule of reasoning about high-impact scenarios is, “Does this rule screw over the planet if Reality actually hands us a high-impact scenario?” and if the answer is yes, I discard it and move on. The point of rationality is to figure out which world we actually live in and adapt accordingly, not to rule out certain sorts of worlds in advance.
There’s a doubly-clever form of the argument wherein everyone in a plausibly high-impact position modestly attributes only a tiny potential possibility that their face-value view of the world is sane, and then they multiply this tiny probability by the large impact, and so they act anyway and on average worlds in trouble are saved. I don’t think this works in real life—I don’t think I would have wanted Leo Szilard to think like that. I think that if your brain really actually thinks that fission chain reactions have only a tiny probability of being important, you will go off and try to invent better refrigerators or something else that might make you money. And if your brain does not really feel that fission chain reactions have a tiny probability, then your beliefs and aliefs are out of sync and that is not something I want to see in people trying to handle the delicate issue of nuclear weapons. But in any case, I deny the original premise. [...]
And finally, I once again state that I abjure, refute, and disclaim all forms of Pascalian reasoning and multiplying tiny probabilities by large impacts when it comes to existential risk. We live on a planet with upcoming prospects of, among other things, human intelligence enhancement, molecular nanotechnology, sufficiently advanced biotechnology, brain-computer interfaces, and of course Artificial Intelligence in several guises. If something has only a tiny chance of impacting the fate of the world, there should be something with a larger probability of an equally huge impact to worry about instead. You cannot justifiably trade off tiny probabilities of x-risk improvement against efforts that do not effectuate a happy intergalactic civilization, but there is nonetheless no need to go on tracking tiny probabilities when you’d expect there to be medium-sized probabilities of x-risk reduction.
[...]
To clarify, “Don’t multiply tiny probabilities by large impacts” is something that I apply to large-scale projects and lines of historical probability. On a very large scale, if you think FAI stands a serious chance of saving the world, then humanity should dump a bunch of effort into it, and if nobody’s dumping effort into it then you should dump more effort than currently into it. On a smaller scale, to compare two x-risk mitigation projects in demand of money, you need to estimate something about marginal impacts of the next added effort (where the common currency of utilons should probably not be lives saved, but “probability of an ok outcome”, i.e., the probability of ending up with a happy intergalactic civilization). In this case the average marginal added dollar can only account for a very tiny slice of probability, but this is not Pascal’s Wager. Large efforts with a success-or-failure criterion are rightly, justly, and unavoidably going to end up with small marginally increased probabilities of success per added small unit of effort. It would only be Pascal’s Wager if the whole route-to-an-OK-outcome were assigned a tiny probability, and then a large payoff used to shut down further discussion of whether the next unit of effort should go there or to a different x-risk.
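The distinction being drawn here is easy to put in numbers (invented ones, purely for illustration): the whole route to an OK outcome carries a medium-sized probability, and only the per-dollar slice of it is tiny.

```python
# Invented illustrative figures, not estimates from the quoted text.
p_route_success = 0.10   # medium-sized probability for the whole project
total_budget = 1e9       # $1B of total effort behind that probability

# If effort contributes roughly linearly, the average marginal dollar buys:
p_per_dollar = p_route_success / total_budget
print(f"probability per marginal dollar: {p_per_dollar:.1e}")  # 1.0e-10

# The per-dollar slice is tiny, but that is an artifact of dividing a
# medium-sized probability across a large effort -- unlike Pascal's Wager,
# where the *entire route* is assigned a tiny probability to begin with.
```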
And in Hero Licensing:

[...] The multiple-stage fallacy is an amazing trick, by the way. You can ask people to think of key factors themselves and still manipulate them really easily into giving answers that imply a low final answer, because so long as people go on listing things and assigning them probabilities, the product is bound to keep getting lower. Once we realize that by continually multiplying out probabilities the product keeps getting lower, we have to apply some compensating factor internally so as to go on discriminating truth from falsehood.
You have effectively decided on the answer to most real-world questions as “no, a priori” by the time you get up to four factors, let alone ten. It may be wise to list out many possible failure scenarios and decide in advance how to handle them—that’s Murphyjitsu—but if you start assigning “the probability that X will go wrong and not be handled, conditional on everything previous on the list having not gone wrong or having been successfully handled,” then you’d better be willing to assign conditional probabilities near 1 for the kinds of projects that succeed sometimes—projects like Methods. Otherwise you’re ruling out their success a priori, and the “elicitation” process is a sham.
Frankly, I don’t think the underlying methodology is worth repairing. I don’t think it’s worth bothering to try to make a compensating adjustment toward higher probabilities. We just shouldn’t try to do “conjunctive breakdowns” of a success probability where we make up lots and lots of failure factors that all get informal probability assignments. I don’t think you can get good estimates that way even if you try to compensate for the predictable bias. [...]
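The mechanics of the trick are easy to exhibit: with even generous-looking per-stage probabilities, the product collapses as stages accumulate, so whoever controls the length of the list effectively controls the answer. A minimal demonstration with generic numbers:

```python
# Even if every listed stage is assigned a generous-looking 80%,
# the conjunction collapses as stages accumulate.
p_stage = 0.8
for n_stages in (2, 4, 7, 10):
    print(f"{n_stages:>2} stages at {p_stage:.0%} each -> "
          f"product = {p_stage ** n_stages:.3f}")
# Output:
#  2 stages at 80% each -> product = 0.640
#  4 stages at 80% each -> product = 0.410
#  7 stages at 80% each -> product = 0.210
# 10 stages at 80% each -> product = 0.107
# Projects that succeed roughly half the time therefore require conditional
# probabilities near 1 at each step; anything lower rules success out a priori.
```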