Surely not. Neither of those makes any argument about AI, just about software generally. If you literally think those two are sufficient arguments for concluding “AI kills us with high probability” I don’t see why you don’t conclude “PowerPoint kills us with high probability”.
Yep! To be explicit, I was assuming that general intelligence is very powerful, that you can automate it, and that it isn’t (e.g.) friendly by default.
I’m not sure I understand what statements like “general intelligence is very powerful” mean even though it seems to be a crucial part of the argument. Can you explain more concretely what you mean by this? E.g. What is “general intelligence”? What are the ways in which it is and isn’t powerful?
By “general intelligence” I mean “whatever it is that lets human brains do astrophysics, category theory, etc. even though our brains evolved under literally zero selection pressure to solve astrophysics or category theory problems”.
Human brains aren’t perfectly general, and not all narrow AIs/animals are equally narrow. (E.g., AlphaZero is more general than AlphaGo.) But it sure is interesting that humans evolved cognitive abilities that unlock all of these sciences at once, with zero evolutionary fine-tuning of the brain aimed at equipping us for any of those sciences. Evolution just stumbled into a solution to other problems, that happened to generalize to billions of wildly novel tasks.
To get more concrete:
AlphaGo is a very impressive reasoner, but its hypothesis space is limited to sequences of Go board states rather than sequences of states of the physical universe. Efficiently reasoning about the physical universe requires solving at least some problems (which might be solved by the AGI’s programmer, and/or solved by the algorithm that finds the AGI in program-space; and some such problems may be solved by the AGI itself in the course of refining its thinking) that are different in kind from what AlphaGo solves.
E.g., the physical world is too complex to simulate in full detail, unlike a Go board state. An effective general intelligence needs to be able to model the world at many different levels of granularity, and strategically choose which levels are relevant to think about, as well as which specific pieces/aspects/properties of the world at those levels are relevant to think about.
More generally, being a general intelligence requires an enormous amount of laserlike strategicness about which thoughts you do or don’t think: a large portion of your compute needs to be ruthlessly funneled into exactly the tiny subset of questions about the physical world that bear on the question you’re trying to answer or the problem you’re trying to solve. If you fail to be ruthlessly targeted and efficient in “aiming” your cognition at the most useful-to-you things, you can easily spend a lifetime getting sidetracked by minutiae / directing your attention at the wrong considerations / etc.
And given the variety of kinds of problems you need to solve in order to navigate the physical world well / do science / etc., the heuristics you use to funnel your compute to the exact right things need to themselves be very general, rather than all being case-specific. (Whereas we can more readily imagine that many of the heuristics AlphaGo uses to avoid thinking about the wrong aspects of the game state, or thinking about the wrong topics altogether, are Go-specific heuristics.)
GPT-3 is a very impressive reasoner in a different sense (it successfully recognizes many patterns in human language, including a lot of very subtle or conjunctive ones like “when A and B and C and D and E and F and G and H and I are all true, humans often say X”), but it too isn’t doing the “model full physical world-states and trajectories thereof” thing (though an optimal predictor of human text would need to be a general intelligence, and a superhumanly capable one at that).
Some examples of abilities I expect humans to only automate once we’ve built AGI (if ever):
The ability to perform open-heart surgery with a high success rate, in a messy non-standardized ordinary surgical environment.
The ability to match smart human performance in a specific hard science field, across all the scientific work humans do in that field.
In principle, I suspect you could build a narrow system that is good at those tasks while lacking the basic mental machinery required to do par-human reasoning about all the hard sciences. In practice, I very strongly expect humans to find ways to build general reasoners to perform those tasks, before we figure out how to build narrow reasoners that can do them. (For the same basic reason evolution stumbled on general intelligence so early in the history of human tech development.)
(Of course, if your brain has all the basic mental machinery required to do other sciences, that doesn’t mean that you have the knowledge required to actually do well in those sciences. An artificial general intelligence could lack physics ability for the same reason many smart humans can’t solve physics problems.)
When I say “general intelligence is very powerful”, a lot of what I mean is that science is very powerful, and that having all the sciences at once is a lot more powerful than the sum of each science’s impact.
(E.g., because different sciences can synergize, and because you can invent new scientific fields and subfields, and more generally chain one novel insight into dozens of other new insights that critically depended on the first insight.)
Another large piece of what I mean is that general intelligence is a very high-impact sort of thing to automate because AGI is likely to blow human intelligence out of the water immediately, or very soon after its invention.
80K gives the (non-representative) example of how AlphaGo and its immediate successors compared to the human ability range on Go:
In the span of a year, AI had advanced from being too weak to win a single [Go] match against the worst human professionals, to being impossible for even the best players in the world to defeat.
I expect “general STEM AI” to blow human science ability out of the water in a similar fashion. Reasons for this include:
Software (unlike human intelligence) scales with more compute.
Current ML uses far more compute to find reasoners than to run reasoners. This is very likely to hold true for AGI as well. (A rough back-of-envelope illustration follows this list.)
We probably have more than enough compute already, and are mostly waiting on new ideas for how to get to AGI efficiently, as opposed to waiting on more hardware to throw at old ideas.
Empirically, humans aren’t near a cognitive ceiling, and even narrow AI often suddenly blows past the human reasoning ability range on the task it’s designed for. It would be weird if scientific reasoning were an exception.
See also AlphaGo Zero and the Foom Debate.
Empirically, human brains are full of cognitive biases and inefficiencies. It’s doubly weird if scientific reasoning is an exception even though it’s visibly a mess with tons of blind spots, inefficiencies, motivated cognitive processes, and historical examples of scientists and mathematicians taking decades to make technically simple advances.
Empirically, human brains are extremely bad at some of the most basic cognitive processes underlying STEM. E.g., consider that human brains can barely do basic mental math at all.
Human brains underwent no direct optimization for STEM ability in our ancestral environment, beyond things like “can distinguish four objects in my visual field from five objects”. In contrast, human engineers can deliberately optimize AGI systems’ brains for math, engineering, etc. capabilities.
More generally, the sciences (and many other aspects of human life, like written language) are a very recent development. So evolution has had very little time to refine and improve on our reasoning ability in many of the ways that matter.
Human engineers have an enormous variety of tools available to build general intelligence that evolution lacked. This is often noted as a reason for optimism that we can align AGI to our goals, even though evolution failed to align humans to its “goal”. It’s additionally a reason to expect AGI to have greater cognitive ability, if engineers try to achieve great cognitive ability.
The hypothesis that AGI will outperform humans has a disjunctive character: there are many different advantages that individually suffice for this, even if AGI doesn’t start off with any other advantages. (E.g., speed, math ability, scalability with hardware, skill at optimizing hardware...)
See also Sources of advantage for digital intelligence.
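To make the “more compute to find reasoners than to run reasoners” point concrete, here is a rough back-of-envelope sketch (mine, not the original commenter’s). It assumes a GPT-3-scale dense model with ~175B parameters trained on ~300B tokens, and the standard ~6N-FLOPs-per-training-token and ~2N-FLOPs-per-inference-token approximations; all of these are illustrative ballpark figures, not exact values.

```python
# Back-of-envelope comparison of training vs. inference compute for a
# GPT-3-scale model (illustrative numbers only). Uses the common rules
# of thumb of ~6*N FLOPs per training token and ~2*N FLOPs per
# inference token for a dense transformer with N parameters.

N_PARAMS = 175e9       # assumed parameter count (GPT-3 scale)
TRAIN_TOKENS = 300e9   # assumed training-set size in tokens

train_flops = 6 * N_PARAMS * TRAIN_TOKENS  # ~3.2e23 FLOPs to "find" the reasoner
infer_flops_per_token = 2 * N_PARAMS       # ~3.5e11 FLOPs to "run" it per token

tokens_per_training_budget = train_flops / infer_flops_per_token

print(f"Training compute:            {train_flops:.2e} FLOPs")
print(f"Inference compute per token: {infer_flops_per_token:.2e} FLOPs")
print(f"Tokens of inference bought by one training run's compute: "
      f"{tokens_per_training_budget:.2e}")
# => roughly 9e11 tokens, i.e. the compute spent training the model once
#    could instead run it over ~3x its entire training set.
```

On these assumptions, one training run’s worth of compute would cover on the order of 10^12 tokens of inference, which is the sense in which finding the reasoner dominates running it: once trained, many copies can be run comparatively cheaply.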
Nitpick: doesn’t the argument you made also assume that there’ll be a big discontinuity right before AGI? That seems necessary for the premise about “extremely novel software” (rather than “incrementally novel software”) to hold.
I do think that AGI will be developed by methods that are relatively novel. Like, I’ll be quite surprised if all of the core ideas are >6 years old when we first achieve AGI, and I’ll be more surprised still if all of the core ideas are >12 years old.
(Though at least some of the surprise does come from the fact that my median AGI timeline is short, and that I don’t expect us to build AGI by just throwing more compute and data at GPT-n.)
Separately and with more confidence, I’m expecting discontinuities in the cognitive abilities of AGI. If AGI is par-human at heart surgery and physics, I predict that this will be because of “click” moments where many things suddenly fall into place at once, and new approaches and heuristics (both on the part of humans and on the part of the AI systems we build), not just because of a completely smooth, incremental, and low-impact-at-each-step improvement to the knowledge and thought-habits of GPT-3.
“Superhuman AI isn’t just GPT-3 but thinking faster and remembering more things” (for example) matters for things like interpretability: even if we succeed shockingly well at finding ways to understand reasonably thoroughly what GPT-3’s brain is doing moment-to-moment, those methods are less likely to work for understanding what the first AGI’s brain is doing moment-to-moment, insofar as the first AGI is working in very new sorts of ways and doing very new sorts of things.
I’m happy to add more points like these to the stew so they can be talked about. Saying “your list of reasons for thinking AGI risk is high didn’t explicitly mention X” is a process we can continue indefinitely if we want to, since there are always more background assumptions someone can bring up that they disagree with. (E.g., I also didn’t explicitly mention “intelligence is a property of matter rather than of souls imparted into particular animal species by God”, “AGI isn’t thousands of years in the future”, “most random goals would produce bad outcomes if optimized by a superintelligence”...)
Which specific assumptions should be included depends on the conversational context. I think it makes more sense to say “ah, I personally disagree with [X], which I want to flag as a potential conversational direction since your comment didn’t mention [X] by name”, as opposed to speaking as though there’s an objectively correct level of granularity.
The original thing I said was responding to a claim in the OP that no EA can rationally have a super high belief in AGI risk:
For instance, I think that having 70 or 80%+ probabilities on AI catastrophe within our lifetimes is probably just incorrect, insofar as a probability can be incorrect.
The challenge the OP was asking me to meet was to point at a missing model piece (or a disagreement, where the other side isn’t obviously just being stupid) that can cause a reasonable person to have extreme p(AGI doom), given other background views the OP isn’t calling obviously stupid. (E.g., the OP didn’t say that it’s obviously stupid for anyone to have a confident belief that AGI will be a particular software project built at a particular time and place.)
The OP didn’t issue a challenge to list all of the relevant background views (relative to some level of granularity or relative to some person-with-alternative-views, which does need to be specified if there’s to be any objective answer), so I didn’t try to explicitly write out obvious popularly held beliefs like “AGI is more powerful than PowerPoint”. I’m happy to do that if someone wants to shift the conversation there, but hopefully it’s obvious why I didn’t do that originally.
Fair! Sorry for the slow reply, I missed the comment notification earlier.
I could have been clearer in what I was trying to point at with my comment. I didn’t mean to fault you for not meeting an (unmade) challenge to list all your assumptions—I agree that would be unreasonable.
Instead, I meant to suggest an object-level point: that the argument you mentioned seems pretty reliant on a controversial discontinuity assumption—enough that the argument alone (along with other, largely uncontroversial assumptions) doesn’t make it “quite easy to reach extremely dire forecasts about AGI.” (Though I was thinking more about 90%+ forecasts.)
(That assumption—i.e. the main claims in the 3rd paragraph of your response—seems much more controversial/non-obvious among people in AI safety than the other assumptions you mention, as evidenced by researchers criticizing it and researchers doing prosaic AI safety work.)