“I don’t currently have much sympathy for someone who’s highly confident that AI takeover would or would not happen (that is, for anyone who thinks the odds of AI takeover … are under 10% or over 90%).”
I find this difficult to square with the fact that:
Absent highly specific victory conditions, the default (P = 1 - ε) outcome is takeover.
Of the three possibilities you list, interpretability seems like the only one that’s actually seen any traction, but:
there hasn’t actually been very much progress beyond toy problems
it’s not clear why we should expect it to generalize to future paradigms
we have no idea how to use any “interpretation” to actually get to a better endpoint
interpretability, by itself, is insufficient to avoid takeover, since you lose as soon as any other player in the game messes up even once
The other potential hopes you enumerate require people in the world to attempt to make a specific thing happen. For most of them, not only is practically nobody working on making any of those specific things happen, but many people are actively working in the opposite direction. In particular, with respect to the “Limited AI” hope, the leading AI labs are pushing quite hard on generality, rather than on narrow functionality. This has obviously paid off in terms of capability gains over “narrow” approaches. Being able to imagine a world where something else is happening does not tell us how to get to that world.
I can imagine having an “all things considered” estimate (integrating model uncertainty, other people’s predictions, etc.) of under 90% for failure. But I don’t understand writing off the epistemic position of someone who has an “inside view” estimate of >90% failure, especially given the enormous variation in the probability distributions people have over timelines (which I agree are an important, though not overwhelming, factor when it comes to estimating the chances of failure). Indeed, an “extreme” inside-view estimate conditional on short timelines seems much less strange to me than a “moderate” one. (The only way a “moderate” estimate makes sense to me is if it’s downstream of predicting the odds of success for a specific research agenda, such as in John Wentworth’s The Plan − 2022 Update, and I’m not even sure one could justifiably give a specific research agenda 50% odds of success nearly a decade out as the person who came up with it, let alone anyone looking in from the outside.)
Civilizational collapse would be a historically unprecedented event, and the future is very hard to predict; on those grounds alone, putting the odds of civilizational collapse above 90% seems to carry a heavy burden of proof/argumentation. I don’t think “We can’t name a specific, likely-seeming path to success now” is enough to get there—I think there are many past risks to civilization that people worried about in advance and didn’t see clear paths to dealing with, yet that didn’t end up being catastrophic. Furthermore, I do envision some possible paths to success, e.g. https://www.lesswrong.com/posts/jwhcXmigv2LTrbBiB/success-without-dignity-a-nearcasting-story-of-avoiding
Civilizational collapse would be a historically unprecedented event, and the future is very hard to predict;
I don’t find this reasoning very compelling, mostly on the basis of “this can’t go on”-type logic. Like, we basically know that the next century will be “historically unprecedented”. Indeed, it would be really surprising if the next century were not unprecedented, since humanity has never been in a remotely similar starting position.
We can’t sustain current growth levels, stagnation at any specific point would be quite weird, and sudden collapse would, as you say, also be historically unprecedented. I don’t have any stories about the future that seem plausible to me that are not historically unprecedented, so I don’t really understand how something being unprecedented could establish a prior against it. And there are definitely outside-view stories you can tell under which civilizational collapse would be more “business as usual” than other types of stories.
We are in the middle of a huge exponential growth curve. Any path from here seems quite wild. This means that wildness alone can’t be the primary reason to assign something a low prior.
A better argument is that the wildness of the next century means our models of the future are untrustworthy, which should make us pretty suspicious of any claim that something is the P = 1 - ε outcome without a watertight case for the proposition.
There doesn’t seem to be such a watertight case for AI takeover. Most threat models[1] rest heavily on the assumption that transformative AI will be single-mindedly optimizing for some (misspecified or mislearned) utility function, as opposed to e.g. following a bunch of contextually-activated policies[2]. While this is plausible, and thus warrants significant effort to prevent, it’s far from clear that this is even the most likely outcome “absent highly specific conditions”, never mind a near certainty.
[1] e.g. Cotra and Ngo et al
[2] as proposed e.g. by shard theory
Yep, I think this reasoning is better, and is closer to why I don’t assign 1-ε probability to doom.
The sad thing is that the remaining uncertainty is something that is much harder to work with. Like, I think most of the worlds where we are fine are worlds where I am deeply confused about a lot of stuff, deeply confused about the drivers of civilization, deeply confused about how to reason well, deeply confused about what I care about and whether AI doom even matters. I find it hard to plan around those worlds.
Is this about GDP growth or something else? Sustaining 2% GDP growth for a century (or a few) seems reasonably plausible?
I agree that one or two centuries is pretty plausible, but I think it starts getting quite wild within a few more. 300 years of 2% growth is ~380x; 400 years of 2% growth is ~3000x (a quick check of the arithmetic follows below).
To get there, you pretty quickly need at least a solar-system-spanning civilization, and then quite quickly a galaxy-spanning one, and then you just can’t do it within the rules of known physics at all anymore. I agree that two centuries of 2% growth is not totally implausible without anything extremely wild happening, but all of that would of course still involve a huge number of “historically unprecedented” things happening.
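For reference, a quick check of the compounding arithmetic, assuming a constant 2% annual growth rate:
$$1.02^{100} \approx 7.2, \qquad 1.02^{200} \approx 52, \qquad 1.02^{300} \approx 380, \qquad 1.02^{400} \approx 2750$$
So the “~380x” and “~3000x” figures above are roundings of the last two values.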
Makes sense.
My point with the observation you quoted wasn’t “This would be unprecedented, therefore there’s a very low prior probability.” It was more like: “It’s very hard to justify >90% confidence on anything without some strong base rate to go off of. In this case, we have no base rate to go off of; we’re pretty wildly guessing.” I agree something weird has to happen fairly “soon” by zoomed-out historical standards, but there are many possible candidates for what the weird thing is (I also endorse dsj’s comment below).
Separate from the object-level disagreements, my crux is that people can have inside-view models which “rule out” other people’s models (as well as outside-view considerations) in a way that leads to assigning very high likelihoods (i.e. 99%+) to certain outcomes.
The fact that they haven’t successfully communicated their models to you is certainly a reason for you to not update strongly in their direction, but it doesn’t mean much for their internal epistemic stance.