Does this really make you feel safe? This reads to me as a possible reason for optimism, but hardly reassures me that the worst won’t happen or that this author isn’t just failing to imagine what could lead to strong instrumental convergence (including different training regimes becoming popular).
Basically, kind of. The core issue is that instrumental convergence, and especially effectively unbounded instrumental convergence, is a central assumption behind the claim that AI is uniquely dangerous compared to other technologies like biotechnology. If that assumption fails, or is too weak to carry the case for doom (contrary to what Superintelligence told you), that matters a lot: it undermines many of our inferences about why AI would likely doom us, like deception, unbounded power-seeking, and many of the other inferences EAs/rationalists made, none of which go through without a high prior probability of doom already.
This means we have to fall back on our priors for general technology being good or bad, and unless one already has a high prior probability that AI is doomy, we should update quite a lot downward on our probability of doom from AI. The result probably isn’t as small as our original prior, but I suspect the update is enough to change at least some of our priorities.
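To make the shape of that update concrete, here is a minimal Bayesian sketch. Every number in it (the technology prior, both likelihood ratios) is an illustrative assumption of mine, not a figure from anyone’s analysis; the point is only that weakening the instrumental convergence assumption pulls the posterior back toward the technology prior without reaching it.

```python
# Minimal sketch of the update described above. All numbers are
# purely illustrative assumptions, not estimates from the discussion.

def bayes_update(prior_p, likelihood_ratio):
    """Update a probability by a likelihood ratio via the odds form of Bayes' rule."""
    prior_odds = prior_p / (1 - prior_p)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Hypothetical base rate of doom for a powerful new technology.
tech_prior = 0.01

# If unbounded instrumental convergence held, treat the argument as
# strong evidence for doom (illustrative likelihood ratio of 100).
p_doom_strong_ic = bayes_update(tech_prior, 100)  # ~0.50

# If the evidence only supports a much weaker version of the
# assumption, the likelihood ratio shrinks (say, to 3), and the
# posterior falls back toward, but not all the way to, the prior.
p_doom_weak_ic = bayes_update(tech_prior, 3)      # ~0.03

print(f"prior: {tech_prior:.2f}")
print(f"with strong instrumental convergence: {p_doom_strong_ic:.2f}")
print(f"with only weak instrumental convergence: {p_doom_weak_ic:.2f}")
```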
Remember, the instrumental convergence assumption was essentially used as an implicit axiom underlying many of our subsequent beliefs. Its turning out to be neither as universal as we thought nor as much evidence of AI doom as we thought (indeed, even granting instrumental convergence, we don’t get enough evidence for a high probability of doom without further assumptions) is pretty drastic, since a lot of beliefs about the dangerousness of AI rely on essentially unbounded instrumental convergence and unbounded power-seeking/deception assumptions.
So the answer to your question depends on what your original priors were on technology and AI being safe.
It does not mean that we can’t go extinct, but it does mean that we were probably way overestimating the probability of going extinct.
Even setting aside instrumental convergence (to be clear, I don’t think we can safely do this), there is still misuse risk and multi-agent coordination that needs solving to avoid doom (or at least global catastrophe).
I agree, but that implies pretty different things from what is currently being done, and it still implies that the danger from AI is overestimated, which bleeds into other things.
I guess my real question is “how can you feel safe accepting the idea that ML or RL agents won’t show instrumental convergence?” Are you saying AIs trained this way won’t be agents? Because I don’t understand how we could call something AGI that doesn’t figure out its own solutions to reach its goals, and I don’t see how it can do that without stumbling on things that are generally good for achieving goals.
And regardless of whatever else you’re saying, how can you feel safe that the next training regime won’t lead to instrumental convergence?
Are you saying AIs trained this way won’t be agents?
Not especially. If I had to state it simply: a massive space for instrumental goals isn’t useful for capabilities today, and plausibly won’t be in the future, so we have at least some reason not to worry about AI misalignment risk as much as we do today.
In particular, it means we shouldn’t assume instrumental goals appear by default, and we should avoid over-relying on non-empirical approaches like intuition or imagination. We have to take things on a case-by-case basis rather than making broad judgements.
Note that instrumental convergence/instrumental goals isn’t a binary but a spectrum: the more of the space of instrumental goals that is useful for capabilities, the worse things get, continuously, rather than there being a sharp switch between instrumental goals being active and inactive.
My claim is that the evidence we have weighs against much of that space being useful for capabilities, and I expect this trend to continue, at least partially, as AI progresses.
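As a purely toy illustration of that continuity (my own construction, not a model anyone has proposed), compare a threshold picture of danger with a smooth one:

```python
# Toy contrast between the sharp-binary picture and the continuum
# picture of instrumental-convergence risk. The functional forms are
# arbitrary illustrative choices; only the shapes matter.

def risk_binary(useful_fraction, threshold=0.5):
    """Sharp-binary picture: no danger below a threshold, full danger above."""
    return 1.0 if useful_fraction >= threshold else 0.0

def risk_continuous(useful_fraction):
    """Continuum picture: risk rises smoothly with the fraction of
    instrumental-goal space that is useful for capabilities."""
    return useful_fraction  # simplest monotone choice; exact shape is not the point

for f in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    print(f"useful fraction {f:.1f}: binary={risk_binary(f):.1f}, "
          f"continuous={risk_continuous(f):.2f}")
```

On the continuum picture, evidence that shrinks the useful fraction buys a real reduction in risk even if it never drives the fraction to zero.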
Yet I suspect this isn’t hitting at your true worry, which I want to address now. I suspect your true worry is the quote below:
And regardless of whatever else you’re saying, how can you feel safe that the next training regime won’t lead to instrumental convergence?
And while I can’t answer that question fully, I’d like to suggest going on a walk, drinking water, or in the worst case getting help from a mental health professional. But try to stop the loop of never feeling safe around something.
The reason I’m suggesting this is that acting on your need to feel safe has two problems:
First, adopting it as a policy would leave us vulnerable to arbitrarily high demands for safety, possibly crippling AI use cases. As a general rule, I’m not a fan of actions that result in arbitrarily high demands for something, at least without scrutinizing them very heavily, and meeting such demands would require way, way more evidence than just a feeling.
Second, we have no reason to assume that people’s feelings of safety or unsafety actually track the real evidence on whether AI is safe, or whether AI misalignment risk is a big problem. Your feelings are real, but I don’t trust that your feeling of unsafety about AI tells me anything other than how you feel about something. That’s fine, to the extent it isn’t harming you materially, but it’s an important thing to note here.
Kaj Sotala made a similar post, which talks about why you should mostly feel safe. It’s a different discussion than my comment, but the post below may be useful:
https://www.lesswrong.com/posts/pPLcrBzcog4wdLcnt/most-people-should-probably-feel-safe-most-of-the-time
EDIT 1: I deeply hope you can feel better, no matter what happens in the AI space.
EDIT 2: One general thing to keep in mind: when a claim says something is more or less X based on some evidence, the change is usually smooth, rather than going all the way to zero or to certainty. So here I’m claiming that AI is less dangerous, probably a lot less dangerous, not that the danger is totally erased; things are safer, and they have gotten smoothly better based on our evidence to date.