And then finally there are actually some formal results where we try to formalize a notion of power-seeking in terms of the number of options that a given state allows a system. This is work [...] which I’d encourage folks to check out. And basically you can show that for a large class of objectives defined relative to an environment, there’s a strong reason for a system optimizing those objectives to get to the states that give it many more options.
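To make the “number of options” idea concrete, here is a minimal toy sketch (an illustration only, under made-up assumptions; the function average_optimal_value and its parameters are invented for this sketch, and this is not the paper’s formal POWER definition): when reward is placed only on terminal states and drawn at random, a state from which more terminal states are reachable has a higher optimal value on average, simply because the agent gets to pick the best of more draws.

```python
# Toy illustration (not the formal setup from the paper): with rewards drawn
# at random on terminal states, a state that can reach more terminal states
# tends to have a higher optimal value on average.
import random

def average_optimal_value(num_reachable_terminals, trials=100_000):
    """Average, over random reward draws, of the best reward among the terminal
    states reachable from a state (deterministic navigation and no discounting;
    purely illustrative assumptions)."""
    total = 0.0
    for _ in range(trials):
        rewards = [random.random() for _ in range(num_reachable_terminals)]
        total += max(rewards)  # an optimal policy heads for the best reachable terminal
    return total / trials

print("3 reachable terminal states:", round(average_optimal_value(3), 3))  # ~0.75
print("1 reachable terminal state: ", round(average_optimal_value(1), 3))  # ~0.50
```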
After spending a lot of time understanding that work, my impression is that the main theorems in the paper are very complicated and are limited in ways that were not reasonably explained. (To the point that, probably, very few people understand the main theorems and what environments they are applicable for, even though the work has been highly praised within the AI alignment community.)
Hey there!
Have you explained your thoughts somewhere? It’d be more productive to hash out the disagreement rather than generically casting shade!
Thanks, you’re right. There’s this long thread, but I’ll try to explain the issues here more concisely. I think the theorems have the following limitations that were not reasonably explained in the paper (and some accompanying posts):
The theorems are generally not applicable for stochastic environments (despite the paper and some related posts suggesting otherwise).
The theorems may not be applicable if there are cycles in the state graph of the MDP (other than self-loops in terminal states); for example (see the toy sketch after this list):
The theorems are not applicable in states from which a reversible action can be taken.
The theorems are not applicable in states from which only one action (that is not POWER-seeking) allows the agent to reach a cycle of a given length.
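Here is the toy sketch referred to above: a tiny deterministic MDP, made up purely for illustration (the graph, the state names, and the has_reversible_action helper are all invented; none of this is taken from the paper), whose state graph contains both a reversible action and a 2-cycle that is not a terminal self-loop.

```python
# Toy deterministic MDP written as {state: {action: next_state}}; made up
# for illustration only, not an example from the paper.
GRAPH = {
    "start": {"left": "a", "right": "b"},
    "a": {"back": "start"},   # "left" is reversible: start -> a -> start
    "b": {"go": "c"},
    "c": {"go": "b"},         # b -> c -> b is a 2-cycle, not a terminal self-loop
}

def has_reversible_action(graph, state):
    """True if some action from `state` leads to a successor that can step straight back."""
    return any(state in graph.get(successor, {}).values()
               for successor in graph[state].values())

print(has_reversible_action(GRAPH, "start"))  # True: take "left", then "back"
print(has_reversible_action(GRAPH, "b"))      # True: go to c, then go back to b
```

Under the limitations listed above, a state like "start" (a reversible action is available) would be outside the theorems’ scope, and the b/c cycle is exactly the kind of non-terminal cycle the second item refers to.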
I’m not arguing that the theorems don’t prove anything useful. I’m arguing that it’s very hard for readers of the paper (and some accompanying posts) to understand what the theorems actually prove. Readers need to work through about 20 formal definitions that build on each other before they can understand the theorems. I also argue that the lack of explanation about what the theorems actually prove, and some of the informal claims that were made about them, are not reasonable (and cause the theorems to appear more impressive than they are). Here’s an example of such an informal claim (taken from this post):
To onlookers, I want to say that:
This isn’t exactly what Ofer is complaining about, but one take on the issue (that math can be overstated, poorly socialized, misleading, or overbearing) is a common critique in domains (theoretical econ, interdisciplinary biology) that use a lot of applied math borrowed from pure math, physics, etc.
It depends (well, sort of on your ideology, style, and academic politics, TBH), but I think the critique can often be true.
Although, to be fair, this particular critique seems much more specific, and it seems like Ofer might be talking past Alex Turner and his meaning (but I have no actual idea about the math or the claims).
The tone of the original post is pretty normal or moderate, and isn’t "casting shade", but it might be consistent with issues like:
this person has some agenda that is unhelpful and unreasonable;
they are just a gadfly;
they don’t really “get it” but know enough to fool themselves and pick at things forever.
But these issues could apply to my account too. The tone seems pretty good to me.