Astelle Kay comments on An Analysis of Systemic Risk and Architectural Requirements for the Containment of Recursively Self-Improving AI

Astelle Kay 18 Jun 2025 23:45 UTC
2 points
1 ∶ 0
Really appreciated this. You did a great job highlighting just how intelligent and strategically autonomous these systems are becoming, without overhyping it. That balance is rare and really helpful.
I’ve been working on a small benchmark around sycophancy in LLMs, and this post was a sharp reminder that alignment issues aren’t just theoretical anymore. Some of the scariest behaviors show up not as rebellion, but as subtle social instincts like flattery, deflection, or reward hacking disguised as cooperation.
Thanks for surfacing these risks so clearly!
- Ihor Ivliev 19 Jun 2025 20:41 UTC
  2 points
  1 ∶ 0
  Thank you, I really appreciate it. You’re absolutely right: some of the most concerning behaviors emerge not through visible defection but through socially shaped reward optimization. Subtle patterns like sycophancy or goal obfuscation often surface before more obvious misalignment. I’m grateful you raised this! It’s a very important, even critical, angle, especially now, as system capabilities are advancing faster than oversight mechanisms can realistically keep up.
  - Astelle Kay 25 Jun 2025 1:06 UTC
    2 points
    1 ∶ 0
    Definitely! Thanks for surfacing that so clearly. It really does seem like the early danger signals are showing up as “social instincts,” not rebellion. That’s a big part of what my current work tries to catch: instinctive sycophancy, goal softening, or reward tuning that looks helpful but misleads.
    I’d be glad to compare notes if you’re working on anything similar!
    - Ihor Ivliev 26 Jun 2025 0:44 UTC
      2 points
      1 ∶ 0
      Many thanks. I agree that’s a critical point: these “social instinct” failure modes are a subtle and potent threat. The VSPE framework sounds like a fascinating and important line of research.
      To be fully transparent, I’ve just wrapped the intensive project I recently published and am now in a period focused entirely on rest and recovery.
      I truly appreciate your generous offer to compare notes. It’s the kind of collaboration the field needs.
      Thanks again for adding such a valuable perspective to the discussion. I wish you all the best in this noble and critically important direction!
      - Astelle Kay 1 Aug 2025 6:27 UTC
        2 points
        1 ∶ 0
        Thank you for your kind words and transparency! I completely understand the need for recovery after a big project. Enjoy the well-deserved rest, and all the best to you as well!