Love this response, especially the reframing that we often keep tasks bounded not because we want low-agency systems, but because we assume “extra initiative” will go wrong unless we trust the agent’s broader competence. That feels very true in real-world settings, not just theory.
I’ve been exploring how this plays out from more of a psychological and design angle, especially how internal motivations might shift before there’s any visible misbehavior. Some recent work I’ve been reading (like Timaeus) looks at developmental interpretability, and it’s helped me think about agents as growing systems rather than just fixed tools.
I’d be curious to hear what you think about telling the AI: “Don’t just do this task, but optimize broadly on my behalf.” When does that start to cross into dangerous ground?
I think we’re already seeing a lot of empirical data on this with AI coding assistants. Much of the human role now is oversight: it’s not hard to say, “Recommend 5 things to improve. Order them. Do them,” with very little human input.
There’s also work on keeping the AI’s scope bounded and routing anything potentially dangerous through a human for approval. That seems like a good workflow to me.
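As a rough sketch, that bounded-scope workflow might look something like this: the agent proposes actions, safe ones run automatically, and anything flagged as risky is gated on explicit human approval. All the names here (`Action`, `run_with_oversight`, `human_approves`) are hypothetical, for illustration only, not a real API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Action:
    description: str
    risky: bool  # e.g. writes files, runs shell commands, touches prod

def run_with_oversight(actions: list[Action],
                       human_approves: Callable[[Action], bool]) -> list[str]:
    """Execute safe actions automatically; gate risky ones on a human."""
    log = []
    for action in actions:
        if action.risky and not human_approves(action):
            # Human vetoed (or never approved) a risky action: skip it.
            log.append(f"SKIPPED: {action.description}")
            continue
        log.append(f"EXECUTED: {action.description}")
    return log

# Example run where the human approves nothing risky:
log = run_with_oversight(
    [Action("rename variable", risky=False),
     Action("delete old branch", risky=True)],
    human_approves=lambda a: False,
)
```

The key design choice is that the approval check sits outside the agent: the gate doesn't depend on the model correctly judging its own actions as dangerous.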
I like how you framed this. Delegating initiative to AI becomes risky once we trust it to optimize broadly on our behalf. That trust boundary is hard to calibrate.
I’m experimenting with using frameworks like my own (VSPE) to help the model “know when to stop” and keep its helpfulness from tipping into distortion. Your workflow sketch makes a lot of sense as a starting point!