Thanks for writing this up!
Quick thing I flagged:
> Probably a spectrum is far too simple a way of thinking of this. Probably it’s more complicated, but I think economic forces probably push more toward the middle of the spectrum, not the very extreme end, for the reason that: suppose you’re employing someone to plan your wedding, you would probably like them to stick to the wedding planning and, you know, choose flowers and music and stuff like that, and not try to fix any problems in your family at the moment so that the seating arrangement can be better. [You want them to] understand what the role is and only do that. Maybe it would be better if they were known to be very good at this, so things could be different in the future, but it seems to be [that] economic forces push away from zero agency but also away from very high agency.
I think this intuition is misleading. In most cases we can imagine, a wedding planner who attempts to do other things would be bad at them, and therefore undesirable. There’s a familiar trope of people doing more than they were tasked with, and it often goes badly. One reason is that, precisely because they were given a specific task, the requester probably assumed they would mess up anything beyond it.
If the agent is actually good enough to do other things well, the situation would look different from the start. If I knew that the person handling my “wedding planning” was also excellent at many other things, including helping with my finances and larger family issues, then I’d probably ask for something broader, like “just make my life better”.
In cases where I trust someone to do a good job at many broad things in my life or business, I typically assign tasks accordingly.
Now, with GPT-4, I’m definitely asking it to do many kinds of tasks.
I think economic forces are pushing toward many bounded workflows, but only because that’s more effective and economical right now (it’s easier to build a great experience for “AI for writing”), not because people would otherwise want it that way.
Love this response, especially the reframing that we often keep tasks bounded not because we want low-agency systems, but because we assume “extra initiative” will go wrong unless we trust the agent’s broader competence. That feels very true in real-world settings, not just in theory.
I’ve been exploring how this plays out from more of a psychological and design angle, especially how internal motivations might shift before there’s any visible misbehavior. Some recent work I’ve been reading (e.g. from Timaeus) on developmental interpretability has helped me think about agents as growing systems rather than just fixed tools.
I’d be curious to hear what you think about telling the AI: “Don’t just do this task, but optimize broadly on my behalf.” When does that start to cross into dangerous ground?
I think we’re seeing a lot of empirical data about that with AI+code. A lot of the human part is now in oversight: it’s not too hard to say, “Recommend 5 things to improve. Order them. Do them,” with very little human input.
There’s work involved in making sure the AI’s scope stays bounded, and that anything potentially dangerous it could do gets run past the human first. This seems like a good workflow to me.
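To make that shape concrete, here’s a minimal sketch of such a workflow, not anything from the discussion above: the `Proposal`, `propose`, `execute`, and `ask_human` names are hypothetical stand-ins for the model call, the action step, and the human approval gate. The only structural point is that steps flagged as potentially dangerous wait for an explicit human yes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Proposal:
    """One suggested improvement from the model."""
    description: str   # e.g. "Add input validation to the upload handler"
    risky: bool        # True if the step could be dangerous (deploys, deletions, ...)


def run_bounded_workflow(
    propose: Callable[[int], List[Proposal]],  # model call: "recommend n improvements, ordered"
    execute: Callable[[Proposal], None],       # applies a single improvement
    ask_human: Callable[[Proposal], bool],     # human approval gate for risky steps
    n: int = 5,
) -> None:
    """Ask for n ordered improvements and apply them, pausing for a human
    sign-off whenever a step is flagged as potentially dangerous."""
    for proposal in propose(n):
        if proposal.risky and not ask_human(proposal):
            # Human declined: the agent stays inside its bounded scope.
            continue
        execute(proposal)
```

The tricky design choice, of course, is who sets the `risky` flag; if the model flags its own actions, the oversight is only as strong as the model’s judgment about what needs sign-off.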
I like how you framed this. Delegating initiative to AI becomes risky once we trust it to optimize broadly on our behalf. That trust boundary is hard to calibrate.
I’m experimenting with using frameworks like my own (VSPE) to help the model “know when to stop” and keep its helpfulness from tipping into distortion. Your workflow sketch makes a lot of sense as a starting point!