Delegated agents in practice: How companies might end up selling AI services that act on behalf of consumers and coalitions, and what this implies for safety research

Epistemic status: speculative. I’m no expert in delegation or in CAIS. I think I have good intuitions on entrepreneurship and user design. I started reading and thinking about negotiation agents in 2017. I read too many psychology papers in my spare time.

I’ll give a short talk at the AI Safety Discussion Day on Monday 14 December.
→ Please comment below with feedback on what I might be missing, to inform my talk.

Drexler posits that artificial general intelligence may be developed recursively through services that complete bounded tasks in bounded time. The Comprehensive AI Services (CAIS) technical report, however, doesn’t cover dynamics where software companies are incentivised to develop personalised services.

The more a company ends up personalising an AI service around completing tasks according to individual users’ needs, the more an instantiation of that service will resemble an agent acting on a user’s behalf. It follows from this general argument that continued increases both in customers’ demand for personalised services and in companies’ capacity to process information to supply them (as this whitepaper suggests) will, all else equal, result in more agent-like services.

Neat example: Asana Inc. is developing its online task-management platform into a virtual assistant that helps subscribed teams automate task allocation and prepares documents so that employees can focus on their intended tasks. Improved team productivity and employee satisfaction in turn enable Asana to expand its subscriber base and upgrade its models.

Counterexamples: Utilities in cloud compute or broadband internet either can’t add sufficient value through mass customisation or would face legal backlash if they tried. Online media and networking services like Google or Facebook rely on third parties for revenue, making it hard to earn the trust of users and get paid to personalise interconnected interfaces.

CAIS neglects what I dub delegated agents: agents designed to act on a person’s behalf.
A next-generation software company could develop and market delegated agents that

  • elicit and model a user’s preferences within and across relevant contexts.

  • build trust with the user to represent their interests within a radius of influence.

  • plan actions autonomously in interaction with other agents that represent other consumers, groups with shared interests, and governance bodies.
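
A minimal sketch of the first two properties in the list above, under loud assumptions: the class name `DelegatedAgent`, its fields, and the idea of modelling the radius of influence as a set of authorised contexts are all hypothetical and not taken from any existing product or paper. The agent acts only inside the contexts the user has authorised, and hands the decision back whenever it has no elicited preference to go on. Planning in interaction with other agents (the third property) is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class DelegatedAgent:
    """Toy delegated agent; every name here is a hypothetical illustration."""

    # Contexts the user has authorised the agent to act in
    # (a crude stand-in for the "radius of influence").
    radius_of_influence: Set[str]
    # Elicited preferences: context -> {option: score}.
    preferences: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def elicit(self, context: str, option: str, score: float) -> None:
        """Record how much the user likes an option within a context."""
        self.preferences.setdefault(context, {})[option] = score

    def decide(self, context: str, options: List[str]) -> Optional[str]:
        """Return the preferred option, or None to defer to the user."""
        if context not in self.radius_of_influence:
            return None  # outside the authorised radius: defer
        known = self.preferences.get(context, {})
        scored = [o for o in options if o in known]
        if not scored:
            return None  # no elicited preference applies: defer
        return max(scored, key=lambda o: known[o])


# Usage: the user authorises scheduling only, and states two preferences.
agent = DelegatedAgent(radius_of_influence={"scheduling"})
agent.elicit("scheduling", "morning", 0.2)
agent.elicit("scheduling", "afternoon", 0.9)

in_radius = agent.decide("scheduling", ["morning", "afternoon"])
out_of_radius = agent.decide("purchases", ["laptop"])
```

Returning `None` instead of guessing is one simple way to leave control in the user’s hands; a real design would presumably need a richer notion of context and of when to ask.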

Developments in commercially available delegated agents – such as negotiation agents and virtual assistants – will come with new challenges and opportunities for deploying AI designs that align with shared human values and assist us to make wiser decisions.

There’s a bunch of research on delegation spanning decades that I haven’t yet seen discussed in the AI Safety community. Negotiation agents, for example, serve as clean toy models for inquiring into how users can delegate to agents in complex, multi-party exchanges. This paper disentangles the dimensions along which negotiation agents must perform to be adopted widely: domain knowledge and preference elicitation, user trust, and long-term perspective.

Center for Human-Compatible AI researchers have published mechanisms to keep humans in the loop, such as cooperative IRL, and recently discussed multi-multi delegation in a research considerations paper. As in the CAIS report, the considerations mentioned appear rather decoupled from the human contexts in which agent-like designs need to be developed and used to delegate work.


In my talk, I’ll explain the outside research I read, along with the practical scenarios and concrete hypotheses I tried to come up with from the perspective of a software company and its user base. Then, let’s discuss research considerations!

Before you read further:
Think up your own scenario where a user would start delegating their work to an agent.

A scenario

A ‘pure’ delegated agent may start out as a personal service hosted through an encrypted AWS account. Wealthy, tech-savvy early adopters pay a monthly fee to use it as an extension of themselves – to pre-process information and automate decisions on their behalf.

The start-up’s founders recognise that their new tool is much more intimate and intrusive than good ol’ Gmail and Facebook (which show ads to anonymised user segments). To market it successfully, they invest in building trust with target users. They design the delegated agent to assuage their users’ fears around data privacy and unfeeling autonomous algorithms, leave control firmly in the user’s hands, explain its actions, and prevent outsiders from snooping or interfering with how it acts on the user’s behalf (or at least give consistent impressions thereof). These early choices instil founder effects on the design users come to expect from the company and on its later directions of development.

Research directions that may be relevant to existential safety

  • Narrow value learning: Protocols for eliciting preferences that are efficient in user time and input, user-approved and user-friendly, and context-sensitive (reducing elicitation fatigue and ensuring that users know how to interact and don’t disengage). Models for building accurate (hierarchical?) and interpretable (semi-symbolic?) representations of the user’s preferences on the fly within the service’s defined radius of influence.

  • Defining delegation: How to define responsibility and derive enforceable norms in cases where a person and an agent acting on their behalf collaborate on exercising control and alternate in taking the initiative?

  • Heterogeneity of influence: How much extra negotiation power or other forms of influence does paying extra for a more sophisticated and computationally powerful delegated agent with more access to information offer? Where does it make sense for groups to pool funds to pay for a delegated agent to represent shared interests? To what extent does being an early mover or adopter in this space increase later influence?

  • Governance and enforcement: How to coordinate the distribution of punishments and rewards to heterogeneous delegated agents (and to the users who choose which designs to buy so they have skin in the game) such that they steer away from actions that impose negative externalities (including hidden systemic risks) onto other, less-represented persons and towards cooperating on creating positive externalities? See this technical paper if that question interests you.

  • Emergence of longer-term goals: Drexler argues for a scenario where services are developed that complete tasks within bounded times (including episodic RL).
    Will a service designed to act on behalf of consumers or coalitions converge on a bounded planning horizon? Would the average planning horizon of a delegated agent be longer than that of ‘conventional’ CAIS? What would things like instrumental convergence and Goodharting look like in a messy system of users buying delegated agents that complete tasks across longer time horizons but flexibly elicit and update their model of the users’ preferences and enforcers’ policies?
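
To make the narrow value learning direction in the list above slightly more concrete, here is a toy sketch of elicitation under a query budget. It is an illustration only: the function `elicit_ranking`, the `ask_user` callback, and the simple win-count ranking are assumptions made for this example, not a protocol from the delegation literature. The point it shows is the trade-off between how many questions the user has to answer (elicitation fatigue) and how much of their ranking the service can recover.

```python
import itertools


def elicit_ranking(options, ask_user, max_queries):
    """Estimate a ranking from at most `max_queries` pairwise questions.

    `ask_user(a, b)` is a hypothetical callback that returns whichever of
    the two options the user prefers.
    """
    wins = {o: 0 for o in options}
    asked = 0
    for a, b in itertools.combinations(options, 2):
        if asked >= max_queries:
            break  # stop early to limit elicitation fatigue
        wins[ask_user(a, b)] += 1
        asked += 1
    # Rank by number of pairwise wins; ties keep their original order.
    ranking = sorted(options, key=lambda o: wins[o], reverse=True)
    return ranking, asked


# Usage with a simulated user whose hidden utilities are assumed for the demo.
hidden_utility = {"a": 1, "b": 3, "c": 2}


def simulated_user(x, y):
    return x if hidden_utility[x] >= hidden_utility[y] else y


full_ranking, full_cost = elicit_ranking(["a", "b", "c"], simulated_user, 3)
cheap_ranking, cheap_cost = elicit_ranking(["a", "b", "c"], simulated_user, 1)
```

With the full budget of three questions the simulated user’s ranking is recovered exactly; with a budget of one, only the first comparison is known and the rest of the ranking is a guess. A more serious protocol would choose which question to ask next rather than iterating in a fixed order.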


Cross-posted from LessWrong