Introducing the AI Objectives Institute’s Research: Differential Paths toward Safe and Beneficial AI

Cross-posted to Progress Forum here.

This is a post about how the AI Objectives Institute plans to implement a differential technological development strategy in its research and development projects. For a complete list of projects, see our Research Brief.

With the seeming Cambrian explosion of capabilities from progress in generative models, there is an increased need for differential exploration: of alternative (more alignable and governable) paradigms, of applications of current capabilities that raise the sanity waterline, and of institutional configurations and deployment approaches directed robustly toward human (and sentient) flourishing.

The AI Objectives Institute (AOI) aims to fill the role we think is necessary here. Using differential technological development and exploration as a guiding principle, we'll work on improving prioritization strategies and models for the safe and beneficial deployment of AI capabilities, and will conduct applied AI research on solutions we identify through those strategies. Our aim is to make both theoretical and applied progress towards the following goals:

  • Applying current and future AI to solving problems of alignment and governance

  • Safeguarding the human race against the unknown consequences of technological development

  • Steering AI toward applications that benefit humanity

In this post, we’ll limit discussion of differential technological development to our thoughts about applying that principle to AI and other transformative technologies. For more general discussions of differential strategies, see:


If after reading this you want to collaborate with us, or if you’d rather send us feedback directly instead of commenting on this post, reach out to hello@objective.is.

Differential Development for AI & Social Systems

Differential technological development (DTD) is particularly appealing as a strategy for progress on beneficial AI deployment because:

  • It doesn’t necessitate perfect prediction of the impacts of new technology, so it can still be applied with some degree of efficacy in situations of high uncertainty (such as the future of AI).

  • It’s applicable at any point in the process of technological development – though earlier applications may be more impactful, there is not a point at which the strategy becomes irrelevant.

  • It suggests ways to make progress without directly opposing existing incentive gradients, which can complement strategies that involve changing those incentives.

For questions about beneficial adoption of AI, the advantages of a DTD approach go beyond just mitigating risks from new technology – because the scope of these questions includes the dynamics of existing social systems, such as institutions and governance. This means that making differential progress on beneficial deployment will likely result in progress towards solving existing problems in human systems, making us better at responding to new technology efficiently and effectively.

While it’s true that all technological progress has social effects, highly capable AI holds potential for more direct effects via integration with decision-making systems that were previously run exclusively by human labor – or, in other words, the effects are direct because we will use AI to change how we organize ourselves. If we can find solutions to problems with the dynamics of technical systems, they may generalize to the dynamics of social systems, or vice versa – resulting in both specific progress on integrating AI systems, and improvements to decision-making processes that will continue guiding that integration through the future. There’s more detail on this line of thinking in our whitepaper.

Taxonomies for Differential Development of AI

We propose the following categories to think about DTD in this context:

1. Paradigms for safe, more governable systems: These include frameworks, technical insights, and principles for how to deploy systems safely and in service of their intended goals. Specific outcomes from interpretability research fall into this category, as do projects like the Open Agency Architecture (see Example Projects below). Exploring alternative paradigms can be especially useful work for an organization not focused on scaling up capabilities, since such organizations can operate outside the paradigm lock-in effects that may arise within competitive dynamics.

2. Support for civilizational resilience by raising the sanity waterline: This category includes all technologies that make human populations more likely to notice threats from new technology, effectively defend themselves against (or avert) those threats, work towards their own best interests within increasingly novel and confusing contexts, and make effective and informed decisions at critical moments. Examples of this group include tools for identifying machine-generated content, and tools that help people clarify their own desired outcomes.

3. Prefiguration of post-TAI beneficial sociotechnical outcomes: The presence of novel artificial capabilities gives us an opportunity to experiment with how these capabilities may be integrated into our societies in beneficial and corrigible ways, without harmful and corrosive effects. Informed and prescient work on this front can provide valuable direction and insights for beneficial deployment, especially as the tech tree of emerging capabilities becomes clearer to us.

AOI’s work includes efforts across these categories, with a focus on integration of social and technical systems. At the moment, AOI doesn’t have plans to pursue technical alignment or interpretability research outside of such integration – progress on those fronts is essential, but so is laying the groundwork for incorporating those insights into our plans for AI governance.

It’s also worth noting that we do not plan to do any research into scaling the capabilities of the models we apply – our aim is to apply existing models to problems of beneficial deployment, not to build more powerful AI ourselves.

Background on the AI Objectives Institute

The AI Objectives Institute is a non-profit research and development lab working to ensure that people can thrive in a world of rapidly deployed, extremely capable AI systems. Founded by the late internet security pioneer Peter Eckersley, we collaborate with teams across disciplines, developing tools and programs that put humanity in a better position to benefit from AI technologies and avoid large-scale disasters.

One way to think about humanity’s current position is as an escape room: we have limited time to find our way out of the situation, and our search for clues – for ways to govern AI systems – depends on maintaining the integrity of our collective deliberation under pressure. To make it out of the room, we must solve the relevant puzzles while communicating with our teammates and keeping our wits about us in a strange and overstimulating world. AOI’s three research avenues map roughly to the requirements of problem-solving capacity, cooperation, and autonomy necessary for success, and we aim to make differential progress in each:

  • Sociotechnical alignment: promoting AI development paradigms that favor governability of institutions and AI systems, including proofs of concept for alternative AI development paradigms.

  • Scalable coordination: promoting ways to make cooperation and deliberation more scalable, including demonstrations of socially beneficial applications of AI for improving collective agency.

  • Human autonomy: research on preserving and enhancing civilizational health and agency, including experiments with resilience-enhancing applications of existing AI capabilities.

As a research incubator, AOI hosts a wide range of applied projects informed by our working groups of internal & affiliated researchers. The combination of theory and practice not only lets us support a variety of different approaches, but also creates feedback loops that help us learn about the problem-spaces we are working in. Our current research focuses include assessment models for the societal impacts of AI, as well as frameworks for researching sociotechnical alignment cruxes like responsible deployment, deliberative processes, and assistive vs deceptive behavior.

AOI’s Theory of Impact

We believe that working towards differential progress along the three research directions above is valuable in multiple, closely interacting ways:

1. Governance matters, and improving societal capacity for governance is one of the best ways to address existential risk. Reducing that risk is a universal public good, while failure to address existential risk is a failure of social coordination. We expect improvements to public deliberation and global coordination systems will reduce existential risk.

2. AI could be massively beneficial, and figuring out how to realize the potential of transformative AI in line with human flourishing should be a high priority. The challenge of making transformative AI good for humanity goes beyond solving technical alignment. We believe that deliberations about how to deploy transformative AI – or perhaps even understanding the possibility space of post-singularity positive outcomes – require serious inquiry into the role advanced AI can play in enhancing human autonomy, well-being, and security.

3. Better governance and use of AI will require experimentation with new AI engineering paradigms. The work of ensuring that AI development remains in service of human flourishing will involve directly confronting engineering challenges that we expect mainstream, large-scale AI research will not pursue. This makes AOI comparatively well-positioned to explore new AI engineering and governance paradigms, develop more governable systems, and influence the broader field.

4. Present-day work on socially beneficial AI will have a significant impact on the sociotechnical development of future AI. We believe that immediately socially beneficial interventions in AI have natural continuity with the long-term alignment of AI to human flourishing, but the relationship between the two can be complex and unexpected. Systematic thinking about principles of civilizational health and agency can help secure the long-term benefits of our interventions, and the foresight that comes of that thinking may complement existing immediacy-focused work.

Example AOI Projects

Below are a few examples of the types of projects we’re working on. For a full list of projects in active development, see our Research Brief.

Even the research brief isn’t a full list of projects we’re currently considering, nor do we think it is a conclusive list of the best possible ideas for achieving safe integration of transformative technology. Our goal is to be exploratory and flexible in what we pursue, balancing steady work on the most promising ideas with responsiveness to new developments in this rapidly changing field. We’re interested in feedback on the current list of projects, and in other ideas in line with our mission.

Theory of Beneficial Deployment

Much of AOI’s research focuses on differential technological development of AI capabilities: determining the best order in which to develop new technology, to ensure total benefits outweigh negative effects. In a paper on this subject, we will investigate how we might model the decision process for that differential development as an iterated game. In each round of such a game, civilization can exercise some capabilities that will either deplete or enhance its autonomy in subsequent rounds, affecting its ability to use future capacities responsibly.

Our aim is to use this model in developing a cause prioritization strategy that specifies properties of the world which, if maintained, would keep the world on a better trajectory. One such property would be confidence that algorithmic speech is usually truthful (see Evans et al., Truthful AI: Developing and governing AI that does not lie). With that confidence, we could continue distinguishing true from false information, resulting in less pollution of the information ecology. What other such features, we ask, would be most important in prioritizing differential AI research and deployment?
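
To make the shape of this model concrete, here is a minimal toy sketch of such an iterated game in Python. The capability names, payoffs, and autonomy effects are illustrative assumptions of ours, not parameters from the forthcoming paper; the point is only that the ordering of capability development changes long-run outcomes when realized benefits scale with remaining autonomy.

```python
# Toy model of differential development as an iterated game.
# All capability names, payoffs, and autonomy effects here are illustrative
# assumptions, not parameters from the forthcoming paper.
CAPABILITIES = {
    "deploy_fast":     {"payoff": 3.0, "autonomy_delta": -0.15},
    "build_oversight": {"payoff": 1.0, "autonomy_delta": +0.10},
}

def run_game(policy, rounds=20):
    """Play the game: each round's realized benefit is scaled by current
    autonomy, so depleting autonomy limits what later capabilities deliver."""
    autonomy, total_benefit = 1.0, 0.0
    for t in range(rounds):
        capability = CAPABILITIES[policy(t, autonomy)]
        total_benefit += capability["payoff"] * autonomy
        autonomy = max(0.0, min(1.0, autonomy + capability["autonomy_delta"]))
    return total_benefit, autonomy

# Two orderings of development to compare.
greedy  = lambda t, autonomy: "deploy_fast"
guarded = lambda t, autonomy: "deploy_fast" if autonomy >= 0.8 else "build_oversight"

print("greedy :", run_game(greedy))    # high early payoff, autonomy collapses to 0
print("guarded:", run_game(guarded))   # lower early payoff, higher total over 20 rounds
```

In this toy setting, a policy that sometimes forgoes immediate payoff to rebuild autonomy ends up with a higher total than a purely greedy one – precisely the kind of property a prioritization strategy would want to characterize rigorously rather than by toy example.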

Open Agency Architecture (Sociotechnical Alignment)

(Note: we’re working on a whole post about collaborating with Davidad on OAA, to be published separately)

The Open Agency Architecture (OAA) proposes a framework for how institutions can adopt highly capable AI. OAA composes bounded, verified components – which can be either human or AI systems – to handle separate parts of the problem-solving process, with human oversight at the points of connection. Building in oversight at this structural level gives OAA systems more flexibility to incorporate new components, allowing for much broader collaboration on building these components, while limiting the risk of misalignment from each new part.
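
As a rough illustration of the compositional pattern this implies – and emphatically not the actual OAA specification – the Python sketch below composes bounded components through a shared, inspectable state, with a human review hook at every hand-off between them. All component names and interfaces here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical interfaces for illustration only; not the actual OAA spec.
# Each bounded component (a human team or an AI system) handles one step,
# and a human review hook sits at every connection point between components.

@dataclass
class Component:
    name: str
    run: Callable[[Dict], Dict]   # bounded step over a structured, inspectable state

def human_review(name: str, state: Dict) -> Dict:
    """Oversight at the hand-off: a real system might pause for approval here,
    log the state for audit, or reject the hand-off entirely."""
    print(f"[review] after {name}: {sorted(state)}")
    return state

def run_agency(components: List[Component], state: Dict) -> Dict:
    for component in components:
        state = component.run(state)
        state = human_review(component.name, state)   # oversight between parts
    return state

# Hypothetical pipeline: each component can be swapped out independently,
# since the hand-off is checked at every connection point.
pipeline = [
    Component("specify_problem", lambda s: {**s, "spec": f"formalized({s['goal']})"}),
    Component("propose_plan",    lambda s: {**s, "plan": f"plan_for({s['spec']})"}),
]

print(run_agency(pipeline, {"goal": "allocate relief funding"}))
```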

AOI is implementing a proof-of-concept OAA prototype, to create a foundation for coordinating concrete work on beneficial deployment – integrating input from researchers and practitioners across all areas of governance, sociotechnical alignment, and AI capabilities research. We will start by demonstrating that OAA’s modular architecture can achieve parity with – or even outcompete – the current generation of monolithic models in bounded learning environments. Our overall goal is to develop an open-source blueprint for creating institutions that can evolve to incorporate continuously revised and improved processes, including not just transformative AI but also AI-assisted innovations in governance.

Talk to the City (Scalable Coordination)

Talk to the City is an LLM interface for improving collective deliberation and decision-making by analyzing detailed, qualitative responses to questionnaires. It aggregates those responses, clusters distinct viewpoints, and represents each with an LLM chat interface – yielding a simulation of what a citizens’ assembly for that group might look like.
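
As a rough sketch of that aggregation step (our illustration, not the actual Talk to the City implementation), the snippet below vectorizes free-text answers with TF-IDF and clusters them with k-means from scikit-learn; the production tool would use richer LLM embeddings and back each cluster with a chat interface grounded in its quotes. The questionnaire responses are invented for the example.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented responses to a hypothetical transit questionnaire.
responses = [
    "Transit funding should prioritize accessibility for disabled riders.",
    "Step-free access at stations matters more than new routes.",
    "We need more frequent buses on the east side routes.",
    "Bus frequency at night is the real problem for shift workers.",
    "Fares are already too high; subsidies should come from parking fees.",
    "Lower fares would help low-income families the most.",
]

# Aggregate and cluster: vectorize the free-text answers, then group them
# into distinct viewpoints. A production pipeline would use LLM embeddings.
vectors = TfidfVectorizer(stop_words="english").fit_transform(responses)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(vectors)

viewpoints = {}
for text, label in zip(responses, labels):
    viewpoints.setdefault(int(label), []).append(text)

for label, quotes in sorted(viewpoints.items()):
    # In the full tool, these quotes would ground an LLM chat interface that
    # answers questions as this cluster's simulated representative.
    print(f"viewpoint {label}:")
    for quote in quotes:
        print("  -", quote)
```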

Applying recent advances in AI to problems of group coordination – and ensuring that these technologies are deployed safely and in true alignment with the people they serve – requires technical research into a wide array of questions. Building on a 2022 DeepMind paper on LLMs for collective deliberation, we ask how LLM fine-tuning can be leveraged for:

  1. Finding key disagreement within groups,

  2. Surfacing mutually beneficial possibilities and policies between deliberating parties,

  3. Approaching common understanding,

  4. Identifying confusion and miscommunications between perspectives.

The detail and interactivity our extended LLM-based interface provides can help policymakers uncover misunderstandings, key debates, and areas of common ground in complex public discourse. Use cases range from union decision-making, to determining the needs of recipients in refugee camps, to a collaboration with Metaculus on LLM-assisted deliberations on AI predictions.

Note on our use of LLMs

We recognize that there are risks from misunderstanding or over-relying on the capabilities of current-generation LLMs, which are not yet reliable or intelligible enough for some applications. But we also believe that if used with a clear understanding of their limitations, language models may be a valuable tool for analyzing and simulating discourse at scale. Our intention is to explore this possibility first through small, bounded experiments – which will shed light on the capabilities and the limits of these models, and may reveal the ways in which over-reliance could be problematic.