Ought’s theory of change

Ought is an applied machine learning lab. In this post we summarize our work on Elicit and why we think it’s important.

We’d love to get feedback on how to make Elicit more useful to the EA community, and on our plans more generally.

This post is based on two recent LessWrong posts.

In short

Our mission is to automate and scale open-ended reasoning. To that end, we’re building Elicit, the AI research assistant.

Elicit’s architecture is based on supervising reasoning processes, not outcomes. This is better for supporting open-ended reasoning in the short run and better for alignment in the long run.

Over the last year, we built Elicit to support broad reviews of empirical literature. The literature review workflow runs on general-purpose infrastructure for executing compositional language model processes. Going forward, we’ll expand to deep literature reviews, then other research workflows, then general-purpose reasoning.

Our mission

Our mission is to automate and scale open-ended reasoning. If we can improve the world’s ability to reason, we’ll unlock positive impact across many domains including AI governance & alignment, psychological well-being, economic development, and climate change.

As AI advances, the raw cognitive capabilities of the world will increase. The goal of our work is to channel this growth toward good reasoning. We want AI to be more helpful for qualitative research, long-term forecasting, planning, and decision-making than for persuasion, keeping people engaged, and military robotics.

Good reasoning is as much about process as it is about outcomes. In fact, outcomes aren't even available when we're reasoning about the long term. So instead of training machine learning models end-to-end on outcome data, we're generally building Elicit compositionally, based on human reasoning processes.

The case for process-based ML systems

We can think about machine learning systems on a spectrum from process-based to outcome-based:

  • Process-based systems are built on human-understandable task decompositions, with direct supervision of reasoning steps. More

  • Outcome-based systems are built on end-to-end optimization, with supervision of final results. More
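
To make this distinction concrete, here is a minimal sketch in Python. It is purely illustrative: `lm` is a stub standing in for any language model call, and the decomposition and prompts are hypothetical assumptions, not Ought's actual code.

```python
def lm(prompt: str) -> str:
    """Stub standing in for a real language model API call (hypothetical)."""
    return f"<model output for: {prompt!r}>"

def outcome_based_answer(question: str) -> str:
    # One opaque end-to-end call: the system is evaluated (and trained)
    # only on how the final answer turns out.
    return lm(f"Answer: {question}")

def process_based_answer(question: str) -> str:
    # Human-understandable decomposition: each intermediate step is visible
    # and can be supervised directly, even when the final outcome is
    # unobservable (e.g. a long-range forecast).
    subquestions = lm(f"List subquestions of: {question}").split("\n")
    step_answers = [lm(f"Answer briefly: {q}") for q in subquestions]
    return lm("Combine these answers:\n" + "\n".join(step_answers))
```

In the process-based version, human feedback attaches to individual steps (did we pick good subquestions? is each step's answer reasonable?) rather than only to the end result.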

We think that process-based systems are better:

  1. In the short term, process-based ML systems have better differential capabilities: They help us apply ML to tasks where we don’t have access to outcomes. These tasks include long-range forecasting, policy decisions, and theoretical research. More

  2. In the long term, process-based ML systems help avoid catastrophic outcomes from systems gaming outcome measures and are thus more aligned. More

  3. Both process- and outcome-based evaluation are attractors to varying degrees: Once an architecture is entrenched, it’s hard to move away from it. This lock-in applies much more to outcome-based systems. More

  4. Whether the most powerful ML systems will primarily be process-based or outcome-based is up in the air. More

  5. So it’s crucial to push toward process-based training now.

Relative to the potential benefits, we think that process-based systems have gotten surprisingly little explicit attention in the AI alignment community.

How we think about success

We’re pursuing our mission by building Elicit, a process-based AI research assistant.

We succeed if:

  1. Elicit radically increases the amount of good reasoning in the world.

    1. For experts, Elicit pushes the frontier forward.

    2. For non-experts, Elicit makes good reasoning more affordable. People who don’t have the tools, expertise, time, or mental energy to make well-reasoned decisions on their own can do so with Elicit.

  2. Elicit is a scalable ML system based on human-understandable task decompositions, with supervision of process, not outcomes. This expands our collective understanding of safe AGI architectures.

Progress in 2021

We’ve made the following progress in 2021:

  1. We built Elicit to support researchers because high-quality research is a bottleneck to important progress and because researchers care about good reasoning processes. More

  2. We identified some building blocks of research (e.g. search, summarization, classification), operationalized them as language model tasks, and connected them in the Elicit literature review workflow. More

  3. On the infrastructure side, we built a streaming task execution engine for running compositions of language model tasks. This engine supports the literature review workflow in production (see the sketch after this list). More

  4. About 1,500 people use Elicit every month. More
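
As a rough illustration of how such an engine can compose primitive tasks, here is a hypothetical Python sketch in which search, summarization, and classification are chained as streaming stages. None of these names come from Elicit's actual codebase; `lm_task` is a stub for a real model call.

```python
import asyncio
from typing import AsyncIterator

async def lm_task(prompt: str) -> str:
    """Stub for a single language model call (hypothetical)."""
    await asyncio.sleep(0.1)  # stand-in for network latency
    return f"<output for: {prompt!r}>"

async def search(query: str) -> AsyncIterator[str]:
    # Emit candidate papers one at a time so later stages can start early.
    for i in range(3):
        yield await lm_task(f"find paper {i} for: {query}")

async def summarize(papers: AsyncIterator[str]) -> AsyncIterator[str]:
    async for paper in papers:
        yield await lm_task(f"summarize: {paper}")

async def classify(summaries: AsyncIterator[str]) -> AsyncIterator[str]:
    async for summary in summaries:
        yield await lm_task(f"classify relevance of: {summary}")

async def literature_review(query: str) -> None:
    # Compose the primitive tasks into a pipeline; each result streams to
    # the user as soon as it is ready instead of waiting for the whole run.
    async for result in classify(summarize(search(query))):
        print(result)

asyncio.run(literature_review("Does creatine improve cognition?"))
```

Streaming matters here because compositions of model calls are slow; surfacing partial results as each stage completes keeps the workflow responsive.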

Roadmap for 2022+

Our plans for 2022+:

  1. We expand literature review to digest the full text of papers, extract evidence, judge methodological robustness, and help researchers do deeper evaluations by decomposing questions like “What are the assumptions behind this experimental result?” More

  2. After literature review, we add other research workflows, e.g. evaluating project directions, decomposing research questions, and augmented reading. More

  3. To support these workflows, we refine the primitive tasks through verifier models and human feedback, and expand our infrastructure for running complex task pipelines, quickly adding new tasks, and efficiently gathering human data (see the sketch after this list). More

  4. Over time, Elicit becomes a general-purpose reasoning assistant, transforming any task involving evidence, arguments, plans and decisions. More
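
As a sketch of the verifier-model idea, here is hypothetical Python in which a second model scores several candidate outputs for a primitive task and the highest-scoring one is kept. The names and the scoring stub are illustrative assumptions, not Elicit's actual design.

```python
import random
from typing import List

def generate_candidates(prompt: str, n: int = 4) -> List[str]:
    """Stub: sample n candidate outputs from a generator model."""
    return [f"candidate {i} for: {prompt!r}" for i in range(n)]

def verifier_score(prompt: str, candidate: str) -> float:
    """Stub: a second model, trained on human judgments of this step,
    rates each candidate, so supervision targets the step itself rather
    than a downstream outcome."""
    return random.random()

def refined_task(prompt: str) -> str:
    # Generate several candidates and keep the one the verifier rates
    # highest; human feedback on these ratings improves the verifier.
    candidates = generate_candidates(prompt)
    return max(candidates, key=lambda c: verifier_score(prompt, c))

print(refined_task("Extract the sample size from this abstract."))
```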

We’re hiring for basically all roles: ML engineer, front-end, full-stack, operations, product design, even recruiting. Join our team!