Speaker here. I haven’t reviewed this transcript yet, but shortly after the talk I wrote up these notes (slides + annotations) which I probably endorse more than what I said at the time.
Ought co-founder here. There are two ways Elicit relates to alignment, broadly construed:
1 - Elicit informs how to train powerful AI through decomposition
Roughly speaking, there are two ways of training AI systems:
- End-to-end training
- Decomposition of tasks into human-understandable subtasks
We think decomposition may be a safer way to train powerful AI if it can scale as well as end-to-end training.
Elicit is our bet on the compositional approach. We’re testing how feasible it is to decompose large tasks like “figure out the answer to this science question by reading the literature” into subtasks like:
- Brainstorm subquestions that inform the overall question
- Find the most relevant papers for a (sub-)question
- Answer a (sub-)question given an abstract for a paper
- Summarize answers into a single answer
Over time, more of this decomposition will be done by AI assistants.
At each point in time, we want to push the compositional approach to the limits of current language models, and keep up with (or exceed) what’s possible through end-to-end training. This requires that we overcome engineering barriers in gathering human feedback and orchestrating calls to models in a way that doesn’t depend much on current architectures.
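For concreteness, here is a minimal sketch of what that kind of orchestration could look like, assuming only a generic text-completion function `complete(prompt) -> str`. The helper names and prompts are illustrative placeholders, not Elicit’s actual implementation:

```python
# Hypothetical sketch of a decomposition pipeline; function names and prompts
# are illustrative, and `complete` stands in for any language-model API call.
from typing import Callable, Dict, List

def brainstorm_subquestions(question: str, complete: Callable[[str], str]) -> List[str]:
    # Ask the model for subquestions that inform the overall question.
    out = complete(f"List subquestions that would help answer:\n{question}")
    return [line.lstrip("-* ").strip() for line in out.splitlines() if line.strip()]

def find_papers(subquestion: str) -> List[Dict[str, str]]:
    # Placeholder for a retrieval step (e.g. search over paper abstracts);
    # returns records with at least an "abstract" field.
    return []

def answer_from_abstract(subquestion: str, abstract: str,
                         complete: Callable[[str], str]) -> str:
    # Answer a (sub-)question given a single paper's abstract.
    return complete(f"Abstract:\n{abstract}\n\nQuestion: {subquestion}\nAnswer:")

def summarize_answers(question: str, answers: List[str],
                      complete: Callable[[str], str]) -> str:
    # Combine per-paper answers into a single overall answer.
    findings = "\n".join(f"- {a}" for a in answers)
    return complete(f"Question: {question}\nFindings:\n{findings}\nOverall answer:")

def answer_by_decomposition(question: str, complete: Callable[[str], str]) -> str:
    answers = []
    for sub in brainstorm_subquestions(question, complete):
        for paper in find_papers(sub):
            answers.append(answer_from_abstract(sub, paper["abstract"], complete))
    return summarize_answers(question, answers, complete)
```

Because each step is an ordinary function over text, individual steps can be reassigned between humans and models, or swapped to a newer model, without changing the overall structure.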
I view this as the natural continuation of our past work where we studied decomposition using human participants. Unlike then, it’s now possible to do this work using language models, and the more applied setting has helped us a lot in reducing the gap between research assumptions and deployment.
2 - Elicit makes AI differentially useful for AI & tech policy, and other high-impact applications
In a world where AI capabilities scale rapidly, I think it’s important that those capabilities can support research aimed at guiding AI development and policy. More generally, they should help us figure out what’s true and make good plans at least as much as they help persuade, or help optimize goals that have fast feedback or easy specification.
Ajeya mentions this point in The case for aligning narrowly superhuman models, and Beth mentions the more general point in Risks from AI persuasion (under possible interventions).
I’ll write more about how we view our role in the space in Q1 2022.