The properties we’re aiming for that should make submodels easier to align:
(Inner alignment) Smaller models, making it less likely that there’s scheming happening that we’re not aware of; making the bottom-up interpretability problem easier
(Outer alignment) More well-specified tasks, making it easier to generate a lot of in-distribution feedback data; making it easier to do targeted red-teaming
Would you share with me some typical example tasks that you’d give a submodel and typical good responses it might give back? (as a vision, so I’ll know what you’re talking about when you’re saying things like “well specified tasks”—I’m not sure if we’re imagining the same thing there. It doesn’t need to be something that already works today)
In a research assistant setting, you could imagine the top-level task being something like “Was this a double-blind study?”, which we might factor out as:
Were the participants blinded?
    Was there a placebo?
        Which paragraphs relate to placebos?
        Does this paragraph state there was a placebo?
        …
    Did the participants know if they were in the placebo group?
    …
Were the researchers blinded?
    …
In this example, by the time we get to the “Does this paragraph state there was a placebo?” level, a submodel is given a fairly tractable question-answering task over a given paragraph. A typical response for this example might be a confidence level and text spans pointing to the most relevant phrases.
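To make that leaf-level task concrete, here is a minimal Python sketch of what the submodel’s contract could look like. Everything here is illustrative: the names (SpanAnswer, paragraph_states_placebo, was_there_a_placebo) are assumptions rather than Ought’s actual code, and a trivial keyword check stands in for the small model.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical interface for a leaf-level subtask; these names are
# illustrative, not an actual Elicit/Ought API.

@dataclass
class SpanAnswer:
    """What a submodel might return for a narrow QA task over one paragraph."""
    answer: bool
    confidence: float               # e.g. a calibrated probability in [0, 1]
    spans: List[Tuple[int, int]]    # character offsets of the most relevant phrases


def paragraph_states_placebo(paragraph: str) -> SpanAnswer:
    """Leaf task: 'Does this paragraph state there was a placebo?'

    A trivial keyword check stands in for the small submodel; in practice
    this is where a narrowly scoped model would sit.
    """
    needle = "placebo"
    idx = paragraph.lower().find(needle)
    if idx == -1:
        return SpanAnswer(answer=False, confidence=0.9, spans=[])
    return SpanAnswer(answer=True, confidence=0.8, spans=[(idx, idx + len(needle))])


def was_there_a_placebo(paragraphs: List[str]) -> SpanAnswer:
    """Parent task: aggregate the leaf answers over the relevant paragraphs."""
    leaf_answers = [paragraph_states_placebo(p) for p in paragraphs]
    positives = [a for a in leaf_answers if a.answer]
    if positives:
        best = max(positives, key=lambda a: a.confidence)
        return SpanAnswer(answer=True, confidence=best.confidence, spans=best.spans)
    return SpanAnswer(
        answer=False,
        confidence=min((a.confidence for a in leaf_answers), default=0.0),
        spans=[],
    )


if __name__ == "__main__":
    paper = [
        "Participants were randomly assigned to treatment or control.",
        "The control group received a placebo identical in appearance to the drug.",
    ]
    print(was_there_a_placebo(paper))
```

The point is just the shape of the contract: each subtask takes a narrow, well-specified input and returns an answer plus the evidence for it, which is what makes it feasible to collect in-distribution feedback data and red-team that one step in isolation.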
Thank you, this was super informative! My understanding of Ought just improved a lot
Once you’re able to answer questions like that, what do you build next?
Is “Was this a double-blind study?” an actual question that your users/customers are very interested in?
If not, could you give me some other example that is?
You’re welcome!
The goal is for Elicit to be a research assistant, leading to more and higher-quality research. Literature review is only one small part of that: we would like to add functionality like brainstorming research directions, finding critiques, identifying potential collaborators, …
Beyond that, we believe that factored cognition could scale to lots of knowledge work. Anywhere the tasks are fuzzy, open-ended, or have long feedback loops, we think Elicit (or our next product) could be a fit. Journalism, think-tanks, policy work.
It is, very much. Answering so-called “strength of evidence” questions accounts for a big chunk of researchers’ time today.