This paywalled article mentions a $4B valuation for the round:
A concrete version of this I’ve been wondering about the last few days: To what extent are the negative results on Debate (single-turn, two-turn) intrinsic to small-context supervision vs. a function of relatively contingent design choices about how people get to interact with the models?
I agree that misuse is a concern. Unlike alignment, I think it’s relatively tractable because it’s more similar to problems people are encountering in the world right now.
To address it, we can monitor and restrict usage as needed. The same tools that Elicit provides for reasoning can also be used to reason about whether a use case constitutes misuse.
This isn’t to say that we might not need to invest a lot of resources eventually, and it’s interestingly related to alignment (“misuse” is relative to some values), but it feels a bit less open-ended.
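To make the monitoring idea concrete, here's a minimal sketch of what reviewing a use case might look like. Everything here is illustrative, not Elicit's actual policy or code: the patterns, the function name, and the flag/allow outcomes are all assumptions for the example.

```python
# Illustrative sketch (not Elicit's actual implementation): score each
# incoming request against a list of disallowed-use patterns and flag
# matches for human review. A real system would use a learned classifier
# and a reviewed policy; these hypothetical patterns just show the shape.

DISALLOWED_PATTERNS = [
    "synthesize a pathogen",
    "evade detection",
    "target individuals",
]

def review_request(query: str) -> str:
    """Return 'flag' for human review if the query matches a pattern, else 'allow'."""
    lowered = query.lower()
    if any(pattern in lowered for pattern in DISALLOWED_PATTERNS):
        return "flag"
    return "allow"
```

The point of the sketch is that the check itself is a reasoning task, so the same tooling that supports research reasoning can in principle support it.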
Elicit is using the Semantic Scholar Academic Graph dataset. We’re working on expanding to other sources. If there are particular ones that would be helpful, message me!
Have you listened to the 80k episode with Nova DasSarma from Anthropic? They might have cybersecurity roles. The closest we have right now is devops—which, btw, if anyone is reading this comment, we are really bottlenecked on and would love intros to great people.
No, it’s that our case for alignment doesn’t rest on “the system is only giving advice” as a step. I sketched the actual case in this comment.
Oh, forgot to mention Jonathan Uesato at DeepMind who’s also very interested in advancing the ML side of factored cognition.
The things that make submodels easier to align that we’re aiming for:
(Inner alignment) Smaller models, making it less likely that there’s scheming happening that we’re not aware of; making the bottom-up interpretability problem easier
(Outer alignment) More well-specified tasks, making it easier to generate a lot of in-distribution feedback data; making it easier to do targeted red-teaming
For AGI there isn’t much of a distinction between giving advice and taking actions, so this isn’t part of our argument for safety in the long run. But in the time between here and AGI it’s better to focus on supporting reasoning to help us figure out how to manage this precarious situation.
To clarify, here’s how I’m interpreting your question:
“Most technical alignment work today looks like writing papers or theoretical blog posts and addressing problems we’d expect to see with more powerful AI. It mostly doesn’t try to be useful today. Ought claims to take a product-driven approach to alignment research, simultaneously building Elicit to inform and implement its alignment work. Why did Ought choose this approach instead of the former?”
First, I think it’s good for the community to take a portfolio approach and for different teams to pursue different approaches. I don’t think there is a single best approach, and a lot of it comes down to the specific problems you’re tackling and team fit.
For Ought, there’s an unusually good fit between our agenda and Elicit the product—our whole approach is built around human-endorsed reasoning steps, and it’s hard to do that without humans who care about good reasoning and actually want to apply it to solve hard problems. If we were working on ELK I doubt we’d be working on a product.
Second, as a team we just like building things. We have better feedback loops this way and the nearer-term impacts of Elicit on improving quality of reasoning in research and beyond provide concrete motivation in addition to the longer-term impacts.
Some other considerations in favor of taking a product-driven approach are:
Deployment plans help us choose tasks. We did “pure alignment research” when we ran our initial factored cognition experiments. At the time, choosing the right task felt about as hard as choosing the right mechanism or implementing it correctly. For example, we want to study factored cognition—should we factor reasoning about essays? SAT questions? Movie reviews? Forecasting questions? When experiments failed, it was hard to know whether we could have stripped down the task more to better test the mechanism, or whether the mechanism in fact didn’t solve the problem at hand. Our findings seemed brittle and highly dependent on assumptions about the task, unlikely to hold up in future deployment scenarios. Now that we have a much clearer incremental deployment story in mind we can better think about what research is more or less likely to be useful. FWIW I suspect this challenge of task specification is a pretty underrated obstacle for many alignment researchers.
Eventually we’ll have to cross the theory-practice gap. At some point alignment research will likely have to cover the theory-practice gap. There are different ways to do this—we could first develop theoretical foundations, then basic algorithms, then implement them in the real-world, or co-evolve the different parts and propagate constraints between them as we go. I think both ways have pros and cons, but it seems important that some groups pursue the latter, especially in a world with short AI timelines.
Risks with trying to do both are:
Balancing multiple stakeholders. Sometimes an impressive research demonstration isn’t actually useful in the short term, or the most useful things for users don’t teach us anything new about alignment. Models are barely capable enough to be acceptable stand-ins for crowd workers, which limits what we can learn; conversely, the best way to solve some product problems could just be to scale up the models. Overall, our product-research dynamic has been net positive and creates virtuous cycles where the product grounds the research and the research improves the product. But it’s a fine line to tread and a tension we have to actively manage. I can easily imagine other teams or agendas where this would be net negative. I imagine we’d also have a harder time making good choices here if we were a for-profit.
Being too solution-driven. From a product angle, I sometimes worry that we might over-apply the decomposition hammer. But an important part of our research goal is understanding where decomposition is useful / competitive / necessary so it’s probably fine as long as we course-correct quickly.
We’re aiming to shift the balance towards supporting high-quality reasoning. Every tool has some non-zero usefulness for non-central use cases, but it seems unlikely that it will be as useful as tools that were made for those use cases.
I found your factored cognition project really interesting, is anyone still researching this? (besides the implementation in Elicit)
Some people who are explicitly interested in working on it: Sam Bowman at NYU, Alex Gray at OpenAI. On the ML side there’s also work like Selection-Inference that isn’t explicitly framed as factored cognition but also avoids end-to-end optimization in favor of locally coherent reasoning steps.
I’d say what we’re afraid of is that we’ll have AI systems that are capable of sophisticated planning but that we don’t know how to channel those capabilities into aligned thinking on vague complicated problems. Ought’s work is about avoiding this outcome.
At this point we could chat about why it’s plausible that we’ll have such capable but unaligned AI systems, or about how Ought’s work is aimed at reducing the risk of such systems. The former isn’t specific to Ought, so I’ll point to Ajeya’s post Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover.
I just want to highlight the key assumption Ajeya’s argument rests on: The system is end-to-end optimized on a feedback signal (generally from human evaluations), i.e. all its compute is optimizing a signal that has no way to separate “fake it while in training” from “have the right intent” and so can lead to catastrophic outcomes when the system is deployed.
How does Ought’s work help avoid that outcome?
We’re breaking down complex reasoning into processes with parts that are not jointly end-to-end optimized. This makes it possible to use smaller models for individual parts, makes the computation more transparent, and makes it easier to verify that the parts are indeed implementing the function that we (or future models) think they’re implementing.
You can think of it as interpretability-by-construction: Instead of training a model end-to-end and then trying to see what circuits it learned and whether they’re implementing the right thing, take smaller models that you know are implementing the right thing and compose them (with AI help) into larger systems that are correct not primarily based on empirical performance but based on a priori reasoning.
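The composition idea above can be sketched in a few lines. This is a toy illustration, not Elicit's architecture: the `decompose`, `answer_subquestion`, and `compose` functions stand in for small, separately supervised submodels, and all names and data are made up for the example.

```python
# Minimal sketch of interpretability-by-construction: compose small,
# individually checkable parts instead of one end-to-end model. Nothing
# here is jointly optimized, and every intermediate step is recorded so
# it can be verified. All names and strings are hypothetical.

def decompose(question: str) -> list[str]:
    # Break the task into well-specified subquestions (visible, auditable).
    return [f"Subclaim {i} of: {question}" for i in (1, 2)]

def answer_subquestion(subq: str, knowledge: dict) -> str:
    # A small part with a narrow task: easy to red-team and to generate
    # in-distribution feedback data for.
    return knowledge.get(subq, "unknown")

def compose(question: str, knowledge: dict) -> dict:
    subqs = decompose(question)
    answers = [answer_subquestion(s, knowledge) for s in subqs]
    # The full trace, not just the final answer, is available for checking
    # that each part implements the function we think it implements.
    return {"question": question, "trace": list(zip(subqs, answers))}

result = compose("Does drug X reduce mortality?", {
    "Subclaim 1 of: Does drug X reduce mortality?": "Trial A says yes",
    "Subclaim 2 of: Does drug X reduce mortality?": "Trial B says no effect",
})
```

The design point is that correctness rests on the a priori argument for each part plus the visible trace, rather than on end-to-end empirical performance alone.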
This is complementary to traditional bottom-up interpretability work: The more decomposition can limit the amount of black-box compute and uninterpretable intermediate state, the less weight rests on circuits-style interpretability and ELK-style proposals.
We don’t think we’ll be able to fully avoid end-to-end training (it’s ML’s magic juice, after all), but we think that reducing it is helpful even on the margin. From our post on supervising process, which has a lot more detail on the points in this comment: “Inner alignment failures are most likely in cases where models don’t just know a few facts we don’t but can hide extensive knowledge from us, akin to developing new branches of science that we can’t follow. With limited compute and limited neural memory, the risk is lower.”
We built Ergo (a Python library for integrating model-based and judgmental forecasting) as part of our work on forecasting. In the course of this work we realized that for many forecasting questions the bottleneck isn’t forecasting infrastructure per se, but the high-quality research and reasoning that goes into creating good forecasts, so we decided to focus on that aspect.
I’m still excited about Ergo-like projects (including Squiggle!). Developing it further would be a valuable contribution to epistemic infrastructure. Ergo is an MIT-licensed open-source project so you can basically do whatever you want with it. As a small team we have to focus on our core project, but if there are signs of life from an Ergo successor (5+ regular users, say) I’d be happy to talk for a few hours about what we learned from Ergo.
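For readers unfamiliar with what "integrating model-based and judgmental forecasting" means in practice, here's a rough sketch of the core idea. This is not Ergo's actual API; the function, weights, and distributions are assumptions chosen to illustrate mixing the two sources of evidence.

```python
import random

# Hedged sketch (not Ergo's real interface): combine a model-based
# forecast with an expert's judgmental forecast by sampling from a
# weighted mixture of the two distributions, then summarize the samples.

def mixture_sample(model_dist, judgment_dist, weight=0.5, rng=random):
    """Draw one sample: with probability `weight` from the model, else judgment."""
    source = model_dist if rng.random() < weight else judgment_dist
    return source()

rng = random.Random(0)  # fixed seed for reproducibility
samples = [
    mixture_sample(lambda: rng.gauss(100, 10),   # model-based forecast
                   lambda: rng.gauss(120, 30),   # judgmental forecast
                   weight=0.7, rng=rng)
    for _ in range(1000)
]
estimate = sum(samples) / len(samples)  # point estimate near 0.7*100 + 0.3*120
```

Tools like Ergo and Squiggle make this kind of composition ergonomic; the hard part, as noted above, is the research and reasoning that produces good component forecasts in the first place.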
Ought is an applied machine learning lab, hiring for:
Our mission is to automate and scale open-ended reasoning. To that end, we’re building Elicit, the AI research assistant. Elicit’s architecture is based on supervising reasoning processes, not outcomes. This is better for supporting open-ended reasoning in the short run and better for alignment in the long run.
Over the last year, we built Elicit to support broad reviews of empirical literature. The literature review workflow runs on general-purpose infrastructure for executing compositional language model processes. Going forward, we’ll expand to deep literature reviews, then other research workflows, then general-purpose reasoning. (More here)
We’re also only reporting our current guess for how things will turn out. We’re monitoring how Elicit is used and we’ll study its impacts and the anticipated impacts of future features, and if it turns out that the costs outweigh the benefits we will adjust our plans.
Are you worried that your work will be used for regrettable things like
improving the competence of actors who are less altruistic and less careful about unintended consequences (e.g. many companies, militaries and government institutions), and
Less careful actors: Our goal is for Elicit to help people reason better. We want less careful people to use it and reason better than they would have without Elicit, recognizing more unintended consequences and finding actions that are more aligned with their values. The hope is that if we can make good reasoning cheap enough, people will use it. In a sense, we’re all less careful actors right now.
Less altruistic actors: We favor more altruistic actors in deciding who to work with, give access to, and improve Elicit for. We also monitor use so that we can prevent misuse.
speeding up AI capabilities research, and speeding it up more than AI safety research?
I expect the overall impact on x-risk to be a reduction by (a) causing more and better x-risk reduction thinking to happen and (b) shifting ML efforts to a more alignable paradigm, even if (c) Elicit has a non-zero contribution to ML capabilities.
The implicit claim in the concern about speeding up capabilities is that Elicit has a large impact on capabilities because it is so useful. If that is true, we’d expect that it’s also super useful for other domains e.g. AI safety. The larger Elicit’s impact on (c), the larger the corresponding impacts on (a) and (b).
To shift the balance away from (c) we’ll focus on supporting safety-related research and researchers, especially conceptual research. We’re not doing this very well today but are actively thinking about it and moving in that direction. Given that, it would be surprising if Elicit helped a lot with ML capabilities relative to tools and organizations that are explicitly pushing that agenda.
Have you considered deemphasizing trying to offer a commercially successful product that will find broad application in the world, and focussing more strongly on designing systems that are safe and aligned with human values?
We’re a non-profit, so we have no obligation to make a commercially successful product. We’ll only focus on it to the extent that it furthers aligned reasoning. That said, I think the best outcome is that we make a widely adopted product that makes it easier for everyone to think through the consequences of their actions and act in alignment with their values.
My notes for managees at Ought: Working with Andreas
New post explaining the connection: Ought’s theory of change.
Another potential windfall I just thought of: the kind of AI scientist system discussed by Bengio in this talk (older writeup). The idea is to build a non-agentic system that uses foundation models and amortized Bayesian inference to create and do inference on compositional and interpretable world models. One way this would be used is for high-quality estimates of p(harm|action) in the context of online monitoring of AI systems, but if it could work it would likely have other profitable use cases as well.
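The p(harm|action) estimate mentioned above is, at its core, a marginalization over latent world states. Here's a toy worked example with numbers I made up (they are not from Bengio's talk or writeup); a real AI-scientist system would do this with learned, compositional world models rather than a two-state table.

```python
# Toy illustration of estimating p(harm | action) for online monitoring:
# marginalize over latent world states instead of trusting a single point
# guess. The states and probabilities below are hypothetical.

p_state = {"benign": 0.9, "adversarial": 0.1}        # p(world state)
p_harm_given = {"benign": 0.01, "adversarial": 0.6}  # p(harm | state, action)

# p(harm | action) = sum over states of p(state) * p(harm | state, action)
p_harm = sum(p_state[s] * p_harm_given[s] for s in p_state)
# 0.9 * 0.01 + 0.1 * 0.6 = 0.069
```

A monitor could then compare this estimate against a risk threshold before allowing the action, which is the non-agentic use case the talk emphasizes.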