To clarify, here’s how I’m interpreting your question:
“Most technical alignment work today looks like writing papers or theoretical blog posts and addressing problems we’d expect to see with more powerful AI. It mostly doesn’t try to be useful today. Ought claims to take a product-driven approach to alignment research, simultaneously building Elicit to inform and implement its alignment work. Why did Ought choose this approach instead of the former?”
First, I think it’s good for the community to take a portfolio approach and for different teams to pursue different approaches. I don’t think there is a single best approach, and a lot of it comes down to the specific problems you’re tackling and team fit.
For Ought, there’s an unusually good fit between our agenda and Elicit the product—our whole approach is built around human-endorsed reasoning steps, and it’s hard to do that without humans who care about good reasoning and actually want to apply it to solve hard problems. If we were working on ELK I doubt we’d be working on a product.
Second, as a team we just like building things. We get better feedback loops this way, and Elicit’s nearer-term impact on improving the quality of reasoning in research and beyond provides concrete motivation in addition to the longer-term impacts.
Some other considerations in favor of taking a product-driven approach are:
Deployment plans help us choose tasks. We did “pure alignment research” when we ran our initial factored cognition experiments. At the time, choosing the right task felt about as hard as choosing the right mechanism or implementing it correctly. For example, say we want to study factored cognition: should we factor reasoning about essays? SAT questions? Movie reviews? Forecasting questions? When experiments failed, it was hard to know whether we could have stripped the task down further to better test the mechanism, or whether the mechanism in fact didn’t solve the problem at hand. Our findings seemed brittle and highly dependent on assumptions about the task, and unlikely to hold up in future deployment scenarios. Now that we have a much clearer incremental deployment story in mind, we can better think about what research is more or less likely to be useful. FWIW I suspect this challenge of task specification is a pretty underrated obstacle for many alignment researchers. (For concreteness, a toy sketch of what one such decomposition might look like follows this list.)
Eventually we’ll have to cross the theory-practice gap. At some point alignment research will likely have to cross the theory-practice gap. There are different ways to do this: we could first develop theoretical foundations, then basic algorithms, and only then implement them in the real world; or we could co-evolve the different parts and propagate constraints between them as we go. I think both ways have pros and cons, but it seems important that some groups pursue the latter, especially in a world with short AI timelines.
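As promised above, here is a minimal, hypothetical sketch (in Python) of the kind of decomposition loop those factored cognition experiments study: a root question is split into sub-questions, each sub-question is answered in isolation, and the sub-answers are recomposed into a final answer. This is not Ought’s actual experimental code; the function names (`decompose`, `answer_leaf`, `recompose`) and the stub answers are purely illustrative stand-ins for model calls or crowd-worker judgments. The point is just that the `decompose` step is task-specific, which is exactly where the task-specification problem bites.

```python
# Hypothetical sketch only: not Ought's experimental setup. The names and the
# decomposition schema below are illustrative stand-ins.
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Step:
    """One sub-question together with the answer produced for it in isolation."""
    question: str
    answer: str


def decompose(question: str) -> list[str]:
    # Task-specific: for a forecasting question we might ask about base rates
    # and recent evidence; for an essay, about each individual claim. Choosing
    # this schema is the "task specification" problem discussed above.
    return [
        f"What background facts bear on: {question}",
        f"What are the strongest considerations for and against: {question}",
    ]


def answer_leaf(sub_question: str) -> str:
    # Stand-in for a model call or crowd-worker judgment that sees only this
    # sub-question, with no shared context.
    return f"(stub answer to: {sub_question})"


def recompose(question: str, steps: list[Step]) -> str:
    # Another isolated step that sees only the sub-answers, not the work
    # behind them, and combines them into a final answer.
    notes = "; ".join(f"{s.question} -> {s.answer}" for s in steps)
    return f"Answer to '{question}', based on: {notes}"


def factored_answer(question: str) -> str:
    steps = [Step(q, answer_leaf(q)) for q in decompose(question)]
    return recompose(question, steps)


if __name__ == "__main__":
    print(factored_answer("Will better reasoning tools improve research quality?"))
```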
Risks of trying to do both are:
Balancing multiple stakeholders. Sometimes an impressive research demonstration isn’t actually useful in the short term, or the most useful things for users don’t teach us anything new about alignment. Models are barely capable enough to be acceptable stand-ins for crowd workers, which limits what we can learn; conversely, the best way to solve some product problems could just be to scale up the models. Overall, our product-research dynamic has been net positive and creates virtuous cycles where the product grounds the research and the research improves the product. But it’s a fine line to tread and a tension we have to actively manage. I can easily imagine other teams or agendas where this would be net negative. I imagine we’d also have a harder time making good choices here if we were a for-profit.
Being too solution-driven. From a product angle, I sometimes worry that we might over-apply the decomposition hammer. But an important part of our research goal is understanding where decomposition is useful / competitive / necessary, so it’s probably fine as long as we course-correct quickly.
Another benefit of our product-driven approach is that it lets us make a positive contribution to the alignment community. By which I mean:
Thanks to amazing prior work in straight alignment research, we already have some idea of the anti-patterns and risks that we all want to avoid. What we’re still lacking are safety attractors: alternative approaches that are competitive with, and safer than, the current paradigm.
We want Elicit to be an existence proof that there is a better way to solve certain complex tasks, and we want our approach to go on to be adopted by others – because it’s in their self-interest, not because it’s safe.