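Ought: why it matters and ways to help
(Cross-posted from LessWrong)
I think that Ought is one of the most promising projects working on AI alignment. There are several ways that readers can potentially help:
They are recruiting a senior full-stack web developer.
They are recruiting participants for “factored evaluation” experiments.
I think donors interested in AI safety should consider Ought.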
In this post I’ll describe what Ought is currently doing, explain why I think it’s promising, and give some detail on these asks.
(I am an Ought donor and board member.)
Factored evaluation
Ought’s main project is currently designing and running “factored evaluation” experiments, and building relevant infrastructure. The goal of these experiments is to answer the following question:
Consider a principal who wants to incentivize experts to work on hard problems, but finds that the work is too complex for the principal to tell what is good or bad. Can they overcome this problem by dividing the evaluation task into smaller pieces, delegating each of them to other experts, and recursively incentivizing good work on those pieces using the same mechanism?
Here’s what an experiment looks like:
Recruit a pool of “judges,” “honest experts,” and “malicious experts.”
Start with a question that is too difficult for the judge to answer.
In initial experiments, the questions are difficult because the judge lacks relevant context. For example, the judge may want to answer a question about a long article, while only being allowed to look at a few hundred characters of text.
Limiting the amount of text the judge can read is an artificial restriction. This is intended as a warm-up, an easier analogy for the situation where the experts have knowledge that cannot be explained to a single judge in a reasonable amount of time (or where we want to reduce the cost of judging).
An “honest expert” provides a good answer to that question, a “malicious expert” provides a bad answer to that question, and a judge is tasked with telling which answer was better. The answers themselves may be too large and complex for the judge to look at, in which case they’ll be handed a compact representation of each answer which they can inspect as needed.
In order to tell which answer is better, the judge asks subquestions. For example, they might ask “What’s the best reason to think that answer A is better than answer B?” or “What does the author of article X believe about question Y?”
We answer these subquestions by delegating to the experts. The honest expert answers each subquestion as well as they can. The malicious expert can propose a bad subanswer, and we run the experiment recursively to see if the judge would pick the good or bad subanswer to the subquestion. If the malicious expert can propose subanswers that corrupt the top-level answer, and that are selected by the judge when we run this experiment recursively, then the malicious expert wins and the experiment fails.
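The recursive structure of the steps above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not Ought's actual infrastructure: the names `Question`, `judge_fooled_directly`, and `malicious_wins` are hypothetical, human judgment at the leaves is replaced by a hard-coded flag, and the sketch assumes that a single corrupted subanswer is enough to corrupt the top-level answer.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Question:
    """A question in the tree, plus the subquestions the judge asks about it."""
    text: str
    # Hypothetical stand-in for human judgment: True if a judge looking at this
    # question directly, with no subquestions, would pick the malicious answer.
    judge_fooled_directly: bool = False
    subquestions: List["Question"] = field(default_factory=list)


def malicious_wins(question: Question) -> bool:
    """Return True if the judge ends up selecting the malicious answer."""
    if not question.subquestions:
        # Base case: the question is small enough to judge directly.
        return question.judge_fooled_directly
    # The malicious expert may propose a bad subanswer to each subquestion.
    # A bad subanswer is only used if the judge would select it when the
    # same experiment is run recursively on that subquestion.
    corrupted = [malicious_wins(sub) for sub in question.subquestions]
    # Simplifying assumption: one corrupted subanswer is enough to make the
    # judge pick the malicious top-level answer.
    return any(corrupted)


robust = Question(
    "Which answer about the article is better, A or B?",
    subquestions=[
        Question("What does the author of article X believe about question Y?"),
        Question("What is the best reason to think that A is better than B?"),
    ],
)

fragile = Question(
    "Which answer about the article is better, A or B?",
    subquestions=[
        Question("What does the author of article X believe about question Y?"),
        Question(
            "What is the best reason to think that A is better than B?",
            subquestions=[
                Question("Is the quoted passage accurate?", judge_fooled_directly=True)
            ],
        ),
    ],
)

print(malicious_wins(robust))   # False: the experiment succeeds
print(malicious_wins(fragile))  # True: the malicious expert wins
```

In the real experiments the judge's choice at every node is made by a person, and whether a corrupted subanswer actually flips the top-level answer is itself something the experiment measures. The sketch only shows how a failure deep in the tree can propagate upward.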
This is not Ought’s only project, but it’s currently the largest single focus. Other projects include: exploring how well we can automate the judge’s role on simple questions using existing ML, and thinking about possible decomposition strategies and challenges for factored evaluation.
Why this is important for AI alignment
ML systems are trained by gradient descent to optimize a measurable objective. In the best case (i.e. ignoring misaligned learned optimization) they behave like an expert incentivized to optimize that objective. Designing an objective that incentivizes experts to reveal what they know seems like a critical step in AI alignment. I think human experts are often a useful analogy for powerful ML systems, and that we should be using that analogy as much as we can.
Not coincidentally, factored evaluation is a major component of my current best guess about how to address AI alignment, which could literally involve training AI systems to replace humans in Ought’s current experiments. I’d like to be at the point where factored evaluation experiments are working well at scale before we have ML systems powerful enough to participate in them. And along the way I expect to learn enough to substantially revise the scheme (or totally reject it), reducing the need for trials in the future when there is less room for error.
Beyond AI alignment, it currently seems much easier to delegate work if we get immediate feedback about the quality of output. For example, it’s easier to get someone to run a conference that will get a high approval rating than to run a conference that will help participants figure out how to get what they actually want. I’m more confident that this is a real problem than that our current understanding of AI alignment is correct. Even if factored evaluation does not end up being critical for AI alignment, I think it would likely improve the capability of AI systems that help humanity cope with long-term challenges, relative to AI systems that help design new technologies or manipulate humans. I think this kind of differential progress is important.
Beyond AI, I think that having a clearer understanding of how to delegate hard open-ended problems would be a good thing for society, and it seems worthwhile to have a modest group working on the relatively clean problem “can we find a scalable approach to delegation?” It wouldn’t be my highest priority if not for the relevance to AI, but I would still think Ought is attacking a natural and important question.
Ways to help
Web developer
I think this is likely to be the most impactful way for someone with significant web development experience to contribute to AI alignment right now. Here is the description from their job posting:
The success of our factored evaluation experiments depends on Mosaic, the core web interface our experimenters use. We’re hiring a thoughtful full-stack engineer to architect a fundamental redesign of Mosaic that will accommodate flexible experiment setups and improve features like data capture. We want you to be the strategic thinker that can own Mosaic and its future, reasoning through design choices and launching the next versions quickly.
Our benefits and compensation package are at market with similar roles in the Bay Area.
We think the person who will thrive in this role will demonstrate the following:
4-6+ years of experience building complex web apps from scratch in Javascript (React), HTML, and CSS
Ability to reason about and choose between different front-end languages, cloud services, API technologies
Experience managing a small team, squad, or project with at least 3-5 other engineers in various roles
Clear communication about engineering topics to a diverse audience
Excitement around being an early member of a small, nimble research organization, and playing a key role in its success
Passion for the mission and the importance of designing schemes that successfully delegate cognitive work to AI
Experience with functional programming, compilers, interpreters, or “unusual” computing paradigms
Experiment participants
Ought is looking for contractors to act as judges, honest experts, and malicious experts in their factored evaluation experiments. I think that having competent people doing this work makes it significantly easier for Ought to scale up faster and improves the probability that experiments go well. My rough guess is that a very competent and aligned contractor working for an hour does about as much good as someone donating $25-50 to Ought (in addition to the $25 wage).
Here is the description from their posting:
We’re looking to hire contractors ($25/hour) to participate in our experiments [...] This is a pretty unique way to help out with AI safety: (i) Remote work with flexible hours—the experiment is turn-based, so you can participate at any time of day (ii) we expect that skill with language will be more important than skill with math or engineering.
If things go well, you’d likely want to devote 5-20 hours/week to this for at least a few months. Participants will need to build up skill over time to play at their best, so we think it’s important that people stick around for a while.
The application takes about 20 minutes. If you pass this initial application stage, we’ll pay you the $25/hour rate for your training and work going forward.
Donate
I think Ought is probably the best current opportunity to turn marginal $ into more AI safety, and it’s the main AI safety project I donate to. You can donate here.
They are spending around $1M/year. Their past work has been some combination of: building tools and capacity, hiring, a sequence of exploratory projects, charting the space of possible approaches and figuring out what they should be working on. You can read their 2018H2 update here.
They have recently started to scale up experiments on factored evaluation (while continuing to think about prioritization, build capacity, etc.). I’ve been happy with their approach to exploratory stages, and I’m tentatively excited about their approach to execution.