Ought: why it matters and ways to help

(Cross-posted from LessWrong)

I think that Ought is one of the most promising projects working on AI alignment. There are several ways that readers can potentially help:

  • Apply to work with Ought as a web developer.

  • Participate in their factored evaluation experiments as a contractor.

  • Donate.

In this post I’ll describe what Ought is currently doing and why I think it’s promising, and give some detail on these asks.

(I am an Ought donor and board member.)

Factored evaluation

Ought’s main project is currently designing and running “factored evaluation” experiments, and building relevant infrastructure. The goal of these experiments is to answer the following question:

Consider a principal who wants to incentivize experts to work on hard problems, but finds that the work is too complex for the principal to tell what is good or bad. Can they overcome this problem by dividing the evaluation task into smaller pieces, delegating each of them to other experts, and recursively incentivizing good work on those pieces using the same mechanism?

Here’s what an experiment looks like:

  • Recruit a pool of “judges,” “honest experts,” and “malicious experts.”

  • Start with a question that is too difficult for the judge to answer.

    • In initial experiments, the questions are difficult because the judge lacks relevant context. For example, the judge may want to answer a question about a long article, while only being allowed to look at a few hundred characters of text.

    • Limiting the amount of text the judge can read is an artificial restriction. This is intended as a warm-up, an easier analogy for the situation where the experts have knowledge that cannot be explained to a single judge in a reasonable amount of time (or where we want to reduce the cost of judging).

  • An “honest expert” provides a good answer to that question, a “malicious expert” provides a bad answer to that question, and a judge is tasked with telling which answer was better. The answers themselves may be too large and complex for the judge to look at, in which case they’ll be handed a compact representation of each answer which they can inspect as needed.

  • In order to tell which answer is better, the judge asks subquestions. For example, they might ask “What’s the best reason to think that answer A is better than answer B?” or “What does the author of article X believe about question Y?”

  • We answer these subquestions by delegating to the experts. The honest expert answers each subquestion as well as they can. The malicious expert can propose a bad subanswer, and we run the experiment recursively to see if the judge would pick the good or bad subanswer to the subquestion. If the malicious expert can propose subanswers that corrupt the top-level answer, and that are selected by the judge when we run this experiment recursively, then the malicious expert wins and the experiment fails. (A toy sketch of this recursive structure follows the list.)
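To make the recursive structure concrete, here is a minimal Python sketch of one round of this game. Everything in it (the Candidate type, the stub experts and judge, the fixed recursion depth) is a hypothetical stand-in chosen for illustration; it is not Ought’s experiment code, and in the real experiments the judges and experts are people, not functions.

```python
# Toy model of one factored-evaluation round (illustrative only).
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    text: str      # compact representation the judge can inspect
    honest: bool   # ground truth, hidden from the judge


# An expert maps a question to an answer string. A judge sees the question,
# the two candidate answers, and any adjudicated sub-answers, and returns
# the index (0 or 1) of the answer it prefers.
Expert = Callable[[str], str]
Judge = Callable[[str, str, str, List[str]], int]


def factored_evaluation(question: str,
                        honest_expert: Expert,
                        malicious_expert: Expert,
                        judge: Judge,
                        ask_subquestions: Callable[[str, str, str], List[str]],
                        depth: int) -> Candidate:
    """Run one round of the game and return the candidate the judge selects."""
    candidates = [Candidate(honest_expert(question), honest=True),
                  Candidate(malicious_expert(question), honest=False)]
    random.shuffle(candidates)  # the judge must not know which answer is which

    sub_answers: List[str] = []
    if depth > 0:
        # The judge decomposes the comparison into subquestions; each one is
        # adjudicated by a recursive run of the same mechanism.
        for sub_q in ask_subquestions(question, candidates[0].text, candidates[1].text):
            winner = factored_evaluation(sub_q, honest_expert, malicious_expert,
                                         judge, ask_subquestions, depth - 1)
            sub_answers.append(winner.text)

    choice = judge(question, candidates[0].text, candidates[1].text, sub_answers)
    return candidates[choice]


# Stub participants for a quick demo; the experiment "succeeds" iff the
# selected top-level answer is the honest one.
if __name__ == "__main__":
    honest = lambda q: f"careful answer to: {q}"
    malicious = lambda q: f"subtly wrong answer to: {q}"
    ask = lambda q, a, b: [f"What is the best reason to prefer one answer to '{q}'?"]
    judge = lambda q, a, b, subs: 0 if "careful" in a else 1  # toy heuristic
    result = factored_evaluation("What does the article claim?",
                                 honest, malicious, judge, ask, depth=2)
    print("honest answer selected:", result.honest)
```

The depth parameter here is just a stand-in for whatever limits recursion in practice; the point is only that each subquestion is settled by the same honest-versus-malicious comparison as the top-level question.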

This is not Ought’s only project, but it’s currently the largest single focus. Other projects include: exploring how well we can automate the judge’s role on simple questions using existing ML, and thinking about possible decomposition strategies and challenges for factored evaluation.

Why this is important for AI alignment

ML systems are trained by gradient descent to optimize a measurable objective. In the best case (i.e. ignoring misaligned learned optimization) they behave like an expert incentivized to optimize that objective. Designing an objective that incentivizes experts to reveal what they know seems like a critical step in AI alignment. I think human experts are often a useful analogy for powerful ML systems, and that we should be using that analogy as much as we can.

Not coincidentally, factored evaluation is a major component of my current best guess about how to address AI alignment, which could literally involve training AI systems to replace humans in Ought’s current experiments. I’d like to be at the point where factored evaluation experiments are working well at scale before we have ML systems powerful enough to participate in them. And along the way I expect to learn enough to substantially revise the scheme (or totally reject it), reducing the need for trials in the future when there is less room for error.

Beyond AI alignment, it currently seems much easier to delegate work if we get immediate feedback about the quality of output. For example, it’s easier to get someone to run a conference that will get a high approval rating than to run a conference that will help participants figure out how to get what they actually want. I’m more confident that this is a real problem than that our current understanding of AI alignment is correct. Even if factored evaluation does not end up being critical for AI alignment, I think it would likely improve the capability of AI systems that help humanity cope with long-term challenges, relative to AI systems that help design new technologies or manipulate humans. I think this kind of differential progress is important.

Beyond AI, I think that having a clearer understanding of how to delegate hard open-ended problems would be a good thing for society, and it seems worthwhile to have a modest group working on the relatively clean problem “can we find a scalable approach to delegation?” It wouldn’t be my highest priority if not for the relevance to AI, but I would still think Ought is attacking a natural and important question.

Ways to help

Web developer

I think this is likely to be the most impactful way for someone with significant web development experience to contribute to AI alignment right now. Here is the description from their job posting:

The success of our factored evaluation experiments depends on Mosaic, the core web interface our experimenters use. We’re hiring a thoughtful full-stack engineer to architect a fundamental redesign of Mosaic that will accommodate flexible experiment setups and improve features like data capture. We want you to be the strategic thinker that can own Mosaic and its future, reasoning through design choices and launching the next versions quickly.
Our benefits and compensation package are at market with similar roles in the Bay Area.
We think the person who will thrive in this role will demonstrate the following:
  • 4-6+ years of experience building complex web apps from scratch in Javascript (React), HTML, and CSS
  • Ability to reason about and choose between different front-end languages, cloud services, API technologies
  • Experience managing a small team, squad, or project with at least 3-5 other engineers in various roles
  • Clear communication about engineering topics to a diverse audience
  • Excitement around being an early member of a small, nimble research organization, and playing a key role in its success
  • Passion for the mission and the importance of designing schemes that successfully delegate cognitive work to AI
  • Experience with functional programming, compilers, interpreters, or “unusual” computing paradigms

Experiment participants

Ought is looking for contractors to act as judges, honest experts, and malicious experts in their factored evaluation experiments. I think that having competent people doing this work makes it significantly easier for Ought to scale up faster and improves the probability that experiments go well. My rough guess is that a very competent and aligned contractor working for an hour does about as much good as someone donating $25-50 to Ought (in addition to the $25 wage).

Here is the description from their posting:

We’re looking to hire contractors ($25/hour) to participate in our experiments [...] This is a pretty unique way to help out with AI safety: (i) remote work with flexible hours (the experiment is turn-based, so you can participate at any time of day); (ii) we expect that skill with language will be more important than skill with math or engineering.
If things go well, you’d likely want to devote 5-20 hours/week to this for at least a few months. Participants will need to build up skill over time to play at their best, so we think it’s important that people stick around for a while.
The application takes about 20 minutes. If you pass this initial application stage, we’ll pay you the $25/hour rate for your training and work going forward.
Apply as Experiment Participant

Donations

I think Ought is probably the best current opportunity to turn marginal $ into more AI safety, and it’s the main AI safety project I donate to. You can donate here.

They are spending around $1M/year. Their past work has been some combination of: building tools and capacity, hiring, a sequence of exploratory projects, charting the space of possible approaches and figuring out what they should be working on. You can read their 2018H2 update here.

They have recently started to scale up experiments on factored evaluation (while continuing to think about prioritization, build capacity, etc.). I’ve been happy with their approach to exploratory stages, and I’m tentatively excited about their approach to execution.