Aligning Recommender Systems as Cause Area

By Ivan Vendrov and Jeremy Nixon

Most recent conversations about the future focus on the point where technology surpasses human capability. But they overlook a much earlier point where technology exceeds human vulnerabilities.

The Problem, Center for Humane Technology.

The short-term, dopamine-driven feedback loops that we have created are destroying how society works.

Chamath Palihapitiya, former Vice President of user growth at Facebook.

The most popular recommender systems (the Facebook news feed, the YouTube homepage, Netflix, Twitter) are optimized for metrics that are easy to measure and improve, like number of clicks, time spent, and number of daily active users, but these metrics are only weakly correlated with what users care about. One of the most powerful optimization processes in the world is being applied to increase them, involving thousands of engineers, the most cutting-edge machine learning technology, and a significant fraction of global computing power. The result is software that is extremely addictive, with a host of hard-to-measure side effects on users and society, including harm to relationships, reduced cognitive capacity, and political radicalization.

In this post we argue that improving the alignment of recommender systems with user values is one of the best cause areas available to effective altruists, particularly those with computer science or product design skills.

We'll start by explaining what we mean by recommender systems and their alignment. Then we'll detail the strongest argument in favor of working on this cause: the likelihood that working on aligned recommender systems will have positive flow-through effects on the broader problem of AGI alignment. We then conduct a (very speculative) cause prioritization analysis, and conclude with key points of remaining uncertainty as well as some concrete ways to contribute to the cause.

Cause Area Definition

Recommender Systems

By recommender systems we mean software that assists users in choosing between a large number of items, usually by narrowing the options down to a small set. Central examples include the Facebook news feed, the YouTube homepage, Netflix, Twitter, and Instagram. Less central examples are search engines, shopping sites, and personal assistant software, which require more explicit user intent in the form of a query or constraints.

Aligning Recommender Systems

By aligning recommender systems we mean any work that leads widely used recommender systems to align better with user values. Central examples of better alignment would be recommender systems which:

  • optimize more for the user's extrapolated volition: not what users want to do in the moment, but what they would want to do if they had more information and more time to deliberate.

  • require less user effort to supervise for a given level of alignment. Recommender systems often have facilities for deep customization (for instance, it's possible to tell the Facebook News Feed to rank specific friends' posts higher than others), but the cognitive overhead of creating and managing those preferences is high enough that almost nobody uses them.

  • reduce the risk of strong undesired effects on the user, such as seeing traumatizing or extremely psychologically manipulative content.

What interventions would best lead to these improvements? Prioritizing specific interventions is out of scope for this essay, but plausible candidates include:

  • Developing machine learning techniques that differentially make it easier to learn from higher-quality human feedback.

  • Designing user interfaces that transmit user preferences with higher bandwidth and fidelity.

  • Increasing the incentives for tech companies to adopt algorithms, metrics, and interfaces that are more aligned. This could be done through individual choices (using more aligned systems, working for more aligned companies), or through media pressure or regulation.

Concrete Examples of How Recommender Systems Could Be More Aligned

  • A recommender system that optimizes partly for a user's desired emotional state, e.g. using affective computing to detect and filter out text that predictably generates anger.

  • A conversational recommender system that allows users to describe, in natural language, their long-term goals such as "become more physically fit", "get into college", or "spend more time with friends". It would then slightly adjust its recommendations to make achievement of the goal more likely, e.g. by showing more instructional or inspiring videos, or alerting more aggressively about good social events nearby.

  • Once a month, users are sent a summary of their usage patterns for the recommender system, such as the distribution of time they spent between politics, sports, entertainment, and educational content. Using a convenient interface, users are able to specify their ideal distribution of time, and the recommender system will guide its results to try to achieve that ideal.
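The last example above, guiding recommendations toward a user-specified ideal distribution of time, can be sketched as a simple re-ranking step. This is only an illustrative toy; the item names, topics, scores, and the `strength` parameter are all hypothetical, not a description of any deployed system:

```python
def rerank(items, observed, ideal, strength=0.5):
    """Re-rank candidates so the user's time drifts toward their stated
    ideal topic distribution.

    items:    list of (item_id, topic, engagement_score) tuples
    observed: dict topic -> fraction of time actually spent
    ideal:    dict topic -> fraction of time the user says they want
    strength: how aggressively to correct the gap (0 = pure engagement)
    """
    def adjusted(item):
        _, topic, score = item
        # A positive gap means the topic is under-consumed relative to
        # the user's stated ideal, so its score is boosted; a negative
        # gap demotes it.
        gap = ideal.get(topic, 0.0) - observed.get(topic, 0.0)
        return score * (1.0 + strength * gap)
    return sorted(items, key=adjusted, reverse=True)

items = [("v1", "politics", 0.9), ("v2", "education", 0.8), ("v3", "sports", 0.7)]
observed = {"politics": 0.6, "education": 0.1, "sports": 0.3}
ideal = {"politics": 0.2, "education": 0.5, "sports": 0.3}
ranked = rerank(items, observed, ideal)
```

In this toy run the under-consumed education video overtakes the higher-engagement politics video, which is the intended behavior: the user's stated preference softly reweights, rather than replaces, the engagement signal.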

Connection with AGI Alignment

Risk from the development of artificial intelligence is widely considered one of the most pressing global problems, and positively shaping the development of AI is one of the most promising cause areas for effective altruists.

We argue that working on aligning modern recommender systems is likely to have large positive spillover effects on the bigger problem of AGI alignment. There are a number of common technical sub-problems whose solutions seem likely to be helpful for both. And since recommender systems are so widely deployed, working on them will lead to much tighter feedback loops, allowing more rapid winnowing of the space of ideas and solutions, faster build-up of institutional knowledge, and better-calibrated researcher intuitions. In addition, because of the massive economic and social benefits of increasing recommender system alignment, it's reasonable to expect a snowball effect of increased funding and research interest after the first successes.

In the rest of this section we review these common technical sub-problems, and the specific benefits of approaching them in the context of recommender systems. We then briefly consider ways in which working on recommender system alignment might actually hurt the cause of AGI alignment. But the most serious objection to our argument from an EA perspective is lack of neglectedness: recommender system alignment will happen anyway, so it's differentially more important to work on other sub-problems of AGI alignment. We discuss this objection further below in the section on Cause Prioritization.

Overlapping Technical Subproblems

Robustness to Adversarial Manipulation

Robustness, ensuring ML systems never fail catastrophically even on unseen or adversarially selected inputs, is a critical subproblem of AGI safety. Many solutions have been proposed, including verification, adversarial training, and red teaming, but it's unclear how to prioritize between these approaches.

Recommender systems like Facebook, Google Search, and Twitter are under constant adversarial attack by the most powerful organizations in the world, such as intelligence agencies trying to influence elections and companies doing SEO for their websites. These adversaries can conduct espionage, exploit zero-day vulnerabilities in hardware and software, and draw on resources far in excess of any realistic internal red team. There is no better test of robustness today than deploying an aligned recommender system at scale; trying to make such systems robust will yield a great deal of useful data and intuition for the larger problem of AGI robustness.

Understanding preferences and values from natural language

There are a few reasons to think that better natural language understanding differentially improves alignment for both recommender systems and AGI.

First, given how strongly the performance of deep learning systems scales with data size, it seems plausible that the sheer number of bits of human feedback ends up being a limiting factor in the alignment of most AI systems. Since language is the highest-bandwidth supervisory signal (in bits/second) that individual humans can provide to an ML system, and linguistic ability is nearly universal, it is probably the cheapest and most plentiful form of human feedback.

More speculatively, natural language may have the advantage of quality as well as quantity: since humans seem to learn values at least partly through language, in the form of stories, myths, holy books, and moral claims, natural language may be an unusually high-fidelity representation of human values and goals.

Semi-supervised learning from human feedback

Since it's plausible that AGI alignment will be constrained by the amount of high-quality human feedback we can provide, a natural subproblem is making better use of the labels we get, via semi-supervised or weakly supervised learning. Proposals along these lines include Paul Christiano's Semi-supervised RL and what the authors of Concrete Problems in AI Safety call "Scalable Oversight". One especially promising approach to the problem is active learning, where the AI helps select which examples need to be labeled.

What are the advantages of studying semi-supervised learning in the context of recommender systems? First, because these systems are used by millions of people, they have plentiful human feedback of varying quality, letting us test algorithms at much more realistic scales than gridworlds or MuJoCo. Second, because recommender systems are a large part of many people's lives, we expect that the feedback we get would reflect more of the complexity of human values. It seems plausible that we will need qualitatively different approaches to achieve human goals like "become physically fit" or "spend more time with my friends" than for simple goals in deterministic environments.
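To make the active-learning idea concrete, here is a toy uncertainty-sampling sketch: given a limited labeling budget, query humans only on the examples the model is least sure about. The pool, the stand-in model, and the budget are all hypothetical:

```python
import random

def uncertainty_sample(pool, predict_proba, budget):
    """Pick the `budget` unlabeled examples whose predicted probability
    is closest to 0.5, i.e. where the model is least certain and a
    human label is most informative."""
    scored = [(abs(predict_proba(x) - 0.5), x) for x in pool]
    scored.sort(key=lambda t: t[0])
    return [x for _, x in scored[:budget]]

# Toy stand-in: each item is its own "probability of matching the
# user's stated goal", so the identity function plays the model.
random.seed(0)
pool = [random.random() for _ in range(1000)]
queries = uncertainty_sample(pool, predict_proba=lambda x: x, budget=10)
```

The selected queries cluster tightly around 0.5, so the scarce human feedback is spent exactly where the model's current guess carries the least information.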

Learning to communicate with humans

It seems very likely that both aligned recommender systems and aligned AGI require bidirectional communication between humans and AI systems, not just a one-way supervisory signal from humans to AI. In particular, safe AI systems may need to be interpretable: able to provide accurate explanations of the choices they make. They may also need to be corrigible, which among other properties requires them to actively communicate with users to elicit and clarify their true preferences.

Recommender systems seem fertile ground for exploring and evaluating different approaches to interpretability and bidirectional communication with humans, especially in the context of conversational search and recommenders.

Understanding Human Factors

In AI Safety Needs Social Scientists, Geoffrey Irving and Amanda Askell make the case that prioritizing technical approaches to AI safety requires deeper empirical understanding of human factors. The biases, weaknesses, strengths, introspection abilities, and information-processing and communication limitations of actual humans and human institutions seem critical to evaluating the most promising AGI alignment proposals, such as debate, amplification, and recursive reward modeling.

We agree that running human studies is likely to be valuable for future AI safety research. But we think equally valuable information could be acquired by deploying and studying aligned recommender systems. Recommender systems maintain the largest datasets of actual real-world human decisions. They have billions of users, many of whom would be willing to use experimental new interfaces for fun or for the promise of better long-term outcomes. Recommender systems are also fertile ground for testing new social and institutional schemes of human-AI collaboration. Just in the domain of reliably aggregating human judgments (likely a key subproblem for debate and amplification), they are constantly experimenting with new techniques, from collaborative filtering to various systems for eliciting and aggregating reviews, ratings, and votes. AI safety needs social scientists, definitely, but it also needs product designers, human-computer interaction researchers, and business development specialists.

Risks from Aligning Recommender Systems

In what ways could working on recommender system alignment make AI risks worse?

False Confidence

One plausible scenario is that widespread use of aligned recommender systems instills false confidence in the alignment of AI systems, increasing the likelihood and severity of a catastrophic treacherous turn, or of a slow but unstoppable trend towards the elimination of human agency. Currently the public, media, and governments have a healthy skepticism towards AI systems, and there is a great deal of pushback against using AI systems even for fairly limited tasks like criminal sentencing, financial trading, and medical decisions. But if recommender systems remain the most influential AI systems in most people's lives, and people come to view them as highly empathetic, transparent, robust, and beneficial, skepticism will wane and increasing decision-making power will be concentrated in AI hands. If the techniques developed for aligning recommender systems don't scale, i.e. stop working after a certain threshold of AI capability, then we may have increased overall AI risk despite making great technical progress.

Dual Use

Aligned recommender systems may be a strongly dual-use technology, enabling companies to optimize more powerfully for objectives besides alignment, such as creating even more intensely addictive products. An optimization objective that allows you to turn down anger also allows you to turn up anger; the ability to optimize for users' long-term goals implies the ability to insinuate yourself deeply into users' lives.

Greater control over these systems also creates dual-use censorship concerns: organizations could dampen the recommendation of content that portrays them negatively.

Perils of Partial Alignment

Working on alignment of recommender systems might simply get us worse and harder-to-detect versions of misalignment. For example, many ideas can't be effectively communicated without creating an emotional or negative side effect that a partially aligned system might try to suppress. Highly warranted emotional responses (e.g. anger at failures to plan for Hurricane Katrina, or in response to genocide) could be improperly dampened. Political positions that consistently create undesirable emotions would also be suppressed, which may or may not be better than the status quo of promoting political positions that generate outrage and fear.

Cause Prioritization Analysis

Predictions are hard, especially about the future, and especially in the domain of economics and sociology. So we will describe a particular model of the world which we think is likely, and do our analysis assuming that model. It's virtually certain that this model is wrong, and fairly likely (~30% confidence) that it is wrong in a way that dramatically undermines our analysis.

The key question any model of the problem needs to answer is: why aren't recommender systems already aligned? There are a lot of possible contingent reasons, for instance that few people have thought about it, and the few who did were not in a position to work on it. But the efficient market hypothesis implies there isn't a giant pool of economic value lying around for anyone to pick up. That means at least one of the following structural reasons is true:

  1. Aligned recommender systems aren't very economically valuable.

  2. Aligning recommender systems is extremely difficult and expensive.

  3. A solution to the alignment problem is a public good in which we expect rational economic actors to underinvest.

Our model says it's a combination of (2) and (3). Notice that Google didn't invent or fund AlexNet, the breakthrough paper that popularized image classification with deep convolutional neural networks, but it was quick to invest immense resources once the breakthrough had been made. Similarly with Monsanto and CRISPR.

We think aligning recommender systems follows the same pattern: there are still research challenges that are too hard and risky for companies to invest significant resources in. The challenges seem interdisciplinary (involving insights from ML, human-computer interaction, product design, and social science), which makes it harder to attract funding and academic interest. But there is a critical threshold at which the economic incentives towards wide adoption become overpowering. Once the evidence that aligned recommender systems are practical and profitable reaches that threshold, tech companies and venture capitalists will pour money and talent into the field.

If this model is roughly correct, aligned recommender systems are inevitable; the only question is how much we can speed up their creation and wide adoption. More precisely, what is the relationship between additional resources invested now and the time it takes us to reach the critical threshold?

The most optimistic case we can imagine is analogous to AlexNet: a single good paper or prototype, representing about 1-3 person-years of investment, manages a conceptual breakthrough and triggers a flood of interest that brings the time-to-threshold 5 years closer.

The most pessimistic case is that the time-to-threshold is not constrained at the margin by funding, talent, or attention; perhaps sufficient resources are already invested across the various tech companies. In that case additional resources will be completely wasted.

Our median estimate is that a small research sub-field (involving ~10-30 people over 3-5 years) could bring the critical threshold 3 years closer.

Assuming this model is roughly right, we now apply the Scale-Neglectedness-Solvability framework for cause prioritization (also known as ITN: Importance, Tractability, Neglectedness) as described by 80000 Hours.

Scale
The easiest problem to quantify is the direct effect on quality of life while consuming content from recommender systems. In 2017 Facebook users spent about 1 billion hours/day on the site; YouTube also claims more than a billion hours a day in 2019. Netflix in 2017 counted 140 million hours per day. Not all of this time is powered by recommender systems, but 2.4 billion user-hours/day = 100 million user-years/year is a reasonably conservative order-of-magnitude estimate.

What is the difference in experienced wellbeing between time spent on current recommender systems and on aligned ones? A 1% improvement seems conservative, leading to 1 million QALYs lost every year simply from time spent on unaligned recommender systems.

It's likely that the flow-through effects on the rest of users' lives will be even greater, if the studies showing effects on mental health, cognitive function, and relationships hold up, and if aligned recommender systems are able to significantly assist users in achieving their long-term goals. Even more speculatively, if recommender systems are able to align with users' extrapolated volition, this may also have flow-through effects on social stability, wisdom, and long-termist attitudes in a way that helps mitigate existential risk.
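For transparency, the arithmetic behind these scale estimates, made explicit using the figures stated above:

```python
# Reported usage figures (user hours per day), as cited in the text.
facebook = 1.0e9   # 2017
youtube = 1.0e9    # 2019
netflix = 0.14e9   # 2017
reported_total = facebook + youtube + netflix  # ~2.14e9 hours/day

# The text works with 2.4e9 hours/day as a conservative
# order-of-magnitude figure for all recommender-driven time.
hours_per_year = 2.4e9 * 365
user_years = hours_per_year / (24 * 365)   # 1e8 user-years per year

wellbeing_gap = 0.01                       # conservative 1% difference
qalys_lost = user_years * wellbeing_gap    # 1e6 QALYs per year
```

So 2.4 billion user-hours/day works out to exactly 100 million user-years/year, and a 1% wellbeing gap over that time gives the 1 million QALYs/year figure quoted above.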

Neglectedness
Culturally, there's a lot of awareness of the problems with unaligned recommender systems, so the amount of potential support to draw on seems high. Companies like Google and Facebook have announced initiatives around Digital Wellbeing and Time Well Spent, but it's unclear how fundamental these changes are. There are some nonprofits, like the Center for Humane Technology, working on improving incentives for companies to adopt aligned recommenders, but none to our knowledge working on the technical problem itself.

How many full-time employees are dedicated to the problem? At the high end, we might count all ML, product, data analysis, and UI work on recommender systems as having some component of aligning with user values, in which case there are on the order of thousands of people working on the problem globally. We estimate the number who are substantially engaging with the alignment problem (as opposed to improving user engagement) full-time is at least an order of magnitude lower, probably fewer than 100 people globally.

Solvability
The direct problem of unaligned recommender systems making their users worse off than they could be seems very solvable. There are many seemingly tractable research problems to pursue, lots of interest from the media and wider culture, and clear economic incentives for powerful actors to throw money at a clear and convincing technical research agenda. It seems like a doubling of direct effort (~100 more people) would likely solve a large fraction of the problem, perhaps all of it, within a few years.

For the AGI alignment problem, 80000 Hours' estimate (last updated in March 2017) is that doubling the effort, which they estimate as $10M annually, would reduce AI risk by about 1%. Given the large degree of technical overlap, it seems plausible that solving aligned recommender systems would solve 1-10% of the whole AGI alignment problem, so we'll estimate the flow-through reduction in AI risk at 0.01-0.1%.

Overall Importance

Ivan's Note: I have very low confidence that these numbers mean anything. In the spirit of If It's Worth Doing, It's Worth Doing With Made-Up Statistics, I'm computing them anyway. May Taleb have mercy on my soul.

Converting all the numbers above into the 80000 Hours logarithmic scoring system for problem importance, we get the following overall problem scores. We use [x,y] to denote an interval of values.

Problem                        Scale   Neglectedness   Solvability   Total
Unaligned Recommenders         8       [6,8]           [6,7]         [20,23]
Risks from AI (flow-through)   15      [6,8]           [2,3]         [23,26]

The overall range is between 20 and 26, which is coincidentally about the range of the most urgent global issues as scored by 80000 Hours, with climate change at 20 and risks from artificial intelligence at 27.
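The totals in the table are simple interval sums of the per-factor scores on the additive log scale; a quick sanity check of the arithmetic:

```python
def total(scale, neglectedness, solvability):
    """Sum [low, high] score intervals on the 80000 Hours log scale.
    Plain numbers are treated as degenerate intervals."""
    def as_interval(v):
        return v if isinstance(v, (list, tuple)) else (v, v)
    lows, highs = zip(*(as_interval(v) for v in (scale, neglectedness, solvability)))
    return (sum(lows), sum(highs))

unaligned = total(8, (6, 8), (6, 7))    # Unaligned Recommenders
flow_through = total(15, (6, 8), (2, 3))  # Risks from AI (flow-through)
```

Both computed intervals match the Total column above: (20, 23) and (23, 26).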

Key Points of Uncertainty

A wise man once said to think of mathematical proofs not as a way to be confident in our theorems, but as a way to focus our doubts on the assumptions. In a similar spirit, we hope this essay serves to focus our uncertainties about this cause area on a few key questions:

  1. Could aligning weak AI systems such as recommenders be net harmful due to the false confidence it builds? Are there ways of mitigating this effect?

  2. When will aligned recommender systems emerge, if we don't intervene? If the answer is "never", why? Why might aligned recommender systems not emerge in our economic environment, despite their obvious utility for users?

  3. What fraction of the whole AGI alignment problem would robustly aligning recommender systems with roughly modern capabilities solve? We estimated 1-10%, but we can imagine worlds in which it's 0.1% or 90%.

  4. What is the direct cost that unaligned recommender systems are imposing on people's lives? With fairly conservative assumptions we estimated 1 million QALYs per year, but we could easily see it being two orders of magnitude more or less.

How You Can Contribute

Machine learning researchers, software engineers, data scientists, policymakers, and others can immediately contribute to the goal of aligning recommender systems.

  • Much of the research needed to enable effective control of recommenders has not been done. Researchers in academia, and especially in industry, are in a position to ask and answer questions like:

    • What side effects are our recommendation engines having?

    • How can we more effectively detect harmful side effects?

    • What effect do different optimization metrics (e.g. number of likes or comments, time spent) have on harmful side effects? Are some substantially more aligned with collective well-being than others?

    • Can we design optimization objectives that do what we want?

  • The implementation of research tends to be done by software engineers. Being a member of a team stewarding these recommender systems will give you a concrete understanding of how the system is implemented, what its limitations and knobs for adjustment are, and what ideas can practically be brought to bear on it.

  • Data scientists can investigate questions like 'how does this user's behavior change as a result of having seen this recommendation?' and 'what trajectories in topic/video space exist, where we see large clusters of users undergoing the same transition in their watch patterns?'. These are especially critical questions for children and other vulnerable users.

  • Policymakers are currently considering dramatic steps to reduce the negative impact of technology on the population. Tools developed by researchers working on this cause area can help. Many of those tools will make it feasible to check what impact is being had on the population, and will introduce methods that guard against specific and quantifiable notions of excessive harm.