AI Benefits Post 2: How AI Benefits Differs from AI Alignment & AI for Good

This is a post in a series on “AI Benefits.” It is cross-posted from my personal blog. For other entries in this series, navigate to the AI Benefits Blog Series Index page.

This post is also discussed on LessWrong.

For comments on this series, I am thankful to Katya Klinova, Max Ghenis, Avital Balwit, Joel Becker, Anton Korinek, and others. Errors are my own.

If you are an expert in a relevant area and would like to help me further explore this topic, please contact me.

How AI Benefits Differs from AI Alignment & AI for Good

The Values Served by AI Benefits Work

Benefits plans need to optimize for a number of objectives.[1] The foremost is simply maximizing wellbeing. But AI Benefits work has some secondary goals, too. Some of these include:

  1. Equality: Benefits are distributed fairly and broadly.[2]

  2. Autonomy: AI Benefits respect and enhance end-beneficiaries’ autonomy.[3]

  3. Democratization: Where possible, AI Benefits decisionmakers should create, consult with, or defer to democratic governance mechanisms.

  4. Modesty: AI benefactors should be epistemically modest, meaning that they should be very careful when predicting how plans will change or interact with complex systems (e.g., the world economy).

These secondary goals are largely inherited from the stated goals of many individuals and organizations working to produce AI Benefits.
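
Footnote 1 flags that balancing these objectives is a hard, unsolved multi-objective optimization problem. Purely as an illustration of what managing it “formally” could look like (not anything this series actually proposes), here is a minimal sketch that scalarizes two of the objectives, total wellbeing and equality, into a single score; every plan name, number, and weight in it is invented:

```python
# Toy sketch only: invented plan names, per-group wellbeing gains, and weight.
# It collapses two objectives (total wellbeing and evenness of its
# distribution across beneficiary groups) into one weighted score.
plans = {
    "hypothetical_plan_a": [5.0, 4.0, 3.5, 3.0],
    "hypothetical_plan_b": [12.0, 2.0, 1.0, 1.0],
}

def score(gains, equality_weight=0.5):
    """Weighted-sum scalarization: total wellbeing minus a dispersion penalty."""
    total = sum(gains)
    mean = total / len(gains)
    dispersion = sum(abs(g - mean) for g in gains) / len(gains)  # crude inequality proxy
    return total - equality_weight * dispersion

for name, gains in plans.items():
    print(name, round(score(gains), 2))
# hypothetical_plan_a wins despite a lower total, because its gains are spread
# more evenly; change equality_weight and the ranking flips.
```

Even this cartoon shows the difficulty the footnote gestures at: the ranking depends entirely on the chosen weight, and goals like autonomy, democratization, and modesty do not reduce to numbers nearly as easily.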

Additionally, since each additional unit of income probably improves wellbeing by less as income rises, the focus on maximizing wellbeing implies a focus on the distributional aspects of Benefits.
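
To make that intuition concrete, here is a toy calculation assuming logarithmic utility of income, a standard but contestable modelling choice that the post itself does not commit to:

```python
import math

def wellbeing(income):
    # Assumption (not from the post): wellbeing from income c is u(c) = ln(c).
    return math.log(income)

# Wellbeing gain from an extra $100 at two income levels:
gain_low = wellbeing(1_100) - wellbeing(1_000)        # ~0.0953
gain_high = wellbeing(100_100) - wellbeing(100_000)   # ~0.0010
print(round(gain_low / gain_high, 1))  # ~95.4: the same $100 matters ~95x more
# at the lower income, so a wellbeing-maximizing benefactor cannot ignore
# how Benefits are distributed.
```

The specific functional form matters little; any plausibly concave relationship between income and wellbeing pushes in the same direction.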

How AI Benefits differs from AI Alignment

Another important clarification is that AI Benefits differs from AI Alignment.

Both alignment and beneficiality are ethically relevant concepts. Alignment can refer to several different things. Iason Gabriel of DeepMind provides a useful taxonomy of existing conceptions of alignment. According to Gabriel, “AI alignment” can refer to alignment with:

  1. Instructions: “the agent does what I instruct it to do.”

  2. Expressed intentions: “the agent does what I intend it to do.”

  3. Revealed preferences: “the agent does what my behaviour reveals I prefer.”

  4. Informed preferences or desires: “the agent does what I would want it to do if I were rational and informed.”

  5. Interest or well-being: “the agent does what is in my interest, or what is best for me, objectively speaking.”

  6. Values: “the agent does what it morally ought to do . . . .”

A system can be aligned in most of these senses without being beneficial. Being beneficial is distinct from being aligned in senses 1–4 because those senses track only the desires of a particular human principal, and those desires may or may not be beneficial. Being beneficial is distinct from conception 5 because beneficial AI aims to benefit many or all moral patients, not just one principal. Only AI that is aligned in the sixth sense would be beneficial by definition. Conversely, AI need not be well-aligned to be beneficial (though alignment might help).

How AI Benefits differs from AI for Good

A huge number of projects exist under the banner of “AI for Good.” These projects are generally beneficial. However, AI Benefits work is different from simply finding and pursuing an AI for Good project.

AI Benefits work aims at helping AI labs craft a long-term Benefits strategy. Unlike AI for Good, which is tied to specific techniques/capabilities (e.g., NLP) in certain domains (e.g., AI in education), AI Benefits is capability- and domain-agnostic. Accordingly, the pace of AI capabilities development should not dramatically alter AI Benefits plans at the highest level (though it may of course change how they are implemented). Most of my work therefore focuses not on concrete beneficial AI applications themselves, but rather on the process of choosing between and improving possible beneficial applications. This meta-level focus is particularly useful at OpenAI, where the primary mission is to benefit the world by building AGI, a technology with difficult-to-foresee capabilities.


  1. Multi-objective optimization is a very hard problem. Managing this optimization problem both formally and procedurally is a key desideratum for Benefits plans. I do not think I have come close to solving this problem, and would love input on this point. ↩︎

  2. OpenAI’s Charter commits “to us[ing] any influence we obtain over AGI’s deployment to ensure it is used for the benefit of all . . . .” ↩︎

  3. OpenAI’s Charter commits to avoiding “unduly concentrat[ing] power.” ↩︎
