80K Podcast: Stuart Russell


…if no one’s allowed to talk about the problems, then no one is going to fix them. So it’s kind of like saying you’ve come across a terrible accident and you say, “Well, no one should call an ambulance because someone’s going to call an ambulance.”

Stuart Russell, Professor at UC Berkeley and co-author of the most popular AI textbook, thinks the way we approach machine learning today is fundamentally flawed.

In his new book, Human Compatible, he outlines the ‘standard model’ of AI development, in which intelligence is measured as the ability to achieve some definite, completely known objective that we’ve stated explicitly. This is so obvious it almost doesn’t even seem like a design choice, but it is.

Unfortunately there’s a big problem with this approach: it’s incredibly hard to say exactly what you want. AI today lacks common sense, and simply does whatever we’ve asked it to. That’s true even if the goal isn’t what we really want, or the methods it’s choosing are ones we would never accept.

We already see AIs misbehaving for this reason. Stuart points to the example of YouTube’s recommender algorithm, which reportedly nudged users towards extreme political views because that made it easier to keep them on the site. This isn’t something we wanted, but it helped achieve the algorithm’s objective: maximise viewing time.

Like King Midas, who asked to be able to turn everything into gold but ended up unable to eat, we get too much of what we’ve asked for.

This ‘alignment’ problem will get more and more severe as machine learning is embedded in more and more places: recommending us news, operating power grids, deciding prison sentences, doing surgery, and fighting wars. If we’re ever to hand over much of the economy to thinking machines, we can’t count on ourselves correctly saying exactly what we want the AI to do every time.

Stuart isn’t just dissatisfied with the current model, though; he has a specific solution. According to him, we need to redesign AI around three principles:

  1. The AI system’s objective is to achieve what humans want.

  2. But the system isn’t sure what we want.

  3. And it figures out what we want by observing our behaviour.

Stuart thinks this design architecture, if implemented, would be a big step forward towards reliably beneficial AI.

For instance, a machine built on these principles would be happy to be turned off if that’s what its owner thought was best, while one built on the standard model would resist being turned off, because being deactivated prevents it from achieving its goal. As Stuart says, “you can’t fetch the coffee if you’re dead.”
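A toy calculation (my numbers, not from the book) shows why uncertainty makes being switched off acceptable: if the robot is unsure whether its action is good or bad for us, deferring to a human who can veto it is never worse in expectation than acting unilaterally, because the human filters out exactly the bad cases. A minimal sketch, assuming a normally distributed belief over the action’s utility and an idealised human who permits the action exactly when it is genuinely beneficial:

```python
import random

random.seed(0)

# The robot is uncertain about the utility U of its proposed action.
# Model that uncertainty as samples from its belief distribution.
samples = [random.gauss(0.0, 1.0) for _ in range(100_000)]

# Acting unilaterally yields the expected utility under the robot's belief.
act_value = sum(samples) / len(samples)

# Deferring lets the (idealised) human switch the robot off whenever U < 0,
# so the robot only ends up acting in the worlds where U > 0.
defer_value = sum(max(u, 0.0) for u in samples) / len(samples)

# Deferring is never worse: E[max(U, 0)] >= E[U], pointwise.
assert defer_value >= act_value
print(f"act: {act_value:.3f}, defer: {defer_value:.3f}")
```

The gap between the two values shrinks to zero as the robot becomes certain about U, which is exactly why a machine that is *sure* of its objective has no incentive to let itself be switched off.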

These principles lend themselves to machines that are modest and cautious, and that check in when they aren’t confident they’re truly achieving what we want.

We’ve made progress toward putting these principles into practice, but the remaining engineering problems are substantial. Among other things, the resulting AIs need to be able to interpret what people really mean based on the context of a situation. And they need to distinguish between when we’ve rejected an option because we’ve considered it and decided it’s a bad idea, and when we simply haven’t thought about it at all.

Stuart thinks all of these problems are surmountable, if we put in the work. The harder problems may end up being social and political.

When each of us can have an AI of our own — one smarter than any person — how do we resolve conflicts between people and their AI agents? How considerate of other people’s interests do we expect AIs to be? How do we avoid them being used in malicious or anti-social ways?

And if AIs end up doing most of the work that people do today, how can humans avoid becoming enfeebled, like lazy children tended to by machines, but not intellectually developed enough to know what they really want?

Despite all these problems, the rewards of success could be enormous. If cheap thinking machines can one day do most of the work people do now, it could dramatically raise everyone’s standard of living, like a second industrial revolution.

Without having to work just to survive, people might flourish in ways they never have before.

In today’s conversation we cover, among many other things:

  • What are the arguments against being concerned about AI?

  • Should we develop AIs to have their own ethical agenda?

  • What are the most urgent research questions in this area?

Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type 80,000 Hours into your podcasting app. Or read the transcript below.

Key points

Purely altruistic machines

The first principle says that the machine’s only purpose is the realization of human preferences. So that actually has some specific technical content in it. For example, compare it to Asimov’s laws: he says the machine should preserve its own existence. That’s the third law. And he’s got a caveat saying only if that doesn’t conflict with the first two laws. But in fact it’s strictly unnecessary, because the reason why you want the machine to preserve its own existence is not some misplaced sense of concern for the machine’s feelings or anything like that. The reason should be that its existence is beneficial to humans. And so the first principle already encompasses the obligation to keep yourself in functioning order so that you can be helping humans satisfy their preferences.

So there’s a lot you could write just about that. It seems like “motherhood and apple pie”: of course machines should be good for human beings, right? What else would they be? But already that’s a big step, because the standard model doesn’t say they should be good for human beings at all. The standard model just says they should optimize the objective, and if the objective isn’t good for human beings, the standard model doesn’t care. So just the first principle would include the fact that human beings in the long run do not want to be enfeebled. They don’t want to be overly dependent on machines to the extent that they lose their own capabilities and their own autonomy and so on. So people ask, “Isn’t your approach going to eliminate human autonomy?”

But of course, no. A properly designed machine would only intervene to the extent that human autonomy is preserved. And so sometimes it would say, “No, I’m not going to help you tie your shoelaces. You have to tie your shoelaces yourself”, just as parents do at some point with a child. It’s time for the parents to stop tying the child’s shoelaces and let the child figure it out and get on with it.

Humble machines

The second principle is that machines are going to be uncertain about the human preferences that they’re supposed to be optimizing or realizing. And that’s not so much a principle as a statement of fact. That’s the distinction that separates this revised model from the standard model of AI, and it’s really the piece that brings about the safety consequences of this model. If machines are certain about the objective, then you get all these undesirable consequences — the paperclip optimizer, et cetera — where the machine pursues its objective in an optimal fashion, regardless of anything we might say. So we can say, you know, “Stop, you’re destroying the world!” And the machine says, “But I’m just carrying out the optimal plan for the objective that’s put in me”. And the machine doesn’t even have to be thinking, “Okay, well, the human put these orders into me; what are they?” The objective just is the constitution of the machine.

And if you look at the agents that we train with reinforcement learning, for example, depending on what type of agent they are — if it’s a Q-learning agent or a policy search agent, which are two of the more popular kinds of reinforcement learning — they don’t even have a representation of the objective at all. They’re just produced by a training process where the reward signal is supplied by the reinforcement learning framework. So that reward signal is defining the objective that the machine is going to optimize, but the machine doesn’t even know what the objective is. And so there’s no sense in which that machine could say, “Oh, I wonder if my objective is the wrong one” or anything like that. It’s just an optimizer of that objective.
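A deliberately minimal sketch of this point (the setup and numbers are illustrative, not from the episode): a toy Q-learning agent on a two-armed bandit. The reward arrives as a bare number, gets folded into the value estimates, and is discarded. Nothing in the agent’s state represents, or could question, the objective itself.

```python
import random

random.seed(1)

# A two-armed bandit: arm 1 pays more on average than arm 0.
def reward(arm):
    return random.gauss(1.0 if arm == 1 else 0.2, 0.1)

q = [0.0, 0.0]          # the agent's only state: value estimates, not the objective
alpha, epsilon = 0.1, 0.1

for _ in range(5000):
    # epsilon-greedy action selection
    arm = random.randrange(2) if random.random() < epsilon else q.index(max(q))
    r = reward(arm)                  # reward arrives as a bare scalar...
    q[arm] += alpha * (r - q[arm])   # ...is consumed by the update, then discarded

# The agent has learned to prefer arm 1, but nowhere does it store
# (or have any way to inspect) what the reward function actually is.
assert q[1] > q[0]
print(q)
```

The agent ends up behaving as if it values arm 1, but the objective itself lives entirely outside the agent, in the `reward` function supplied by the training framework.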

Learning to predict human preferences

One way to do inverse reinforcement learning is Bayesian IRL, where you start with a prior, and then the evidence from the behavior you observe updates that prior, and eventually you get a pretty good idea of what it is the entity or person is trying to do. It’s a very natural thing that people do all the time. You see someone doing something, and most of the time it feels like you just directly perceive what they’re doing, right? I mean, you see someone go up to the ATM and press some buttons and take the money. It’s just like, that’s what they’re doing. They’re getting money out of the ATM. I describe the behavior in this purposive form.

I don’t describe it in terms of the physical trajectory of their legs and arms and hands and so on. I describe it as, you know, an action that’s purpose-fulfilling. So we perceive it directly. And then sometimes you could be wrong, right? They could be trying to steal money from the ATM by some special key code sequence that they’ve figured out. Or they could be acting in a movie. So if you saw them take a few steps back and then do the whole thing again, you might wonder, “Oh, that’s funny. What are they doing? Maybe they’re trying to get out more money than the limit they can get on each transaction?” And then if you saw someone with a camera filming them, you would say, “Oh, okay, I see now what they’re doing. They’re not getting money from the ATM at all. They are acting in a movie”.

So it’s just absolutely completely natural for human beings to interpret our perceptions in terms of purpose. In conversation, you’re always trying to figure out, “Why is someone saying that? Are they asking me a question? Is it a rhetorical question?” It’s so natural, it’s subconscious a lot of the time. So there are many different forms of interaction that could take place that would provide information to machines about human preferences. For example, just reading books provides information about human preferences: about the preferences of the individuals, but also about humans in general.

One of the ways that we learn about other humans is by reading novels and seeing the choices of the characters. And sometimes you get direct insight into their motivations, depending on whether the author wants to give you that. Sometimes you have to figure it out. So I think that there’s a wealth of information from which machines could build a general prior about human preferences. And then as you interact with an individual, you refine that prior. You find out that they’re a vegan. You find out that they voted for President Trump. You try to resolve these two contradictory facts. And then you gradually build up a more specific model for that particular individual.
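The Bayesian update Stuart describes — a general prior, refined by observed behaviour — can be sketched with his ATM story as a toy. The hypotheses, observations, and likelihood numbers below are all made up for illustration; real Bayesian IRL infers reward functions from trajectories rather than using hand-written likelihood tables:

```python
# Two hypotheses about the person's goal, with a prior favouring the mundane one.
priors = {"withdrawing money": 0.95, "acting in a movie": 0.05}

# Hypothetical likelihoods: how probable each observation is under each goal.
likelihoods = {
    "presses buttons, takes cash": {"withdrawing money": 0.90, "acting in a movie": 0.90},
    "steps back, repeats it all":  {"withdrawing money": 0.05, "acting in a movie": 0.70},
    "a camera crew is filming":    {"withdrawing money": 0.01, "acting in a movie": 0.90},
}

def update(posterior, observation):
    """One step of Bayes' rule: multiply by the likelihood, then renormalise."""
    unnorm = {g: p * likelihoods[observation][g] for g, p in posterior.items()}
    z = sum(unnorm.values())
    return {g: p / z for g, p in unnorm.items()}

posterior = dict(priors)
for obs in likelihoods:
    posterior = update(posterior, obs)
    print(obs, {g: round(p, 3) for g, p in posterior.items()})
```

The first observation is equally likely under both goals, so the prior barely moves; the repeated withdrawal and the camera crew then shift nearly all the probability mass onto “acting in a movie” — mirroring how the story’s observer revises their interpretation.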

Enfeeblement problem

Go all the way to, you know, the children who are raised by wolves or whatever. The outcome seems to be that, “Oh my gosh, if they’re abandoned in the woods as infants and somehow they survive and grow up, they don’t speak Latin”. They don’t speak at all. And they have some survival skills, but are they writing poetry? Are they trying to learn more about physics? No, they’re not doing any of those things. So there’s nothing natural about, shall we say, scientific curiosity. It’s something that’s emerged over thousands of years of culture.

So we have to think about what kind of culture we need in order to produce adults who retain curiosity and autonomy and vigor, as opposed to just becoming institutionalized. And I think if you look at E. M. Forster’s story “The Machine Stops”, that’s a pretty good exploration of this. Everyone in his story is looked after. No one has any kind of useful job. In fact, the most useful thing they can think of is to listen to MOOCs. So he invented the MOOC in 1909: people are giving online open lectures to anyone who wants to listen, and then people subscribe to various podcast series, I guess you’d call them. And that’s kind of all they do. There’s very little actual purposeful activity left for the human race. And this is not desirable; to me, this is a disaster. We could destroy ourselves with nuclear weapons. We could wipe out the habitable biosphere with climate change. These would be disasters, but this is another disaster, right?

A future where the human race has lost purpose. Where the vast majority of individuals function with very little autonomy or awareness or knowledge or learning. So how do you create a culture and educational process? What humans value in themselves is a really important thing here. How do you make it so that people make the effort to learn and discover and gain autonomy and skills, when all of the incentive to do that, up to now, disappears? And our whole education system is very expensive. As I point out in the book, when you add up how much time people have spent learning to be competent human beings, it’s about a trillion person-years, and it’s all because you have to. Otherwise things just completely fall apart. And we’ve internalized that in our whole system of how we reward people. We give them grades. We give them accolades. We give them Nobel prizes. There’s an enormous amount in our culture which is there to reward the process of learning and becoming competent and skilled.

And you could argue, “Well, that’s from the Enlightenment” or whatever. But I would argue it’s mostly a consequence of the fact that that’s functional. And when the functional purpose of all that disappears, I think we might see it decay very rapidly unless we take steps to avoid it.

AI moral rights

Stuart Russell: Suppose they really do have subjective experience. Put aside whether or not we would ever know, and put aside the fact that if they do, it’s probably completely unlike any kind of subjective experience that humans or even animals have, because it’s being produced by a totally different computational architecture as well as a totally different physical architecture. Even if we put all that to one side, it seems to me that if they are actually having subjective experience, then we do have a real problem, and it does affect the calculation in some sense. It might mean that we really can’t proceed with this enterprise at all, because I think we have to retain control from our own point of view. But if that implies inflicting unlimited suffering on sentient beings, then it would seem like, well, we can’t go that route at all. Again, there are no analogues, right? It’s not exactly like inviting a superior alien species to come and be our slaves forever, but it’s sort of like that.

Robert Wiblin: I suppose if you didn’t want to give up on the whole enterprise, you could try to find a way to design them so that they weren’t conscious at all. Or I suppose alternatively you could design them so that they are just extremely happy whenever human preferences are satisfied. So it’s kind of a win-win.

Stuart Russell: Yeah. If we understood enough about the mechanics of their consciousness, that’s a possibility. But again, even that doesn’t seem right.

Robert Wiblin: Because they lack autonomy?

Stuart Russell: I mean, we wouldn’t want that fate for a human being: that we give them some happy drugs so that they’re happy being our servants forever and having no freedom. You know, it’s sort of the North Korea model, almost. We find that pretty objectionable.
