My personal cruxes for working on AI safety

The following is a heavily edited transcript of a talk I gave for the Stanford Effective Altruism club on 19 Jan 2020. I had it transcribed, and then Linchuan Zhang, Rob Bensinger and I edited it for style and clarity, and also to occasionally have me say smarter things than I actually said. Linch and I both added a few notes throughout. Thanks also to Bill Zito, Ben Weinstein-Raun, and Howie Lempel for comments.

I feel slightly weird about post­ing some­thing so long, but this is the nat­u­ral place to put it.

Over the last year my be­liefs about AI risk have shifted mod­er­ately; I ex­pect that in a year I’ll think that many of the things I said here were dumb. Also, very few of the ideas here are origi­nal to me.


After all those caveats, here’s the talk:


It’s great to be here. I used to hang out at Stan­ford a lot, fun fact. I moved to Amer­ica six years ago, and then in 2015, I came to Stan­ford EA ev­ery Sun­day, and there was, ob­vi­ously, a to­tally differ­ent crop of peo­ple there. It was re­ally fun. I think we were a lot less suc­cess­ful than the cur­rent Stan­ford EA iter­a­tion at at­tract­ing new peo­ple. We just liked hav­ing weird con­ver­sa­tions about weird stuff ev­ery week. It was re­ally fun, but it’s re­ally great to come back and see a Stan­ford EA which is shaped differ­ently.

To­day I’m go­ing to be talk­ing about the ar­gu­ment for work­ing on AI safety that com­pels me to work on AI safety, rather than the ar­gu­ment that should com­pel you or any­one else. I’m go­ing to try to spell out how the ar­gu­ments are ac­tu­ally shaped in my head. Lo­gis­ti­cally, we’re go­ing to try to talk for about an hour with a bunch of back and forth and you guys ar­gu­ing with me as we go. And at the end, I’m go­ing to do mis­cel­la­neous Q and A for ques­tions you might have.

And I’ll prob­a­bly make ev­ery­one stand up and sit down again be­cause it’s un­rea­son­able to sit in the same place for 90 min­utes.

Meta level thoughts

I want to first very briefly talk about some con­cepts I have that are about how you want to think about ques­tions like AI risk, be­fore we ac­tu­ally talk about AI risk.

Heuris­tic arguments

When I was a con­fused 15 year old brows­ing the in­ter­net around 10 years ago, I ran across ar­gu­ments about AI risk, and I thought they were pretty com­pel­ling. The ar­gu­ments went some­thing like, “Well, sure seems like if you had these pow­er­ful AI sys­tems, that would make the world be re­ally differ­ent. And we don’t know how to al­ign them, and it sure seems like al­most all goals they could have would lead them to kill ev­ery­one, so I guess some peo­ple should prob­a­bly re­search how to al­ign these things.” This ar­gu­ment was about as so­phis­ti­cated as my un­der­stand­ing went un­til a few years ago, when I was pretty in­volved with the AI safety com­mu­nity.

I in fact think this kind of ar­gu­ment leaves a lot of ques­tions unan­swered. It’s not the kind of ar­gu­ment that is solid enough that you’d want to use it for me­chan­i­cal en­g­ineer­ing and then build a car. It’s sug­ges­tive and heuris­tic, but it’s not try­ing to cross all the T’s and dot all the I’s. And it’s not even tel­ling you all the places where there’s a hole in that ar­gu­ment.

Ways heuris­tic ar­gu­ments are insufficient

The thing which I think is good to do some­times, is in­stead of just think­ing re­ally loosely and heuris­ti­cally, you should try to have end-to-end sto­ries of what you be­lieve about a par­tic­u­lar topic. And then if there are parts that you don’t have an­swers to, you should write them down ex­plic­itly with ques­tion marks. I guess I’m ba­si­cally ar­gu­ing to do that in­stead of just say­ing, “Oh, well, an AI would be dan­ger­ous here.” And if there’s all these other steps as well, then you should write them down, even if you’re just go­ing to have your jus­tifi­ca­tion be ques­tion marks.

So here’s an ob­jec­tion I had to the ar­gu­ment I gave be­fore. AI safety is just not im­por­tant if AI is 500 years away and whole-brain em­u­la­tion or nan­otech­nol­ogy is go­ing to hap­pen in 20 years. Ob­vi­ously, in that world, we should not be work­ing on AI safety. Similarly, if some other ex­is­ten­tial risk might hap­pen in 20 years, and AI is just definitely not go­ing to hap­pen in the next 100 years, we should just ob­vi­ously not work on AI safety. I think this is pretty clear once I point it out. But it wasn’t men­tioned at all in my ini­tial ar­gu­ment.

I think it’s good to some­times try to write down all of the steps that you have to make for the thing to ac­tu­ally work. Even if you’re then go­ing to say things like, “Well, I be­lieve this be­cause other EAs seem smart, and they seem to think this.” If you’re go­ing to do that any­way, you might as well try to write down where you’re do­ing it. So in that spirit, I’m go­ing to pre­sent some stuff.

- [Guest] There’s so many ex­is­ten­tial risks, like a nu­clear war could show up at any minute.

- Yes.

- [Guest] So like, is there some thresh­old for the prob­a­bil­ity of an ex­is­ten­tial risk? What’s your crite­ria for, among all the ex­is­ten­tial risks that ex­ist, which ones to fo­cus on?

- That’s a great ques­tion, and I’m go­ing to come back to it later.

- [Guest] Could you define a whole-brain em­u­la­tion for the EA noobs?

- Whole-brain em­u­la­tion is where you scan a hu­man brain and run it on a com­puter. This is al­most surely tech­ni­cally fea­si­ble; the hard­est part is scan­ning hu­man brains. There are a bunch of differ­ent ways you could try to do this. For ex­am­ple, you could imag­ine at­tach­ing a lit­tle ra­dio trans­mit­ter to all the neu­rons in a hu­man brain, and hav­ing them send out a lit­tle sig­nal ev­ery time that neu­ron fires, but the prob­lem with this is that if you do this, the hu­man brain will just catch fire. Be­cause if you just take the min­i­mal pos­si­ble en­ergy in a ra­dio trans­mit­ter, that would get the sig­nal out, and then you mul­ti­ply that by 100 billion neu­rons, you’re like, “Well, that sure is a brain that is on fire.” So you can’t cur­rently scan hu­man brains and run them. We’ll talk about this more later.

Thanks for the ques­tion. I guess I want to do a quick poll of how much back­ground peo­ple are com­ing into this with. Can you raise your hand if you’ve spent more than an hour of think­ing about AI risk be­fore, or hear­ing talks about AI risk be­fore?

Can you raise your hand if you know who Paul Christiano is, or if that name is familiar? Can you raise your hand if you knew what whole-brain emulation was before that question was asked?

Great. Can you raise your hand if you know what UDASSA is?

Great, won­der­ful.

I kind of wanted to ask a “see­ing how many peo­ple are ly­ing about things they know” ques­tion. I was con­sid­er­ing say­ing a com­pletely fake acronym, but I de­cided not to do that. I mean, it would have been an acronym for some­thing, and they would have been like, “Why is Buck ask­ing about that con­cept from the­o­ret­i­cal biol­ogy?”

Ways of listen­ing to a talk

All right, here’s an­other thing. Sup­pose you’re listen­ing to a talk from some­one whose job is think­ing about AI risk. Here are two ways you could ap­proach this. The first way is to learn to imi­tate my ut­ter­ances. You could think, “Well, I want to know what Buck would say in re­sponse to differ­ent ques­tions that peo­ple might ask him.”

And this is a very rea­son­able thing to do. I of­ten talk to some­one who’s smart. I of­ten go talk to Paul Chris­ti­ano, and I’m like, well, it’s just re­ally de­ci­sion-rele­vant to me to know what Paul thinks about all these top­ics. And even if I don’t know why he be­lieves these things, I want to know what he be­lieves.

Here’s the sec­ond way: You can take the things that I’m say­ing as scrap parts, and not try to un­der­stand what I over­all be­lieve about any­thing. You could just try to hear glim­mers of ar­gu­ments that I make, that feel in­di­vi­d­u­ally com­pel­ling to you, such that if you had thought of that ar­gu­ment, you’d be like, “Yeah, this is a pretty solid ar­gu­ment.” And then you can try and take those parts and in­te­grate them into your own be­liefs.

I’m not say­ing you should always do this one, but I am say­ing that at least some­times, your at­ti­tude when some­one’s talk­ing should be, “This guy’s say­ing some things. Prob­a­bly he made up half of them to con­fuse me, and prob­a­bly he’s an idiot, but I’m just go­ing to listen to them, and if any of them are good, I’m go­ing to try and in­cor­po­rate them. But I’m go­ing to as­sess them all in­di­vi­d­u­ally.”

Okay, that’s the meta points. Ready for some cruxes on AI risk?

- [Guest] Just one clar­ifi­ca­tion. So, does that mean then that the, in your be­lief, the whole-brain em­u­la­tion is go­ing to hap­pen in 20 years?

- Sorry, what? I think whole-brain em­u­la­tion is not go­ing to hap­pen in 20 years.

- [Guest] Okay, so the num­bers you threw out were just purely hy­po­thet­i­cal?

- Oh, yes, sorry, yes. I do in fact work on AI safety. But if I had these other be­liefs, which I’m go­ing to ex­plain, then I would not work on AI safety. If I thought whole-brain em­u­la­tion were com­ing sooner than AI, I would de-pri­ori­tize AI safety work.

- [Guest] Okay.


Some­thing that would be great is that when I say things, you can write down things that feel un­com­pel­ling or con­fus­ing to you about the ar­gu­ments. I think that’s very healthy to do. A lot of the time, the way I’m go­ing to talk is that I’m go­ing to say some­thing, and then I’m go­ing to say the parts of it that I think are un­com­pel­ling. Like, the parts of the ar­gu­ment that I pre­sent that I think are wrong. And I think it’s pretty healthy to listen out and try and see what parts you think are wrong. And then I’ll ask you for yours.

Crux 1: AGI would be a big deal if it showed up here

Okay, AGI would be a big deal if it showed up here. So I’m go­ing to say what I mean by this, and then I’m go­ing to give a few clar­ifi­ca­tions and a few ob­jec­tions to this that I have.

This part feels pretty clear. In­tel­li­gence seems re­ally im­por­tant. Imag­ine hav­ing a com­puter that was very in­tel­li­gent; it seems like this would make the world look sud­denly very differ­ent. In par­tic­u­lar, one ma­jor way that the world might be very differ­ent is: the world is cur­rently very op­ti­mized by hu­mans for things that hu­mans want, and if I made some sys­tem, maybe it would be try­ing to make the world be a differ­ent way. And then maybe the world would be that very differ­ent way in­stead.

So I guess un­der this point, I want to say, “Well, if I could just have a com­puter do smart stuff, that’s go­ing to make a big differ­ence to what the world is like, and that could be re­ally good, or re­ally bad.”

There’s at least one ma­jor caveat to this, which I think is re­quired for this to be true. I’m cu­ri­ous to hear a cou­ple of peo­ple’s con­fu­sion, or ob­jec­tions to this claim, and then I’ll say the one that I think is most im­por­tant, if none of you say it quickly enough.

- [Guest] What do you mean by “showed up here”? Be­cause, to my mind, “AGI” ac­tu­ally means gen­eral in­tel­li­gence, mean­ing that it can ac­com­plish any task that a hu­man can, or it can even go be­yond that. So what do you mean by “showed up here”?

- Yeah, so by “here”, I guess I’m trying to cut away worlds that are very different from this one. So for instance, I think that if I just said, “AGI would be a big deal if it showed up”, then I think this would be wrong. Because I think there are worlds where AGI would not be as big a deal. For instance, what if we already have whole-brain emulation? I think in that world, AGI is a much smaller deal. So I’m trying to say that in worlds that don’t look radically different from this one, AGI is a big deal.

- [Guest] So you’re say­ing “if the world is iden­ti­cal, ex­cept for AGI”?

- That’s a good way of putting it. If the world looks like this, kind of. Or if the world looks like what I, Buck, ex­pect it to look in 10 years. And then we get AGI ⁠— that would be a re­ally differ­ent world.

Any other ob­jec­tions? I’ve got a big one.

- [Guest] I’m a bit con­fused about how agency, and in­tel­li­gence and con­scious­ness re­late, and how an in­tel­li­gence would have prefer­ences or ways it would want the world to be. Or, like, how broad this in­tel­li­gence should be.

- Yeah!

I’m go­ing to write down peo­ple’s notes as I go, some­times, ir­reg­u­larly, not cor­re­lated with whether I think they’re good points or not.

- [Guest] Do you have defi­ni­tions of “AGI” and “big deal”?

- “AGI”: a thing that can do all the kind of smart stuff that hu­mans do. By “big deal”, I mean it ba­si­cally is dumb to try to make plans that have phases which are con­crete, and hap­pen af­ter the AGI. So, by anal­ogy, al­most all of my plans would seem like stupid plans if I knew that there was go­ing to be a ma­jor alien in­va­sion in a year. All of my plans that are, like, 5-year-time-scale plans are bad plans in the alien in­va­sion world. That’s what I mean by “big deal”.

[Post-talk note: Holden Karnofsky gives a re­lated defi­ni­tion here: he defines “trans­for­ma­tive AI” as “AI that pre­cip­i­tates a tran­si­tion com­pa­rable to (or more sig­nifi­cant than) the agri­cul­tural or in­dus­trial rev­olu­tion”.]

- [Guest] I think one ob­jec­tion could be that if AGI were de­vel­oped, we would be un­able to get it to co­op­er­ate with us to do any­thing good, and it may have no in­ter­est in do­ing any­thing bad, in which case, it would not be a big deal.

- Yep, that makes sense. I per­son­ally don’t think that’s very likely, but that would be a way this could be wrong.

The main ob­jec­tion I have is that I didn’t men­tion what the price of the AGI is. For in­stance, I think a re­ally im­por­tant ques­tion is “How much does it cost you to run your AGI for long enough for it to do the same in­tel­lec­tual la­bor that a hu­man could do in an hour?” For in­stance, if it costs $1 mil­lion an hour: al­most no hu­man gets paid $1 mil­lion an hour for their brains. In fact, I think ba­si­cally no hu­man gets paid that much. I think the most money that a hu­man ever makes in a year is a cou­ple billion dol­lars. And there’s ap­prox­i­mately 2,000 work­ing hours a year, which means that you’re mak­ing $500,000 an hour. So max hu­man wage is maybe $500,000 per hour. I would love it if some­one checks the math on this.

[Linch adds: 500K * 2000 = 1 billion. I as­sume “cou­ple billion” is more than one. San­ity check: Be­zos has ~100 billion ac­cu­mu­lated in ~20 years, so 5B/​year; though un­clear how much of Jeff Be­zos’ money is paid for his brain vs. other things like hav­ing cap­i­tal/​so­cial cap­i­tal. Also un­clear how much Be­zos should be val­ued at ex ante.]
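The wage arithmetic above is easy to check mechanically. A minimal sketch, assuming the talk's figure of 2,000 working hours a year; the function name and the three income levels (the last one is Linch's rough Bezos figure) are only for illustration:

```python
WORKING_HOURS_PER_YEAR = 2_000  # figure used in the talk


def hourly_rate(annual_income: float, hours: float = WORKING_HOURS_PER_YEAR) -> float:
    """Convert a yearly income into an implied hourly wage."""
    return annual_income / hours


# Illustrative income levels: $1B, "a couple billion" read as $2B, and ~$5B.
for annual_income in (1e9, 2e9, 5e9):
    rate = hourly_rate(annual_income)
    print(f"${annual_income:,.0f} per year is ${rate:,.0f} per hour")
```

Under these assumptions, $500,000 an hour corresponds to $1 billion a year, and "a couple billion" a year is closer to $1 million an hour.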

So, a fun ex­er­cise that you can do is you can imag­ine that we have a ma­chine that can do all the in­tel­lec­tual la­bor that a hu­man can do, at some price, and then we just ask how the world looks differ­ent in that world. So for in­stance, in the world where that price is $500,000 an hour, that just does not change the world very much. Another one is: let’s as­sume that this is an AGI that’s as smart as the av­er­age hu­man. I think ba­si­cally no one wants to pay $500,000 an hour to an av­er­age hu­man. I think that at $100 an hour, that’s the price of a rea­son­ably well-trained knowl­edge worker in a first-world coun­try, ish. And so I think at that price, $100 an hour, life gets pretty in­ter­est­ing. And at the price of $10 an hour, it’s re­ally, re­ally wild. I think at the price of $1 an hour, it’s just ab­surd.

Fun fact: if you look at the com­pu­ta­tion that a hu­man brain does, and you say, “How much would it cost me to buy some servers on AWS that run this much?”, the price is some­thing like $6 an hour, ac­cord­ing to one es­ti­mate by peo­ple I trust. (I don’t think there’s a pub­lic cita­tion available for this num­ber, see here for a few other rele­vant es­ti­mates.) You es­ti­mate the amount of use­ful com­pu­ta­tional work done by the brain, us­ing ar­gu­ments about the amount of noise in var­i­ous brain com­po­nents to ar­gue that the brain can’t pos­si­bly be rely­ing on more than three dec­i­mal places of ac­cu­racy of how hard a synapse is firing, or some­thing like that, and then you look at how ex­pen­sive it is to buy that much com­put­ing power. This is very much an un­cer­tain me­dian guess rather than a bound, and I think it is also some­what lower than the likely price of run­ning a whole brain em­u­la­tion (for that, see “Whole Brain Emu­la­tion: A Roadmap”).

But yeah, $6 an hour. So the rea­son that we don’t have AGI is not that we could make AGI as pow­er­ful as the brain, and we just don’t be­cause it’s too ex­pen­sive.

- [Guest] I’m just won­der­ing, what’s some ev­i­dence that can make us ex­pect that AGI will be su­per ex­pen­sive?

- Well, I don’t know. I’m not claiming that it will be particularly expensive to run. One thing that I am comfortable claiming is if something is extremely valuable, the first time that it happens, it’s usually about that expensive, meaning you don’t make much of a profit. There’s some kind of economic efficiency argument that if you can make $1 million from doing something, and the price is steadily falling, people will probably first do it at the time when the price is about $1 million. And so an interesting question is: if I imagine in every year, people are being reasonable, then how much is the world different in the year when AGI costs you $2,500 an hour to run versus, like, $10 an hour to run?

Another fun exercise, which I think is pretty good, is you can look at Moore’s Law or something and say, “Well, let’s just assume the price of a transistor goes something like this. It falls by a factor of two every 18 months. Let’s suppose that one year it costs $10,000 an hour to run this thing, and then it halves every 18 months.” And you look at how the world changes over time, and it’s kind of an interesting exercise.
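The halving exercise can be run as a quick calculation. This is only a sketch under the assumptions stated in the talk (a $10,000-an-hour starting cost that halves every 18 months); the function names and the target prices are chosen for illustration, and nothing here is a forecast:

```python
import math


def cost_after(years: float, start_cost: float = 10_000.0,
               halving_months: float = 18.0) -> float:
    """Hourly running cost after `years`, assuming steady halving."""
    return start_cost * 0.5 ** (years * 12.0 / halving_months)


def years_until(target_cost: float, start_cost: float = 10_000.0,
                halving_months: float = 18.0) -> float:
    """Years of steady halving needed to fall from start_cost to target_cost."""
    halvings = math.log2(start_cost / target_cost)
    return halvings * halving_months / 12.0


# Price points mentioned at various places in the talk.
for target in (2_500, 100, 10, 1):
    print(f"${target:,} an hour after about {years_until(target):.1f} years")
```

With these numbers the cost reaches $2,500 an hour after 3 years, and takes roughly 10, 15, and 20 years to reach $100, $10, and $1 an hour respectively.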

Other thoughts or ob­jec­tions?

- [Guest] Even if it’s more ex­pen­sive, if it’s ridicu­lously faster than a hu­man brain, it could still be valuable.

- Yeah. So for in­stance, I know peo­ple who make a lot of money be­ing traders. Th­ese peo­ple are prob­a­bly mostly three stan­dard de­vi­a­tions above av­er­age for a hu­man. Some of these hu­mans get paid thou­sands of dol­lars an hour, and also if you can just scale how fast they run, lin­early in price, it would be worth it to run them many times faster. This is per hour of hu­man la­bor, but pos­si­bly, you can get it faster in se­rial time. Like, an­other thing you prob­a­bly want to do with them is have a bunch of un­manned sub­marines, where it’s a lot less bad if your AI gets de­stroyed by a mis­sile or some­thing. Okay, any other thoughts?

- [Guest] So, yes, it wouldn’t nec­es­sar­ily be log­i­cal to run AGI if it was very ex­pen­sive, but I still think peo­ple would do it, given that you have tech­nol­ogy like quan­tum com­put­ers, which right now can’t do any­thing that a nor­mal com­puter can’t do, and yet we pour mil­lions and billions of dol­lars into build­ing them and run­ning them, and run all kinds of things on them.

- I mean, I think we don’t pour billions of dol­lars. Tell me if I’m wrong, please. But I would have thought that we spend a cou­ple tens of mil­lions of dol­lars a year, and some of that is be­cause Google is kind of stupid about this, and some of it is be­cause the NSF funds dumb stuff. I could just be com­pletely wrong.

- [Guest] Why is Google stupid about this?

- As in like, so­ciolog­i­cally, what’s wrong with them?

- [Guest] Yeah.

- I don’t know. What­ever, I think quan­tum com­put­ing is stupid. Like, con­tro­ver­sial opinion.

- [Guest] There was a bill to in­ject $1.2 billion into quan­tum.

- Into quan­tum.

- [Guest] I think I read it on Giz­modo. I re­mem­ber when this hap­pened. The U.S. gov­ern­ment or some­one — I don’t know, Europe? — some­one put a ton of money, like a billion dol­lars, into quan­tum re­search grants.

- Okay, but quan­tum… sorry, I’m not dis­agree­ing with you, I’m just dis­agree­ing with the world or some­thing. Chem­istry is just quan­tum me­chan­ics of elec­trons. Maybe they just like that. I’d be cu­ri­ous if you could tell us. My guess is that we don’t pour billions of dol­lars. The world econ­omy is like $80 trillion a year, right? The U.S. econ­omy’s like $20 trillion a year.

- [Guest] Trump did in fact sign a $1.2 billion quan­tum com­put­ing bill.

- Well, that’s stupid.

- [Guest] Ap­par­ently, this is be­cause we don’t want to fall be­hind in the race with China.

- Well, that’s also stupid.

- [Guest] But I can see some­thing similar hap­pen­ing with AGI.

- Yeah, so one thing is, it’s not that dan­ger­ous if it costs a squillion billion dol­lars to run, be­cause you just can’t run it for long enough for any­thing bad to hap­pen. So, I agree with your points. I think I’m go­ing to move for­ward slightly af­ter tak­ing one last com­ment.

- [Guest] Do you have any ex­am­ples of tech­nolo­gies that weren’t a big deal, purely be­cause of the cost?

- I mean, kind of ev­ery­thing is just a cost prob­lem, right?

[Linch notes: We figured out alchemy in the early 1900s.]

- [Guest] Com­put­ers, at the be­gin­ning. Com­put­ers were so ex­pen­sive that no one could af­ford them, ex­cept for like NASA.

- [Guest] Right, but over time, the cost de­creased, so are you say­ing that...? Yeah, I’m just won­der­ing, with AGI, it’s like, rea­son­able to think maybe the ini­tial ver­sion is very ex­pen­sive, but then work will be put into it and it’ll be less ex­pen­sive. Is there any rea­son to be­lieve that trend wouldn’t hap­pen for AGI?

- Not that I know of. My guess is that the world looks one of two ways. Either the cost of human intellectual labor falls by a factor of ten for a couple years, starting at way too expensive and ending at dirt cheap, or it happens even faster. I would be very surprised if it’s permanently too expensive to run AGI. Or, I’d be very, very, very surprised if we can train an AGI, but we never get the cost below $1 million.

And this isn’t even be­cause of the $6 an hour num­ber. Like, I don’t know man, brains are prob­a­bly not perfect. It would just be amaz­ing if evolu­tion figured out a way to do it that’s like a squillion times cheaper, but we still figure out a way to do it. Like, it just seems to me that the cost is prob­a­bly go­ing to mostly be in the train­ing. My guess is that it costs a lot more to train your AGI than to run it. And in the world where you have to spend $500,000 an hour to run your thing, you prob­a­bly had to spend fifty gazillion dol­lars to train it. And that would be the place where I ex­pect it to fail.

You can write down your other ob­jec­tions, and then we can talk about them later.

Crux 2: AGI is plau­si­bly soon­ish, and the next big deal

All right, here’s my next crux. AGI is plau­si­bly soon-ish, as in, less than 50 years, and the next big deal. Okay, so in this crux I want to ar­gue that AGI might hap­pen rel­a­tively soon, and also, it might hap­pen be­fore one of the other crazy things hap­pen that would mean we should only fo­cus on that thing in­stead.

So a cou­ple of things that peo­ple have already men­tioned, or that I men­tioned, as po­ten­tially crazy things that would change the world. There’s whole-brain em­u­la­tion. Can other peo­ple name some other things that would make the world rad­i­cally differ­ent if they hap­pened?

- [Guest] Very wide­spread ge­netic en­g­ineer­ing.

- Yeah, that seems right. By the way, the defi­ni­tion of “big deal” that I want you guys to use is “you ba­si­cally should not make spe­cific con­crete plans which have steps that hap­pen af­ter that thing hap­pens”. I in fact think that wide­spread and wildly pow­er­ful ge­netic en­g­ineer­ing of hu­mans is one, such that you should not have plans that go af­ter when the wide­spread ge­netic en­g­ineer­ing hap­pens, or you shouldn’t have spe­cific plans.

- [Guest] Nu­clear war.

- Yeah, nu­clear war. Maybe other global catas­trophic risks. So any­thing which looks like it might just re­ally screw up what the world looks like. Any­thing which might kill a billion peo­ple. If some­thing’s go­ing to kill a billion peo­ple, it seems plau­si­ble that that’s re­ally im­por­tant and you should work on that in­stead. It’s not like a to­tal slam dunk that you should work on that in­stead, but it seems plau­si­ble at least. Yeah, can I get some more?

- [Guest] What about nu­clear fu­sion? I read an ar­ti­cle say­ing that if any gov­ern­ment could get that kind of tech­nol­ogy, it could po­ten­tially trig­ger a war, just be­cause it breaks the bal­ance of power that is cur­rently in place in in­ter­na­tional poli­tics.

- Yeah, I can imagine something like that happening, maybe. I want to put that somewhat under other x-risks, or nuclear war. It also feels like an example of another kind of thing: destabilization of power. But destabilization of various types is mostly a thing because it leads to x-risk.

- [Guest] Do you con­sider P = NP to be such for that?

- Depends on how good the al­gorithm is. [Linch: The proof might also not be con­struc­tive.]

- [Guest] Yeah, it de­pends. In pub­lic key cryp­tog­ra­phy, there’s...

- I don’t really care about public key cryptography breaking… If P = NP, and there’s just like a linear time algorithm for like… If you can solve SAT problems of linear size in linear time, apologies for the jargon, I think that’s just like pretty close to AGI. Or that’s just like, if you have that technology, you can just solve any machine learning problem you want, by saying, “Hey, can you tell me the program which does the best on this particular score?” And that’s just a SAT problem. I think that it is very unlikely that there’s just like a really fast, linear time, SAT solving algorithm. Yeah, that’s an interesting one. Any others?

- [Guest] Like a plague, or a famine. Or like, ter­rible effects of cli­mate change, or like a su­per vol­cano.

- Okay.

Nat­u­ral x-risks, things that would kill ev­ery­one, em­piri­cally don’t hap­pen that of­ten. You can look at the earth, and you can be like, “How of­ten have things hap­pened that would have kil­led ev­ery­one if they hap­pened now?” And the an­swer’s like, a cou­ple times. Nat­u­ral dis­asters which would qual­ify as GCRs but not x-risks are prob­a­bly also rare enough that I am not that wor­ried about them. So I think it’s most likely that catas­trophic dis­asters that hap­pen soon will be a re­sult of tech­nolo­gies which were ei­ther in­vented rel­a­tively re­cently (eg nukes) or haven’t been de­vel­oped yet.

In the case of cli­mate change, we can’t use that ar­gu­ment, be­cause cli­mate change is an­thro­pogenic; how­ever, my sense is that ex­perts think that cli­mate change is quite un­likely to cause enough dam­age to be con­sid­ered a GCR.

Another one I want to in­clude is sketchy dystopias. We have never had an evil em­pire which has im­mor­tal god em­per­ors, and perfect surveillance, and mind read­ing and lie de­tec­tion. There’s no par­tic­u­lar tech­ni­cal rea­son why you can’t have all these things. They might all be a lot eas­ier than AGI. I don’t know, this seems like an­other one.

If I had to rank these in how likely they seem to break this claim, I’d rank them from most to least likely as:

  • Var­i­ous biose­cu­rity risks

  • Stable dystopias, nu­clear war or ma­jor power war, whole brain emulation

  • Cli­mate change

  • Su­per vol­canos, asteroids

I want to say why I think AI risk is more likely than these things. Or rather, why getting AGI is likely to come earlier.

But before I say that, you see how I wrote less than 50 years here? Even if I thought the world in 100 years was going to just be like the world is now, except with mildly better iPhones (maybe mildly worse iPhones, I don’t know, it’s not clear what the direction of the trend is)… I don’t know. Affecting the world in 100 years seems really hard.

And it seems to me that the stories that I have for how my work ends up making a difference to the world, most of those just look really unlikely to work if AGI is more than 50 years off. It’s really hard to do research that impacts the world positively more than 50 years down the road. It’s particularly hard to do research that impacts a single event that happens 50 years in the future, positively. I just don’t think I can very likely do that. And if I learned that there was just no way we were going to have AGI in the next 50 years, I would then think, “Well, I should probably really rethink my life plans.”

AI timelines

Okay, so here’s a fun ques­tion. When are we go­ing to get AGI? Here’s some ways of think­ing about it.

One of them is Laplace’s Law of Suc­ces­sion. This one is: there is some ran­dom vari­able. It turns out that ev­ery year that peo­ple try to build an AGI, God draws a ball from an urn. And we see if it’s white or black. And if it’s white, he gives us an AGI. And if it’s black, he doesn’t give us an AGI. And we don’t know what pro­por­tion of balls in the urn are black. So we’re go­ing to treat that as a ran­dom pa­ram­e­ter be­tween zero and one.

Now, the first year, your prior on this parameter theta, which is the proportion of years that God gives you an AGI, is uniform. The second year, you’re like, “Well, it sure seems like God doesn’t give us an AGI every year, because he didn’t give us one last year.” And you end up with a posterior where you’ve updated totally against the “AGI every year” hypothesis, and not at all against the “AGI never” hypothesis. And the next year, when you don’t get an AGI, you update against, and against, and against.

So this is one way to de­rive Laplace’s Law of Suc­ces­sion. And if you use Laplace’s Law of Suc­ces­sion, then it means that af­ter 60 years of try­ing to build an AGI, there is now a 1 in 62 chance that you get an AGI next year. So you can say, “Okay. Let’s just use Laplace’s Law of Suc­ces­sion to es­ti­mate time un­til AGI.” And this sug­gests that the prob­a­bil­ity of AGI in the next 50 years is around 40%. This is not the best ar­gu­ment in the world, but if you’re just try­ing to make ar­gu­ments that are at least kind of vaguely con­nected to things, then Laplace’s Law of Suc­ces­sion says 40%.
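Here is a small sketch of that calculation, assuming exactly the urn model described above: a uniform prior on theta, one draw per year, and 60 failed years so far. The function names are hypothetical. The only formula used is that, after n failures and no successes, the chance of m further failures in a row is (n + 1) / (n + m + 1):

```python
from fractions import Fraction


def p_success_next_trial(failures: int) -> Fraction:
    """Rule of succession with a uniform prior and zero successes so far."""
    return Fraction(1, failures + 2)


def p_success_within(failures: int, horizon: int) -> Fraction:
    """Probability of at least one success in the next `horizon` trials."""
    # P(no success in the next m trials | n failures) = (n + 1) / (n + m + 1)
    return 1 - Fraction(failures + 1, failures + horizon + 1)


def median_wait(failures: int) -> int:
    """Smallest horizon at which success becomes at least 50% likely."""
    horizon = 1
    while p_success_within(failures, horizon) < Fraction(1, 2):
        horizon += 1
    return horizon


print(p_success_next_trial(60))         # 1/62
print(float(p_success_within(60, 50)))  # about 0.45
print(median_wait(60))                  # 61
```

Under these assumptions the 50-year figure comes out at 50/111, or about 45%, which is in the same ballpark as the rough number quoted in the talk, and the median wait is about as long as the time already elapsed.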

- [Guest] What’s your thresh­old for even in­clud­ing such an ar­gu­ment in your over­all thought pro­cess? I’m guess­ing there are a lot of ar­gu­ments at that level of… I don’t know.

- I think there are fewer than 10 ar­gu­ments that are that sim­ple and that good.

- [Guest] This re­ally de­pends on the size of the step you chose. You chose “one year” ar­bi­trar­ily. It could have been one sec­ond ⁠— God draws a ball a sec­ond.

- No, that’s not it. There’s a limit, be­cause in that case, if I choose my shorter time steps, then it’s less likely that God draws me a ball in the next time step. But I also get to check more time steps over the next year.

- [Guest] I see.

- [Guest 2] “Pois­son pro­cess” is the word you’re look­ing for, I think.

- Yes, this is a Pois­son pro­cess.
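The step-size point can be checked numerically. A rough sketch, assuming the same urn model (uniform prior, no successes so far) with 60 years elapsed and a 50-year horizon; `steps_per_year` is a hypothetical knob for how often God draws a ball:

```python
def p_success_within(years_elapsed: int, years_ahead: int,
                     steps_per_year: int) -> float:
    """Chance of at least one success in the horizon, for a given step size."""
    n = years_elapsed * steps_per_year  # failed draws so far
    m = years_ahead * steps_per_year    # draws in the horizon
    # With a uniform prior, P(at least one success) = m / (n + m + 1).
    return m / (n + m + 1)


# Yearly, monthly, daily, and per-second draws.
for steps in (1, 12, 365, 31_536_000):
    p = p_success_within(60, 50, steps)
    print(f"{steps:>10,} draws per year: {p:.4f}")
```

As the steps shrink, the answer settles toward 50 / (60 + 50), roughly 0.4545, so under this model the choice of one year versus one second barely moves the estimate.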

- [Guest] How is this ar­gu­ment differ­ent for any­thing else, re­ally? Is the in­put pa­ram­e­ter..

So you might say, what does this say about the risk of us sum­mon­ing a de­mon next year? I’m go­ing to say, “Well, we’ve been try­ing to sum­mon demons for a long, long while. — Like 5,000 years.” I don’t know… I agree.

Here’s another way you can do the Laplace’s Law of Succession argument. I gave the previous argument based on years of research since 1960, because that’s when the first conference on AI was. You could also do it on researcher years. As in: God draws from the urn every time a researcher finishes their year of thinking about AI. And in this model, I think that you get a 50% chance in 10 years or something insane like that, maybe less. Because there are so many more researchers now than there used to be. So, giving the medians: the calendar-year version gives you around 60 years, because Laplace’s Law of Succession always says you should expect to wait about as long as it’s been so far. On researcher years, you get like 10 years or less.
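To see how a researcher-years version can produce a much shorter median, here is a hedged sketch. The talk gives no headcount data, so the exponential growth and the doubling times below are purely hypothetical assumptions, and `median_wait_years` is an illustrative name. It uses the continuous approximation that the median arrives once future researcher-years equal past researcher-years.

```python
import math


def median_wait_years(doubling_time: float, history_years: float = 60.0) -> float:
    """Calendar years until future researcher-years equal past researcher-years.

    ASSUMPTION (not from the talk): researcher headcount has grown
    exponentially with a fixed doubling time and keeps doing so.
    """
    # Past researcher-years, in units of (current headcount * d / ln 2).
    past = 1 - 2 ** (-history_years / doubling_time)
    # Future researcher-years after T years, same units: 2 ** (T / d) - 1.
    # Setting future == past and solving for T:
    return doubling_time * math.log2(1 + past)


for d in (5, 10, 20):  # hypothetical doubling times, in years
    print(d, round(median_wait_years(d), 1))
```

Under these assumptions the median wait is roughly one doubling time, and with an almost flat headcount (a very long doubling time) it falls back to about 60 years, matching the calendar-year version.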

All right, here are some other mod­els you can use. I’m just go­ing to name some quickly. One thing you can do is, you can ask, “Look, how big is a hu­man brain? Now, let’s pre­tend AGI will be a neu­ral net. How much com­pute is re­quired to train a policy that is that big? When will we have that amount of com­pute?” And you can do these kind of things. Another ap­proach is, “How big is the hu­man genome? How long does it take to train a policy that big?” What­ever, you do a lot of shit like this.

Honestly, the argument that’s compelling to me right now is the following. Maybe to build an AGI, you need to have pretty good machine learning, in the kind of way that you have today. Like, you have to have machine learning that’s good enough to learn pretty complex patterns, and then you have to have a bunch of smart people who, from when they were 18, decided they were going to try and do really cool machine learning research in college. And then the smart people decide they’re going to try and build AGIs. And if this is the thing that you think is the important input to the AGI creation process, then I think that you notice the number of smart 18-year-olds who decided they wanted to go into AGI is way higher than it used to be. It’s probably 10 times higher than it was 10 years ago.

And if you have Laplace’s Law of Succession over how many smart 18-year-olds who turn into researchers are required before you get the AGI, then that also gives you pretty reasonable probabilities of AGI pretty soon. It ends up with me having… today, I’m feeling ~70% confident of AGI in the next 50 years.

Why do I think it’s more likely than one of these other things? Ba­si­cally, be­cause it seems like it’s pretty soon.

It seems like whole-brain em­u­la­tion isn’t go­ing to hap­pen that soon. Ge­netic en­g­ineer­ing, I don’t know, and I don’t want to talk about it right now. Bio risk ⁠— there are a lot of peo­ple whose job is mak­ing re­ally pow­er­ful smart ML sys­tems. There are not very many peo­ple whose job is try­ing to figure out how to kill ev­ery­one us­ing bioweapons. This just feels like the main ar­gu­ment for why AI is more ur­gent; it’s just re­ally hard for me to imag­ine a world where peo­ple don’t try to build re­ally smart ML sys­tems. It’s not that hard for me to imag­ine a world where no very smart per­son ever ded­i­cates their life to try­ing re­ally hard to figure out how to kill ev­ery­one us­ing syn­thetic biol­ogy. Like, there aren’t that many re­ally smart peo­ple who want to kill ev­ery­one.

- [Guest] Why aren’t you wor­ried about nu­clear war? Like, peo­ple kil­ling the U.S. and hav­ing nu­clear war and a bunch of places where there are AI re­searchers, and then it just slows it down for awhile. Why think this is not that con­cern­ing?

- Ah, seems rea­son­ably un­likely to hap­pen. Laplace’s Law of Suc­ces­sion. We’ve had nu­clear weapons for 80 years. (laughs)

Okay, you were like, “Why are you us­ing this Laplace’s Law of Suc­ces­sion ar­gu­ment?” And I’m like, look. When you’re an idiot, if you have Laplace’s Law of Suc­ces­sion ar­gu­ments, you’re at least limit­ing how much of an idiot you can be. I think there are just re­ally bad pre­dic­tors out there. There are peo­ple who are just like, “I think we’ll get into a nu­clear war with China in the next three years, with a 50% prob­a­bil­ity.” And the thing is, I think that it ac­tu­ally is pretty healthy to be like, “Laplace’s Law of Suc­ces­sion. Is your cur­rent situ­a­tion re­ally all that differ­ent from all the other three-year pe­ri­ods since we’ve had nu­clear weapons?”

[Linch notes: An­throp­ics seems like a non­triv­ial con­cern, es­pe­cially if we’re con­di­tion­ing on ob­server mo­ments (or “smart ob­server mo­ments”) rather than liter­ally “years at least one hu­man is al­ive”.]

- [Guest] Strictly, it places no limit on how much of an idiot you can be. Because you can modify your prior to get any posterior, using Laplace’s Law of Succession, if you’re careful. Basically. So, if you can justify using a uniform prior, then maybe it limits how much of an idiot you can be, but if a uniform prior yields idiocy, then I’m not sure it does place a limit.

- For some reason, I feel like people who do this end up being less of an idiot, empirically.

- [Guest] Okay, that’s fine.

- All right, we’re go­ing to stand up, and jump up and down five times. And then we’re go­ing to sit down again and we’re go­ing to hear some more of this.

Crux 3: You can do good by think­ing ahead on AGI

Okay, num­ber three. You can do good by think­ing ahead on AGI. Can one do good by think­ing ahead on par­tic­u­lar tech­ni­cal prob­lems? The spe­cific ver­sion of this is that the kind of AI safety re­search that I do is pred­i­cated on the as­sump­tion that there are tech­ni­cal ques­tions which we can ask now such that if we an­swer them now, AI will then go bet­ter.

I think this is actually kind of sketchy as a claim, and I don’t see people push back on it quite enough. So I was very happy about the people I talked to today who pushed back on it; bonus points to them.

So here’s two ar­gu­ments that we can’t make progress now.

Prob­lems solve themselves

One is that, in general, problems solve themselves.

Imag­ine if I said to you: “One day hu­mans are go­ing to try and take hu­mans to Mars. And it turns out that most de­signs of a space­ship to Mars don’t have enough food on them for hu­mans to not starve over the course of their three-month-long trip to Mars. We need to work on this prob­lem. We need to work on the prob­lem of mak­ing sure that when peo­ple build space­ships to Mars they have enough food in them for the peo­ple who are in the space­ships.”

I think this is a stupid argument. Because people are just not going to fuck this one up. I would just be very surprised if all these people got on their spaceship and then they realized after a week, “Oh geez, we forgot to pack enough food.” Because people don’t want to die of starvation on a spaceship and people would prefer to buy things that aren’t going to kill them. And I think this is actually a really good default argument.

Another one is: “Most peo­ple have cars. It would be a tremen­dous dis­aster if ev­ery­one bought cars which had guns in the steer­ing wheels such that if you turn on the ac­cel­er­a­tor, they shoot you in the face. That could kill billions of peo­ple.” And I’m like, yep. But peo­ple are not go­ing to buy those cars be­cause they don’t want to get shot in the face. So I think that if you want to ar­gue for AI safety be­ing im­por­tant you have to ar­gue for a dis­anal­ogy be­tween those two ex­am­ples and the AI safety case.

Think­ing ahead is real hard

The other one is: think­ing ahead is real hard. I don’t ac­tu­ally know of any ex­am­ples ever where some­one said, “It will be good if we solve this tech­ni­cal prob­lem, be­cause of this prob­lem which is go­ing to come up in 20 years.” I guess the only one I know of is those god­damn quan­tum com­put­ers again, where peo­ple de­cided to start com­ing up with quan­tum-re­sis­tant se­cu­rity ages ago, such that as soon as we get pow­er­ful quan­tum com­put­ers, even though they can break your RSA, you just use one of these other things. But I don’t think they did this be­cause they thought it was helpful. I think they did it be­cause they’re crypto nerds who like solv­ing ran­dom the­o­ret­i­cal prob­lems. So I can’t name an ex­am­ple of any­one think­ing ahead about a tech­ni­cal prob­lem in a use­ful way.

- [Stu­dent] But even there, there’s a some­what more pre­cise defi­ni­tion of what a quan­tum com­puter even is. It’s not clear to me that there’s any­thing close for what AGI is go­ing to look like. So even that ex­am­ple strikes me as weird.

- You’re say­ing it’s eas­ier for them to solve their prob­lem than it would be for us to do use­ful work on AI?

- At least there’s some defi­ni­tion. I ac­tu­ally don’t know what’s go­ing on in their field at all. But I don’t know that there’s any defi­ni­tion of what AGI will look like.

- Yeah. I’m tak­ing that as an ar­gu­ment for why even that situ­a­tion is an eas­ier case for think­ing ahead than the AI safety case.

- Yeah, yeah, like here, what kind of as­sump­tion are we very sure about? And I think in our pre­vi­ous con­ver­sa­tion you were say­ing the fact that some ob­jec­tive is go­ing to be op­ti­mized or some­thing.

Ar­gu­ments for think­ing ahead

Okay, so I want to ar­gue for the claim that it’s not to­tally crazy to think about the AI al­ign­ment prob­lem right now.

So here are some ar­gu­ments I want to make, about why I think we can maybe do good stuff now.

By the way, an­other phras­ing of this is, if you could trade one year of safety re­search now for x years of safety re­search the year that AGI is de­vel­oped or five years be­fore AGI is de­vel­oped, what is the value of x at which you’re in­differ­ent? And I think that this is just a ques­tion that you can ask peo­ple. And I think a lot of AI safety re­searchers think that the re­search that is done the year of build­ing the AGI is just five times or 10 times more im­por­tant. And I’m go­ing to provide some ar­gu­ments for why think­ing ahead ac­tu­ally might be helpful.


One is re­lax­ations of the prob­lem. By “re­lax­ation”, I mean you take some prob­lem and in­stead of try­ing to solve it, you try to solve a differ­ent, eas­ier prob­lem.

Here’s what I mean by this: There are a va­ri­ety of ques­tions whose an­swer I don’t know, which seem like eas­ier ver­sions of the AI safety prob­lem.

Here’s an example. Suppose someone gave me an infinitely fast computer on a USB drive and I want to do good in the world using my infinitely fast computer on a USB drive. How would I do this? I think this has many features in common with the AI safety problem, but it’s just strictly easier because all I’m trying to do is to figure out how to use this incredibly smart, powerful thing that can do lots of stuff, and anything which you can do with machine learning you can also do with this thing. You can either just run your normal machine learning algorithms or you can do this crazy optimizing over parameter space for whatever architecture you like, or optimizing over all programs for something.

This is just eas­ier than ma­chine learn­ing, but I still don’t know how to use this to make a drug that helps with a par­tic­u­lar dis­ease. I’m not even quite sure how to use this safely to make a mil­lion dol­lars on the stock mar­ket, though I am rel­a­tively op­ti­mistic I’d be able to figure that one out. There’s a bunch of con­sid­er­a­tions.

If I had one of these in­finitely fast com­put­ers, I don’t think I know how to do safe, use­ful things with it. If we don’t know how to an­swer this ques­tion now, then no mat­ter how easy it is to al­ign ML sys­tems, it’s never go­ing to get eas­ier than this ques­tion. And there­fore, maybe I should con­sider try­ing to solve this now.

Be­cause if I can solve this now, maybe I can ap­ply that solu­tion par­tially to the ML thing. And if I can’t solve this now, then that’s re­ally good to know, be­cause it means that I’m go­ing to be pretty screwed when the ML thing comes along.

Another re­lax­ation you can do is you can pre­tend you have an amaz­ing func­tion ap­prox­i­ma­tor, where by “func­tion ap­prox­i­ma­tor” I just mean an ideal­ized neu­ral net. If you have a bunch of la­beled train­ing data, you can put it in your mag­i­cal func­tion ap­prox­i­ma­tor and it’ll be a re­ally good func­tion ap­prox­i­ma­tor on this. Or if you want to do re­in­force­ment learn­ing, you can do this and it’ll be great. I think that we don’t know how to do safe, al­igned things us­ing an amaz­ing func­tion ap­prox­i­ma­tor, and I think that ma­chine learn­ing is just strictly more an­noy­ing to al­ign than this. So that’s the kind of work that I think we can do now, and I think that the work that we do on that might ei­ther just be ap­pli­ca­ble or it might share some prob­lems in com­mon with the ac­tual AI al­ign­ment prob­lem. Thoughts, ques­tions, ob­jec­tions?

- [Stu­dent] For the halt­ing Or­a­cle thing, are we as­sum­ing away the “what if us­ing it for any­thing is in­her­ently un­safe for spooky uni­ver­sal prior rea­sons” thing?

- That’s a re­ally great ques­tion. I think that you are not al­lowed to as­sume away the spooky uni­ver­sal prior prob­lems.

- [Stu­dent 2] So what was the ques­tion? I didn’t un­der­stand the mean­ing of the ques­tion.

- The question is… all right, there’s some crazy shit about the universal prior. It’s a really long story. But basically if you try to use the Solomonoff prior, it’s… sorry, never mind. Ask me later. It was a technicality. Other questions or objections?

So all right, I think this claim is pretty strong and I think a lot of you prob­a­bly dis­agree with it. The claim is, you can do re­search on AI safety now, even though we don’t know what the AGI looks like, be­cause there are eas­ier ver­sions of the prob­lem that we don’t know how to solve now, so we can just try and solve them. Fight me.

- [Student] You could technically make the problem worse by accidentally arriving at some conclusions that will help actual AI research, like not safety but capabilities research.

- Seems right. Yeah, maybe you should not pub­lish all the stuff that you come up with.

When you’re do­ing safety re­search, a lot of the time you’re im­plic­itly try­ing to an­swer the ques­tion of what early AGI sys­tems will look like. I think there’s a way in which safety re­search is par­tic­u­larly likely to run into dan­ger­ous ques­tions for this rea­son.

- [Stu­dent] So if we say that AGI is at least as good as a hu­man, couldn’t you just re­lax it to a hu­man? But if you do re­lax it to just, say, “I’m go­ing to try to make this hu­man or this brain as safe as pos­si­ble,” wouldn’t that be similar to op­er­a­tions re­search? In busi­ness school, where they de­sign sys­tems of re­dun­dan­cies in nu­clear plants and stuff like that?

- So, a re­lax­ation where you just pre­tend that this thing is liter­ally a hu­man — I think that this makes it too easy. Be­cause I think hu­mans are not go­ing to try and kill you, most of the time. You can imag­ine hav­ing a box which can just do all the things that a hu­man does at 10 cents an hour. I think that it’d be less pow­er­ful than an AGI in some ways, but I think it’s pretty use­ful. Like, if I could buy ar­bi­trary IQ-100 hu­man la­bor for 10 cents an hour, I would prob­a­bly be­come a re­sel­ler of cheap hu­man la­bor.

- [Stu­dent] I got a ques­tion from Dis­cord. How in­ter­pretable is the func­tion ap­prox­i­ma­tor? Do you think that we couldn’t al­ign a func­tion ap­prox­i­ma­tor with, say, the opac­ity of a lin­ear model?

- Yes. I mean, in this case, if you have an opaque func­tion ap­prox­i­ma­tor, then other prob­lems are harder. I’m as­sum­ing away in­ner al­ign­ment prob­lems (apolo­gies for the jar­gon). Even lin­ear mod­els still have the outer al­ign­ment prob­lem.

Anal­ogy to security

Here’s an­other ar­gu­ment I want to make. I’m go­ing to use se­cu­rity as an anal­ogy. Imag­ine you want to make a se­cure op­er­at­ing sys­tem, which has liter­ally zero se­cu­rity bugs, be­cause you’re about to use it as the con­trol sys­tem for your au­tonomous nu­clear weapons satel­lite that’s go­ing to be in space and then it’s go­ing to have all these nu­clear weapons in it.

So you re­ally need to make sure that no one’s go­ing to be able to hack it and you’re not able to change it and you ex­pect it to be in the sky for 40 years. It turns out that in this sce­nario you’re a lot bet­ter off if you’ve thought about se­cu­rity at the start of the pro­ject than if you only try to think about se­cu­rity at the end of the pro­ject. Speci­fi­cally it turns out that there are de­ci­sions about how to write soft­ware which make it dras­ti­cally eas­ier or harder to prove se­cu­rity. And you re­ally want to make these de­ci­sions right.

And in this kind of a world, it’s re­ally im­por­tant that you know how one goes about build­ing a se­cure sys­tem be­fore you get down to the tricky en­g­ineer­ing re­search of how to ac­tu­ally build the sys­tem. I think this is an­other situ­a­tion which sug­gests that work done early might be use­ful.

Another way of say­ing this is to think of op­er­at­ing sys­tems. I want to make an op­er­at­ing sys­tem which has cer­tain prop­er­ties, and cur­rently no one knows how to make an op­er­at­ing sys­tem with these prop­er­ties, but it’s go­ing to need to be built on top of some other prop­er­ties that we already un­der­stand about op­er­at­ing sys­tems and we should figure out how to do those se­curely first.

This is an ar­gu­ment that peo­ple at MIRI feel good about and of­ten em­pha­size. It’s eas­ier to put se­cu­rity in from the start. Over­all I think this is the ma­jor­ity of my rea­son for why I think that you can do use­ful safety work start­ing right now.

I want to give some lame rea­sons too, like lame meta rea­sons. Maybe it’s use­ful for field build­ing. Maybe you think that AI safety re­search that hap­pens to­day is just 10 times less use­ful than AI safety re­search that hap­pens in the five years be­fore the AGI is built. But if you want to have as much of that as pos­si­ble it’s re­ally helpful if you get the field built up now. And you have to do some­thing with your re­searchers and if you have them do the best AI safety re­search they can, maybe that’s not crazy.

- [Stu­dent] Maybe if you jump the gun and you try to start a move­ment be­fore it’s ac­tu­ally there and then it fiz­zles out, then it’s go­ing to be harder to start it when it’s re­ally im­por­tant.

- Yep. So here’s an ex­am­ple of some­thing kinda like that. There are peo­ple who think that MIRI, where I work, com­pletely screwed up AI safety for ev­ery­one by be­ing crazy on the in­ter­net for a long time. And they’re like, “Look, you did no good. You got a bunch of weird nerds on the in­ter­net to think AI safety is im­por­tant, but those peo­ple aren’t very com­pe­tent or ca­pa­ble, and now you’ve just poi­soned the field, and now when I try to talk to my pres­ti­gious, le­git ma­chine learn­ing friends they think that this is stupid be­cause of the one time they met some an­noy­ing ra­tio­nal­ist.” I think that’s kind of a re­lated con­cern that is real. Yeah, I think it’s a strong con­sid­er­a­tion against do­ing this.

- [Stu­dent] I agree with the se­cu­rity ar­gu­ment, but it brings up an­other ob­jec­tion, which is: even if you “make progress”, peo­ple have to ac­tu­ally make use of the things you dis­cov­ered. That means they have to be aware of it, it has to be cost effec­tive. They have to de­cide if they want to do it.

- Yeah, all right, I’m happy to call this crux four.

Crux 4: good al­ign­ment solu­tions will be put to use

Good alignment solutions will be put to use, or might be put to use. So I in fact think that it’s pretty likely… So there are these terms like “competitiveness” and “safety tax” (or “alignment tax”) which are used to refer to the extent to which it’s easier to make an unaligned AI than an aligned AI. I think that if it costs you only 10% more to build an aligned AI, and if the explanation of why this AI is aligned is not that hard, as in you can understand it if you spend a day thinking about it, I would put more than 50% probability on the people who try to build this AGI using that solution.

The rea­son I be­lieve this is that when I talk to peo­ple who are try­ing to build AGIs, like peo­ple at Deep­Mind or OpenAI, I’m like, “Yep, they say the right things, like ‘I would like to build my AI to be al­igned, be­cause I don’t want to kill ev­ery­one’”. And I hon­estly be­lieve them. I think it’s just a re­ally com­mon de­sire to not be the one who kil­led all of hu­man­ity. That’s where I’m at.

- [Stu­dent] I mean, as a coun­ter­ar­gu­ment, you could walk into al­most any soft­ware com­pany and they’ll pay tons of lip ser­vice to good se­cu­rity and then not do it, right?

- Yep, that’s right. And that’s how we might all die. And what I said is in the case where it’s re­ally easy, in the case where it’s re­ally cheap, and it costs you only 10% more to build the AGI that’s al­igned, I think we’re fine. I am a lot more wor­ried about wor­lds where it would have cost you $10 billion to build the sub­tly un­al­igned AI but it costs you $100 billion to build the al­igned AI, and both of these prices fall by a fac­tor of two ev­ery year.

And then we just have to wonder whether someone spends the $100 billion for the aligned AI before someone spends the $10 billion for the unaligned AI; and actually all these figures are falling, maintaining a constant ratio. I think thinking about this is a good exercise.
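One way to work through that exercise is the sketch below. The two prices and the halving rate come from the talk; the budget and the helper name `years_until_affordable` are hypothetical. With a constant 10x price ratio and prices halving yearly, the aligned version becomes affordable about log2(10) years later than the unaligned one, whatever the budget is (as long as it is below both prices today).

```python
import math


def years_until_affordable(cost_now: float, budget: float,
                           halving_years: float = 1.0) -> float:
    """Years until a price that halves every `halving_years` falls to `budget`."""
    if cost_now <= budget:
        return 0.0
    return halving_years * math.log2(cost_now / budget)


unaligned_cost = 10e9  # $10 billion, from the talk
aligned_cost = 100e9   # $100 billion, from the talk
budget = 1e9           # hypothetical: what a single project can spend

lag = (years_until_affordable(aligned_cost, budget)
       - years_until_affordable(unaligned_cost, budget))
print(round(lag, 2))  # 3.32
```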

And even scarier is the thing that I think is actually likely, which is that building the aligned AI takes an extra three years or something. And the question will be, “How much of a lead time would the people who are trying to build the aligned one actually have? Is it actually three years? I don’t think it is...”

Wouldn’t some­one even­tu­ally kill ev­ery­one?

- [Stu­dent] Even if most peo­ple would not want to de­stroy the hu­man race, isn’t there still that risk there will just be one re­ally dan­ger­ous or crazy per­son who does de­liber­ately want to cause havoc? And how do we deal with that?

- Yeah. I think that long-term, it’s not ac­cept­able to have there be peo­ple who have the abil­ity to kill ev­ery­one. It so hap­pens that so far no one has been able to kill ev­ery­one. This seems good. I think long-term we’re ei­ther go­ing to have to fix the prob­lem where some por­tion of hu­mans want to kill ev­ery­one or fix the prob­lem where hu­mans are able to kill ev­ery­one.

And I think that you could prob­a­bly do this through reg­u­lat­ing re­ally dan­ger­ous tech­nol­ogy or mod­ify­ing how hu­mans work so they aren’t go­ing to kill ev­ery­one.

This isn’t a ridicu­lous change from the sta­tus quo. The U.S. gov­ern­ment em­ploys peo­ple who will come to your house and ar­rest you if you are try­ing to make smal­l­pox. And this seems good, be­cause I don’t think it would be good if any­one who wanted to could make smal­l­pox.

Long-term, humanity is not going to let people kill everyone. Maybe it turns out that if you want to build an AGI that can kill everyone, you’d have to have at least three million super GPUs, or maybe you need three TPU pods. Either way, people are going to be like, “Well, you’re not allowed to have three TPU pods unless you’ve got the official licence.” There’ll be regulation and surveillance. Maybe the government runs all the TPU pods, a bit like how governments run all the plutonium and hopefully all of the bioweapons.

So that’s the an­swer to the ques­tion, “Wouldn’t some­one always try to kill ev­ery­one?”. The an­swer is yes, un­less you make all the hu­mans so they aren’t go­ing to do that by mod­ify­ing them. But long-term we need to get the risk to zero by mak­ing it im­pos­si­ble, and it seems pos­si­ble to imag­ine us suc­ceed­ing at this.

- [Stu­dent] Do you think that the solu­tion is bet­ter achieved through some sort of pub­lic policy thing like that or by some­thing that’s a pri­vate tool that peo­ple can use? Like, should we go through gov­ern­ment or should it be some­thing crowd­sourced?

- I don’t like the term crowd­sourced very much.

- [Stu­dent] I don’t re­ally know why I used that, but some­thing that comes from some open source tool or some­thing like that, or some­thing pri­vate.

- I don’t have a strong opinion. It seems like it’s re­ally hard to get gov­ern­ments to do com­pli­cated things cor­rectly. Like their $1.2 billion quan­tum com­put­ing grant. (laughs) And so it seems like we’re a bit safer in wor­lds where we don’t need com­pli­cated gov­ern­ment ac­tion. Like, yeah, I just feel pretty screwed if I need the gov­ern­ment to un­der­stand why and how to reg­u­late TPU pods be­cause oth­er­wise peo­ple will make re­ally dan­ger­ous AI. This would be re­ally rough. Imag­ine try­ing to ex­plain this to var­i­ous poli­ti­ci­ans. Not go­ing to be a good time.

- [Stu­dent] (un­in­tel­ligible)

- Yeah. Most hu­mans aren’t su­per evil. When I oc­ca­sion­ally talk to se­nior peo­ple who work on gen­eral AI re­search, I’m like, “This per­son, they’re not a saint, but they’re a solid per­son”.

Here’s a re­lated ques­tion — what would hap­pen if you gave some billion­aire 10 squillion dol­lars? If you gave most billion­aires in Amer­ica 10 squillion dol­lars and they could just rule the world now, I think there’s like at least a 70% chance that this goes re­ally solidly well, es­pe­cially if they know that one of the things they can do with their AGI is ask it what they should do or what­ever. I think that pre­vents some of the moral risk. That’s where I’m at.

[Post talk note: Some other jus­tifi­ca­tions for this: I think that (like most peo­ple) billion­aires want, all else equal, to do good things rather than bad things, and I think that pow­er­ful tech­nolo­gies might ad­di­tion­ally be use­ful for helping peo­ple to do a bet­ter job of figur­ing out what ac­tions are ac­tu­ally good or bad ac­cord­ing to their val­ues. And to be clear, hope­fully some­thing bet­ter hap­pens than hand­ing con­trol of the fu­ture to a ran­domly se­lected billion­aire. But I think it’s worth be­ing re­al­is­tic about how bad this would be, com­pared to other things that might hap­pen.]

Sounds like there are some dis­agree­ments. Any­thing you want to say?

- [Stu­dent] Yeah. The world, and this coun­try es­pe­cially, is ruled by the 1%, and I don’t think they’re do­ing very good things. So I think when it comes to evil and al­ign­ment and how money is es­pe­cially dis­tributed in this coun­try — they don’t have ac­cess to AGI just yet, but it would scare me if it was put in their hands. Say, Elon Musk for in­stance. I mean, I don’t think he’s an evil per­son — he’s very ec­cen­tric, but I don’t think he’s evil — but he’s prob­a­bly one. Let’s say it was put in the hands of the Rock­efel­lers or some­body like that, I don’t think they would use it for good.

- Yeah, I think this is a place where peo­ple...

- [Stu­dent] It’s a poli­ti­cal ar­gu­ment, yeah.

- Yeah, I don’t know. My best guess is that the su­per rich peo­ple are rea­son­ably good, yeah.

So the place where I’m most scared about this is I care a lot about an­i­mal welfare and an in­ter­est­ing fact about the world is that things like tech­nol­ogy got a lot bet­ter and this meant that we suc­cess­fully harmed farm an­i­mals in much greater num­bers.

[Post talk note: If you in­clude wild an­i­mal suffer­ing, it’s not clear what the net effect of tech­nol­ogy on an­i­mal welfare has been. Either way, tech­nol­ogy has en­abled a tremen­dous amount of an­i­mal suffer­ing.]

And this is kind of a rea­son to worry about what hap­pens when you take peo­ple and you make them wealthier. On the other hand, I kind of be­lieve it’s a weird fluke about the world that an­i­mals have such a bad situ­a­tion. Like, I kind of think that most hu­mans ac­tu­ally do kind of have a prefer­ence against tor­tur­ing an­i­mals. And if you made ev­ery­one a squillion ba­billion­aire they would figure out the not-tor­tur­ing-an­i­mals thing. Th­ese are some things where my in­tu­ition comes from.

Crux 5: My re­search is the good kind

My re­search is the good kind. My work, or the things that I do, are re­lated to the ar­gu­ment that there are things that you have to figure out ahead of time if you want things to be good. I can’t talk about it in de­tail, be­cause MIRI doesn’t by de­fault dis­close all the re­search that it does. But that’s what I do.


I’m go­ing to give an es­ti­mate of how con­fi­dent I am in each of these. Every time I do this I get con­fused over whether I want to give ev­ery step con­di­tioned on the pre­vi­ous steps. We’re go­ing to do that.

  1. AI would be a big deal if it showed up here. I’m ~95% sure that if AGI was re­ally cheap and it showed up in a world like this, the world would sud­denly look re­ally differ­ent. I don’t think I’m al­lowed to use num­bers larger than 95%, be­cause of that one time I made that ter­rible er­ror. And it’s very hard to cal­ibra­tion train enough, that you’re al­lowed to say num­bers larger than 95%. But I feel re­ally damn sure that the world would look re­ally differ­ent if some­one built AGI.

  2. AI is plau­si­bly soon­ish and the next big deal. Given the pre­vi­ous one, not that the con­di­tional mat­ters that much for this one, I feel ~60% con­fi­dent.

  3. You can do good by think­ing ahead on AGI. It’s kind of rough, be­cause the nat­u­ral product of this isn’t like a prob­a­bil­ity, it’s like a weight­ing; it’s like how much worse is it than do­ing things. I’m go­ing to give this 70%.

  4. Align­ment solu­tions might be put to use by good­ish peo­ple if you have good enough ones. 70%.

  5. My re­search is the good kind. Maybe 50%?

Okay, cool, those are the num­bers. We can mul­ti­ply them all to­gether. 60% times 95% times 70% times 70% times 50%.

[Post-talk note: This turns out to be about 14%, which is some­what lower than my ac­tual in­tu­ition for how en­thu­si­as­tic I am about my work.]
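The multiplication can be reproduced with a few lines. This is just a sketch of the arithmetic using the five numbers given above; the dictionary labels are shortened paraphrases of the cruxes.

```python
import math

# Confidence in each crux, conditional on the previous ones, as stated in the talk.
cruxes = {
    "AI would be a big deal": 0.95,
    "AI is plausibly soonish": 0.60,
    "Thinking ahead helps": 0.70,
    "Solutions get used": 0.70,
    "My research is the good kind": 0.50,
}

product = math.prod(cruxes.values())
print(f"{product:.3f}")  # 0.140
```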


I’m go­ing to take some more ques­tions for a bit.

- [Stu­dent] So, is this how MIRI ac­tu­ally chooses what to work on?

- No.

- [Stu­dent] So, how does MIRI choose where to al­lo­cate re­sources and then do re­search?

- I think MIRI is much more into par­tic­u­lar men­tal mo­tions.

- [Stu­dent] Men­tal mo­tions?

- The think­ing I’ve been de­scribing is the kind of think­ing that I do when I’m say­ing, “Should I, in­stead of my job, do a differ­ent job?” For in­stance, I could do EA move­ment-build­ing work. (Like for ex­am­ple com­ing to Stan­ford EA and talk­ing to Stan­ford stu­dents and giv­ing talks.) And I think this is pretty good and I do it some­times.

When I’m try­ing to think of what I should do for AI safety in terms of tech­ni­cal re­search, I would say mostly I just don’t use my own judg­ment. Mostly I’m just like, “Nate Soares, who runs MIRI, thinks that it would be helpful for him if I did this. And on the cou­ple of do­mains where I feel like I can eval­u­ate Nate, I think he’s re­ally smart.”

- [Stu­dent] Smart in what way? Like, what’s your met­ric?

- I think that when I talk, Nate is just re­ally, re­ally good at tak­ing very gen­eral ques­tions about the world and figur­ing out how to think about them in ways that get new true an­swers.

E.g., I talk to him about physics ⁠— and I feel qual­ified to think about some ar­eas of physics ⁠— and then he just has re­ally smart thoughts and he thinks about them in a re­ally clever way. And I think that when­ever I ar­gue with him about AI safety he says pretty smart things.

And then he might tell me he thinks this particular research direction is great. And then I update more based on my respect for Nate, and on his arguments about what type of technical problems would be good to solve, than I update based on my own judgment about the technical problems. This is particularly because there are worldview questions about what type of AI alignment research is helpful that I don’t know what I think of.

- [Stu­dent] Do you ever con­sider what you just en­joy do­ing in a com­pletely out­come-in­de­pen­dent way?

- I do occasionally ask the question, what do I enjoy doing? And when I’m considering potential projects, I give a bonus of like 2x or 3x to activities that I really enjoy.

- [Stu­dent 2] Maybe this is too broad, but why did you choose, or was it a choice, to place your trust on re­search di­rec­tions in Nate Soares ver­sus like Paul Chris­ti­ano or some­body else?

- Well, once upon a time I was in a position where I could try to work for MIRI or I could try to work for Paul. I have a comparative advantage at working for MIRI. I have a comparative disadvantage at working for Paul, compared to the average software engineer. Because MIRI wanted some people who were good at screwing around with functional programming and type theory and stuff, and that’s me. And Paul wanted someone who was good at messing around with machine learning, and that’s not me. And I said, “Paul, how much worse do you think my work will be if I go to MIRI?” And he said, “Four times.” And then I crunched some numbers. And I was like, “Okay, how right are different people likely to be about what AI alignment work is important?” And I was like, “Well…”

I don’t ⁠— look, you asked. I’m go­ing to tell you what I ac­tu­ally thought. I don’t think it makes me sound very vir­tu­ous. I thought, “Eliezer Yud­kowsky from MIRI is way smarter than me. Nate Soares is way smarter than me. Paul Chris­ti­ano is way smarter than me. That’s two to one.” And that’s how I’m at MIRI.

I would say, time has gone on and now I’ve up­dated to­wards Paul’s view of the world in a lot of ways. But the com­par­a­tive ad­van­tage ar­gu­ment is keep­ing me at MIRI.

- [Stu­dent] So if you were to syn­the­size a hu­man be­ing through biotech­nol­ogy and cre­ate an ar­tifi­cial hu­man then does that count as AI or AGI?

- Eh, I mean I’m in­ter­ested in defin­ing words inas­much as they help me rea­son about the fu­ture. And I think an im­por­tant fact about mak­ing hu­mans is that it will change the world if and only if you know how to use that to make re­ally smart hu­mans. In that case I would call that in­tel­li­gence en­hance­ment, which we didn’t re­ally talk about but which does seem like it de­serves to be on the list of things that would to­tally change the world. But if you can just make ar­tifi­cial hu­mans — I don’t count IVF as AGI, even though there’s some re­ally stupid defi­ni­tion of AGI such that it’s AGI. And that’s just be­cause it’s more use­ful to have the word “AGI” re­fer to this com­put­ery thing where the prices might fall rapidly and the in­tel­li­gence might in­crease rapidly.

- [Stu­dent] And what if it’s like some cy­borg com­bi­na­tion of hu­man and com­puter and then those ar­gu­ments do ap­ply, with at least the com­puter part’s price fal­ling rapidly?

- Yep, that’s a good ques­tion. My guess is that the world is not rad­i­cally changed by hu­man-com­puter in­ter­faces, or brain in­ter­faces, be­fore it’s rad­i­cally changed by one of the other things, but I could be wrong. One of the ways in which that seems most likely to change the world is by en­abling re­ally crazy mind con­trol or lie de­tec­tion things.

- [Student] How likely do you think it is that being aware of current research is important for long-term AGI safety work? Because I think a lot of the people from MIRI I talked to were kind of dismissive about knowing about current research, because they think it’s so irrelevant that it won’t really yield much benefit in the future. What’s your personal opinion?

- It seems like one re­ally rele­vant thing that plays into this is whether the cur­rent ma­chine learn­ing stuff is similar in im­por­tant ways to the AI sys­tems that we’re go­ing to build in the fu­ture. To the ex­tent you be­lieve that it will be similar, I think the an­swer is yes, ob­vi­ously ma­chine learn­ing facts from now are more rele­vant.

Okay, the fol­low­ing is kind of sub­tle and I’m not quite sure I’m go­ing to be able to say it prop­erly. But re­mem­ber when I was say­ing re­lax­ations are one way you can think about AI safety? I think there’s a sense that if you don’t know how to solve a prob­lem in the re­laxed ver­sion — if I don’t even know how to do good things with my halt­ing or­a­cle on a USB drive — then I’m not go­ing to be able to al­ign ML sys­tems.

Part of this is that I think facts about ma­chine learn­ing should never make the prob­lem eas­ier. You should never rely on spe­cific facts about how ma­chine learn­ing works in your AI safety solu­tions, be­cause you can’t rely on those to hold as your sys­tems get smarter.

If em­piri­cal facts about ma­chine learn­ing sys­tems should never be re­lied on in your AI safety solu­tions, and there are just not that many non-em­piri­cal facts about ma­chine learn­ing, then if you just think of ma­chine learn­ing as mag­i­cal func­tion ap­prox­i­ma­tors, that’s just most of the struc­ture of ma­chine learn­ing that is safe to as­sume. So that’s an ar­gu­ment against car­ing about ma­chine learn­ing.

- [Stu­dent] Or any prior knowl­edge, I guess? The same ar­gu­ment could be made about any as­sump­tions about a sys­tem that might not hold in the fu­ture.

- That’s right. That’s right, it does in­deed hold there as well.

- [Stu­dent] Yeah.

- So the main rea­son to know about ma­chine learn­ing from this per­spec­tive, is it’s re­ally nice to have con­crete ex­am­ples. If you’re study­ing ab­stract alge­bra and you’ve never heard of any con­crete ex­am­ples of a group, you should to­tally just go out and learn 10 ex­am­ples of a group. And I think that if you have big the­o­ries about how in­tel­li­gence works or what­ever, or how func­tion ap­prox­i­ma­tors work, it’s ab­solutely worth it to know how ma­chine learn­ing works in prac­tice be­cause then you might re­al­ize that you’re ac­tu­ally an idiot and this is to­tally false. So I think that it’s very worth­while for AI safety re­searchers to know at least some stuff about ma­chine learn­ing. Feel free to quiz me and see whether you think I’m be­ing vir­tu­ous by my own stan­dards. I think it’s iffy. I think it’s like 50-50 on whether I should spend more or less time learn­ing ma­chine learn­ing, which is why I spend the amount of time I spend on it.

- [Stu­dent] From a the­o­ret­i­cal stand­point, like Mar­cus Hut­ter’s per­spec­tive, there’s a the­ory of gen­eral AI. So to make pow­er­ful AGI, it’s just a ques­tion of how to cre­ate a good ar­chi­tec­ture which can do Bayesian in­fer­ence, and it’s a ques­tion of how to run it well in hard­ware. It’s not like you need to have great in­sights which one guy could have, it’s more about en­g­ineer­ing. And then it’s not 10% which is added to cost to do safety; we need to have a whole team which would try to un­der­stand how to do safety. And it seems that peo­ple who don’t care about safety will build the AGI faster than that, sig­nifi­cantly faster than peo­ple who care about safety. And I mean how bad is it?

- I heard many sen­tences and then, “How bad is it?”. And the sen­tences made sense.

How bad is it? I don’t know. Pretty bad?

In terms of the stuff about AIXI, my answer is kind of long and I kind of don’t want to give it. But I think it’s a pretty bad summary to say “we already know what the theoretical framework is and we’re just doing engineering work now”. That’s also true of literally every other technical subject. You can say all of chemistry is like: I already know how to write down the Schrödinger equation, it’s “just engineering work” to answer what chemicals you get. Also, all of biology is just the engineering work of figuring out how the Schrödinger equation explains ants or something. So I think that the engineering work is finding good algorithms to do the thing. But this is also work which involves however much theoretical structure. Happy to talk about this more later.

- [Stu­dent] Do you dis­agree with Paul Chris­ti­ano on any­thing?

- Yes.

- [Stu­dent] Or with other smart peo­ple?

- So, Paul Christiano is really smart and it’s hard to disagree with him, because every time I try to disagree with him, I say something like, “But what about this?” And he’s like, “Oh, well I would respond to that with this rebuttal”. And then I’m like, “Oh geez, that was a good rebuttal”. And then he says something like, “But I think some similar arguments against my position which are stronger are the following” and then he rattles off four better arguments against his position and then he rebuts those and it’s really great. But the places where I most think Paul is wrong, I think Paul is maybe wrong about… I mean, obviously I’m betting on MIRI being better than he thinks. Paul would also think I should quit my job and work on meta stuff probably.

- [Stu­dent] Meta stuff?

- Like, work on AI safety move­ment build­ing.

The biggest thing where I sus­pect Paul Chris­ti­ano is wrong is, if I had to pick a thing which feels like the sim­plest short sweet story for a mis­take, it’s that he thinks the world is metaphor­i­cally more made of liquids than solids.

So he thinks that if you want to think about re­search you can add up all the con­tri­bu­tions to re­search done by all the in­di­vi­d­u­als and each of these is a num­ber and you add the num­bers to­gether. And he thinks things should be smooth. Be­fore the year in which AGI is worth a trillion dol­lars, it should have been worth half a trillion dol­lars and you can look at the his­tory of growth curves and you can look at differ­ent tech­nolog­i­cal de­vel­op­ments and see how fast they were and you can in­fer all these things from it. And I think that when I talk to him, I think he’s more smooth-curve-fit­ting ori­ented than I am.

- [Stu­dent] Sorry, I didn’t fol­low that last part.

- A thing that he thinks is re­ally com­pel­ling is that world GDP dou­bles ev­ery 20 years, and has dou­bled ev­ery 20 years or so for the last 100 years, maybe 200 years, and be­fore that dou­bled some­what more slowly. And then be­fore the In­dus­trial Revolu­tion it dou­bled ev­ery cou­ple hun­dred years. And he’s like, “It would be re­ally damn sur­pris­ing if the time be­tween dou­blings fell by a fac­tor of two.” And he ar­gues about AI by be­ing like, “Well, this the­ory about AI can’t be true, be­cause if that was true then the world would have dou­bling times that changed by more than this ra­tio.”

[Post-talk note: I be­lieve the In­dus­trial Revolu­tion ac­tu­ally in­volved a fall of dou­bling times from 600 to 200 years, which is a dou­bling time re­duc­tion of 3x. Thanks to Daniel Koko­ta­jlo for point­ing this out to me once.]

- [Stu­dent] But I guess the most im­por­tant things are things that are sur­pris­ing. So all of these kind of, it just strikes me as sort of a—

- I mean, I think he thinks your plans are good ac­cord­ing to the ex­pected use­ful­ness they have. And he’s like, “Look, the world is prob­a­bly go­ing to have a lot of smooth curves. There’s prob­a­bly go­ing to be a four-year pe­riod in which the econ­omy dou­bles be­fore there’s a one-year pe­riod in which the econ­omy dou­bles.” And I’m less in­clined to take that kind of ar­gu­ment as se­ri­ously.
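
To make the doubling-time numbers above concrete, here is a minimal Python sketch that converts a doubling time into the annual growth rate it implies, assuming constant exponential growth within each period. The function name `annual_growth_rate` and the labels are made up for illustration, and the doubling times are just the rough figures quoted in the talk and the post-talk note, not measured data.

```python
def annual_growth_rate(doubling_time_years: float) -> float:
    """Return the constant annual growth rate g such that (1 + g) ** T == 2."""
    return 2 ** (1 / doubling_time_years) - 1

# Rough doubling times (in years) quoted in the talk; illustrative only.
doubling_times = {
    "before the Industrial Revolution (post-talk note)": 600,
    "after the Industrial Revolution (post-talk note)": 200,
    "recent world GDP": 20,
    "hypothetical four-year doubling": 4,
    "hypothetical one-year doubling": 1,
}

for label, years in doubling_times.items():
    rate = annual_growth_rate(years)
    print(f"{label}: doubles every {years} years, about {rate:.2%} per year")

# Factor by which the doubling time fell, per the post-talk note.
reduction = 600 / 200
print(f"Doubling-time reduction across the Industrial Revolution: {reduction:.0f}x")
```

Under this assumption, a 20-year doubling time corresponds to growth of roughly 3.5% a year, and a one-year doubling time is 100% a year by definition.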

We are at time. So I want to get din­ner with peo­ple. So I’m go­ing to stand some­where and then if you stand close enough to me you might figure out where I’m get­ting din­ner, if you want to get din­ner with me af­ter­wards. Any­thing else sup­posed to hap­pen be­fore we leave here? Great, thanks so much.