My personal cruxes for working on AI safety

The following is a heavily edited transcript of a talk I gave for the Stanford Effective Altruism club on 19 Jan 2020. I had it transcribed, and then Linchuan Zhang, Rob Bensinger and I edited it for style and clarity, and also to occasionally have me say smarter things than I actually said. Linch and I both added a few notes throughout. Thanks also to Bill Zito, Ben Weinstein-Raun, and Howie Lempel for comments.

I feel slightly weird about posting something so long, but this is the natural place to put it.

Over the last year my beliefs about AI risk have shifted moderately; I expect that in a year I’ll think that many of the things I said here were dumb. Also, very few of the ideas here are original to me.


After all those caveats, here’s the talk:


It’s great to be here. I used to hang out at Stanford a lot, fun fact. I moved to America six years ago, and then in 2015, I came to Stanford EA every Sunday, and there was, obviously, a totally different crop of people there. It was really fun. I think we were a lot less successful than the current Stanford EA iteration at attracting new people. We just liked having weird conversations about weird stuff every week. It was really fun, but it’s really great to come back and see a Stanford EA which is shaped differently.

Today I’m going to be talking about the argument for working on AI safety that compels me to work on AI safety, rather than the argument that should compel you or anyone else. I’m going to try to spell out how the arguments are actually shaped in my head. Logistically, we’re going to try to talk for about an hour with a bunch of back and forth and you guys arguing with me as we go. And at the end, I’m going to do miscellaneous Q and A for questions you might have.

And I’ll probably make everyone stand up and sit down again because it’s unreasonable to sit in the same place for 90 minutes.

Meta level thoughts

I want to first very briefly talk about some concepts I have that are about how you want to think about questions like AI risk, before we actually talk about AI risk.

Heuristic arguments

When I was a confused 15 year old browsing the internet around 10 years ago, I ran across arguments about AI risk, and I thought they were pretty compelling. The arguments went something like, “Well, sure seems like if you had these powerful AI systems, that would make the world be really different. And we don’t know how to align them, and it sure seems like almost all goals they could have would lead them to kill everyone, so I guess some people should probably research how to align these things.” This argument was about as sophisticated as my understanding went until a few years ago, when I was pretty involved with the AI safety community.

I in fact think this kind of argument leaves a lot of questions unanswered. It’s not the kind of argument that is solid enough that you’d want to use it for mechanical engineering and then build a car. It’s suggestive and heuristic, but it’s not trying to cross all the T’s and dot all the I’s. And it’s not even telling you all the places where there’s a hole in that argument.

Ways heuristic arguments are insufficient

The thing which I think is good to do sometimes, is instead of just thinking really loosely and heuristically, you should try to have end-to-end stories of what you believe about a particular topic. And then if there are parts that you don’t have answers to, you should write them down explicitly with question marks. I guess I’m basically arguing to do that instead of just saying, “Oh, well, an AI would be dangerous here.” And if there’s all these other steps as well, then you should write them down, even if you’re just going to have your justification be question marks.

So here’s an objection I had to the argument I gave before. AI safety is just not important if AI is 500 years away and whole-brain emulation or nanotechnology is going to happen in 20 years. Obviously, in that world, we should not be working on AI safety. Similarly, if some other existential risk might happen in 20 years, and AI is just definitely not going to happen in the next 100 years, we should just obviously not work on AI safety. I think this is pretty clear once I point it out. But it wasn’t mentioned at all in my initial argument.

I think it’s good to sometimes try to write down all of the steps that you have to make for the thing to actually work. Even if you’re then going to say things like, “Well, I believe this because other EAs seem smart, and they seem to think this.” If you’re going to do that anyway, you might as well try to write down where you’re doing it. So in that spirit, I’m going to present some stuff.

- [Guest] There’s so many existential risks, like a nuclear war could show up at any minute.

- Yes.

- [Guest] So like, is there some threshold for the probability of an existential risk? What’s your criteria for, among all the existential risks that exist, which ones to focus on?

- That’s a great question, and I’m going to come back to it later.

- [Guest] Could you define a whole-brain emulation for the EA noobs?

- Whole-brain emulation is where you scan a human brain and run it on a computer. This is almost surely technically feasible; the hardest part is scanning human brains. There are a bunch of different ways you could try to do this. For example, you could imagine attaching a little radio transmitter to all the neurons in a human brain, and having them send out a little signal every time that neuron fires, but the problem with this is that if you do this, the human brain will just catch fire. Because if you just take the minimal possible energy in a radio transmitter, that would get the signal out, and then you multiply that by 100 billion neurons, you’re like, “Well, that sure is a brain that is on fire.” So you can’t currently scan human brains and run them. We’ll talk about this more later.

Thanks for the question. I guess I want to do a quick poll of how much background people are coming into this with. Can you raise your hand if you’ve spent more than an hour of thinking about AI risk before, or hearing talks about AI risk before?

Can you raise your hand if you know who Paul Christiano is, or if that name is familiar? Can you raise your hand if you knew what whole-brain emulation was before that question was asked?

Great. Can you raise your hand if you know what UDASSA is?

Great, wonderful.

I kind of wanted to ask a “seeing how many people are lying about things they know” question. I was considering saying a completely fake acronym, but I decided not to do that. I mean, it would have been an acronym for something, and they would have been like, “Why is Buck asking about that concept from theoretical biology?”

Ways of listening to a talk

All right, here’s another thing. Suppose you’re listening to a talk from someone whose job is thinking about AI risk. Here are two ways you could approach this. The first way is to learn to imitate my utterances. You could think, “Well, I want to know what Buck would say in response to different questions that people might ask him.”

And this is a very reasonable thing to do. I often talk to someone who’s smart. I often go talk to Paul Christiano, and I’m like, well, it’s just really decision-relevant to me to know what Paul thinks about all these topics. And even if I don’t know why he believes these things, I want to know what he believes.

Here’s the second way: You can take the things that I’m saying as scrap parts, and not try to understand what I overall believe about anything. You could just try to hear glimmers of arguments that I make, that feel individually compelling to you, such that if you had thought of that argument, you’d be like, “Yeah, this is a pretty solid argument.” And then you can try and take those parts and integrate them into your own beliefs.

I’m not saying you should always do this one, but I am saying that at least sometimes, your attitude when someone’s talking should be, “This guy’s saying some things. Probably he made up half of them to confuse me, and probably he’s an idiot, but I’m just going to listen to them, and if any of them are good, I’m going to try and incorporate them. But I’m going to assess them all individually.”

Okay, that’s the meta points. Ready for some cruxes on AI risk?

- [Guest] Just one clarification. So, does that mean then that the, in your belief, the whole-brain emulation is going to happen in 20 years?

- Sorry, what? I think whole-brain emulation is not going to happen in 20 years.

- [Guest] Okay, so the numbers you threw out were just purely hypothetical?

- Oh, yes, sorry, yes. I do in fact work on AI safety. But if I had these other beliefs, which I’m going to explain, then I would not work on AI safety. If I thought whole-brain emulation were coming sooner than AI, I would de-prioritize AI safety work.

- [Guest] Okay.


Something that would be great is that when I say things, you can write down things that feel uncompelling or confusing to you about the arguments. I think that’s very healthy to do. A lot of the time, the way I’m going to talk is that I’m going to say something, and then I’m going to say the parts of it that I think are uncompelling. Like, the parts of the argument that I present that I think are wrong. And I think it’s pretty healthy to listen out and try and see what parts you think are wrong. And then I’ll ask you for yours.

Crux 1: AGI would be a big deal if it showed up here

Okay, AGI would be a big deal if it showed up here. So I’m going to say what I mean by this, and then I’m going to give a few clarifications and a few objections to this that I have.

This part feels pretty clear. Intelligence seems really important. Imagine having a computer that was very intelligent; it seems like this would make the world look suddenly very different. In particular, one major way that the world might be very different is: the world is currently very optimized by humans for things that humans want, and if I made some system, maybe it would be trying to make the world be a different way. And then maybe the world would be that very different way instead.

So I guess under this point, I want to say, “Well, if I could just have a computer do smart stuff, that’s going to make a big difference to what the world is like, and that could be really good, or really bad.”

There’s at least one major caveat to this, which I think is required for this to be true. I’m curious to hear a couple of people’s confusion, or objections to this claim, and then I’ll say the one that I think is most important, if none of you say it quickly enough.

- [Guest] What do you mean by “showed up here”? Because, to my mind, “AGI” actually means general intelligence, meaning that it can accomplish any task that a human can, or it can even go beyond that. So what do you mean by “showed up here”?

- Yeah, so by “here”, I guess I’m trying to cut away worlds that are very different from this one. So for instance, I think that if I just said, “AGI would be a big deal if it showed up”, then I think this would be wrong. Because I think there are worlds where AGI would not be a big deal as much. For instance, what if we already have whole-brain emulation? I think in that world, AGI is a much smaller deal. So I’m trying to say that in worlds that don’t look radically different from this one, AGI is a big deal.

- [Guest] So you’re saying “if the world is identical, except for AGI”?

- That’s a good way of putting it. If the world looks like this, kind of. Or if the world looks like what I, Buck, expect it to look in 10 years. And then we get AGI ⁠— that would be a really different world.

Any other objections? I’ve got a big one.

- [Guest] I’m a bit confused about how agency, and intelligence and consciousness relate, and how an intelligence would have preferences or ways it would want the world to be. Or, like, how broad this intelligence should be.

- Yeah!

I’m going to write down people’s notes as I go, sometimes, irregularly, not correlated with whether I think they’re good points or not.

- [Guest] Do you have definitions of “AGI” and “big deal”?

- “AGI”: a thing that can do all the kind of smart stuff that humans do. By “big deal”, I mean it basically is dumb to try to make plans that have phases which are concrete, and happen after the AGI. So, by analogy, almost all of my plans would seem like stupid plans if I knew that there was going to be a major alien invasion in a year. All of my plans that are, like, 5-year-time-scale plans are bad plans in the alien invasion world. That’s what I mean by “big deal”.

[Post-talk note: Holden Karnofsky gives a related definition here: he defines “transformative AI” as “AI that precipitates a transition comparable to (or more significant than) the agricultural or industrial revolution”.]

- [Guest] I think one objection could be that if AGI were developed, we would be unable to get it to cooperate with us to do anything good, and it may have no interest in doing anything bad, in which case, it would not be a big deal.

- Yep, that makes sense. I personally don’t think that’s very likely, but that would be a way this could be wrong.

The main objection I have is that I didn’t mention what the price of the AGI is. For instance, I think a really important question is “How much does it cost you to run your AGI for long enough for it to do the same intellectual labor that a human could do in an hour?” For instance, if it costs $1 million an hour: almost no human gets paid $1 million an hour for their brains. In fact, I think basically no human gets paid that much. I think the most money that a human ever makes in a year is a couple billion dollars. And there’s approximately 2,000 working hours a year, which means that you’re making $500,000 an hour. So max human wage is maybe $500,000 per hour. I would love it if someone checks the math on this.

[Linch adds: 500K * 2000 = 1 billion. I assume “couple billion” is more than one. Sanity check: Bezos has ~100 billion accumulated in ~20 years, so 5B/​year; though unclear how much of Jeff Bezos’ money is paid for his brain vs. other things like having capital/​social capital. Also unclear how much Bezos should be valued at ex ante.]
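The arithmetic in the talk and in Linch's note can be checked with a few lines of Python. This is only a sketch: the 2,000 working hours per year and the annual income figures are the rough numbers quoted above, not measured data, and the function name `hourly_wage` is just an illustrative choice.

```python
# Rough figures quoted in the talk and in Linch's note; not measured data.
HOURS_PER_YEAR = 2_000


def hourly_wage(annual_income: float, hours: int = HOURS_PER_YEAR) -> float:
    """Convert an annual income into an implied hourly wage."""
    return annual_income / hours


# $1B/year is what $500K/hour implies; $2B is "a couple billion";
# $5B/year is the Bezos sanity check from the note.
for label, income in [("$1B/year", 1e9), ("$2B/year", 2e9), ("$5B/year", 5e9)]:
    print(f"{label}: ${hourly_wage(income):,.0f} per hour")
```

On these assumptions, $500,000 an hour corresponds to one billion dollars a year, so "a couple billion" would imply roughly $1 million an hour or more.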

So, a fun exercise that you can do is you can imagine that we have a machine that can do all the intellectual labor that a human can do, at some price, and then we just ask how the world looks different in that world. So for instance, in the world where that price is $500,000 an hour, that just does not change the world very much. Another one is: let’s assume that this is an AGI that’s as smart as the average human. I think basically no one wants to pay $500,000 an hour to an average human. I think that at $100 an hour, that’s the price of a reasonably well-trained knowledge worker in a first-world country, ish. And so I think at that price, $100 an hour, life gets pretty interesting. And at the price of $10 an hour, it’s really, really wild. I think at the price of $1 an hour, it’s just absurd.

Fun fact: if you look at the computation that a human brain does, and you say, “How much would it cost me to buy some servers on AWS that run this much?”, the price is something like $6 an hour, according to one estimate by people I trust. (I don’t think there’s a public citation available for this number, see here for a few other relevant estimates.) You estimate the amount of useful computational work done by the brain, using arguments about the amount of noise in various brain components to argue that the brain can’t possibly be relying on more than three decimal places of accuracy of how hard a synapse is firing, or something like that, and then you look at how expensive it is to buy that much computing power. This is very much an uncertain median guess rather than a bound, and I think it is also somewhat lower than the likely price of running a whole brain emulation (for that, see “Whole Brain Emulation: A Roadmap”).

But yeah, $6 an hour. So the reason that we don’t have AGI is not that we could make AGI as powerful as the brain, and we just don’t because it’s too expensive.

- [Guest] I’m just wondering, what’s some evidence that can make us expect that AGI will be super expensive?

- Well, I don’t know. I’m not particularly claiming that it will be particularly expensive to run. One thing that I am comfortable claiming is if something is extremely valuable, the first time that it happens, it’s usually about that expensive, meaning you don’t make much of a profit. There’s some kind of economic efficiency argument that if you can make $1 million from doing something, and the price is steadily falling, people will probably first do it at the time when the price is about $1 million. And so an interesting question is: if I imagine in every year, people are being reasonable, then how much is the world different in the year when AGI costs you $2,500 an hour to run versus, like, $10 an hour to run? Another fun exercise, which I think is pretty good, is you can look at Moore’s Law or something and say, “Well, let’s just assume the price of a transistor costs something like this. It falls by a factor of two every 18 months. Let’s suppose that one year it costs $10,000 an hour to run this thing, and then it halves every 18 months.” And you look at how the world changes over time, and it’s kind of an interesting exercise.
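The halving exercise can be sketched in Python. The starting price of $10,000 an hour and the 18-month halving time come from the hypothetical in the talk. The target prices of $100, $10 and $1 an hour are the thresholds discussed earlier. The function names are illustrative, and the smooth exponential decline is an assumption, not a forecast.

```python
import math


def hourly_cost(initial_cost: float, years: float, halving_months: float = 18) -> float:
    """Cost per hour after `years`, assuming the cost halves every `halving_months`."""
    return initial_cost * 0.5 ** (years * 12 / halving_months)


def years_until(initial_cost: float, target_cost: float, halving_months: float = 18) -> float:
    """Years until the cost falls from `initial_cost` to `target_cost`."""
    halvings = math.log2(initial_cost / target_cost)
    return halvings * halving_months / 12


# Hypothetical starting point from the talk: $10,000 per hour.
for target in (100, 10, 1):
    print(f"${target}/hour reached after about {years_until(10_000, target):.1f} years")
```

Under these assumptions, each factor-of-ten drop in price takes about five years, so the step from "interesting" ($100 an hour) to "absurd" ($1 an hour) would take roughly a decade.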

Other thoughts or objections?

- [Guest] Even if it’s more expensive, if it’s ridiculously faster than a human brain, it could still be valuable.

- Yeah. So for instance, I know people who make a lot of money being traders. These people are probably mostly three standard deviations above average for a human. Some of these humans get paid thousands of dollars an hour, and also if you can just scale how fast they run, linearly in price, it would be worth it to run them many times faster. This is per hour of human labor, but possibly, you can get it faster in serial time. Like, another thing you probably want to do with them is have a bunch of unmanned submarines, where it’s a lot less bad if your AI gets destroyed by a missile or something. Okay, any other thoughts?

- [Guest] So, yes, it wouldn’t necessarily be logical to run AGI if it was very expensive, but I still think people would do it, given that you have technology like quantum computers, which right now can’t do anything that a normal computer can’t do, and yet we pour millions and billions of dollars into building them and running them, and run all kinds of things on them.

- I mean, I think we don’t pour billions of dollars. Tell me if I’m wrong, please. But I would have thought that we spend a couple tens of millions of dollars a year, and some of that is because Google is kind of stupid about this, and some of it is because the NSF funds dumb stuff. I could just be completely wrong.

- [Guest] Why is Google stupid about this?

- As in like, sociologically, what’s wrong with them?

- [Guest] Yeah.

- I don’t know. Whatever, I think quantum computing is stupid. Like, controversial opinion.

- [Guest] There was a bill to inject $1.2 billion into quantum.

- Into quantum.

- [Guest] I think I read it on Gizmodo. I remember when this happened. The U.S. government or someone — I don’t know, Europe? — someone put a ton of money, like a billion dollars, into quantum research grants.

- Okay, but quantum… sorry, I’m not disagreeing with you, I’m just disagreeing with the world or something. Chemistry is just quantum mechanics of electrons. Maybe they just like that. I’d be curious if you could tell us. My guess is that we don’t pour billions of dollars. The world economy is like $80 trillion a year, right? The U.S. economy’s like $20 trillion a year.

- [Guest] Trump did in fact sign a $1.2 billion quantum computing bill.

- Well, that’s stupid.

- [Guest] Apparently, this is because we don’t want to fall behind in the race with China.

- Well, that’s also stupid.

- [Guest] But I can see something similar happening with AGI.

- Yeah, so one thing is, it’s not that dangerous if it costs a squillion billion dollars to run, because you just can’t run it for long enough for anything bad to happen. So, I agree with your points. I think I’m going to move forward slightly after taking one last comment.

- [Guest] Do you have any examples of technologies that weren’t a big deal, purely because of the cost?

- I mean, kind of everything is just a cost problem, right?

[Linch notes: We figured out alchemy in the early 1900s.]

- [Guest] Computers, at the beginning. Computers were so expensive that no one could afford them, except for like NASA.

- [Guest] Right, but over time, the cost decreased, so are you saying that...? Yeah, I’m just wondering, with AGI, it’s like, reasonable to think maybe the initial version is very expensive, but then work will be put into it and it’ll be less expensive. Is there any reason to believe that trend wouldn’t happen for AGI?

- Not that I know of. My guess is that the world looks one of two ways. One is that either you have something like the cost of human intellectual labor falls by a factor of ten for a couple years, starting at way too expensive and ending at dirt cheap. Or it happens even faster. I would be very surprised if it’s permanently too expensive to run AGI. Or, I’d be very, very, very surprised if we can train an AGI, but we never get the cost below $1 million.

And this isn’t even because of the $6 an hour number. Like, I don’t know man, brains are probably not perfect. It would just be amazing if evolution figured out a way to do it that’s like a squillion times cheaper, but we still figure out a way to do it. Like, it just seems to me that the cost is probably going to mostly be in the training. My guess is that it costs a lot more to train your AGI than to run it. And in the world where you have to spend $500,000 an hour to run your thing, you probably had to spend fifty gazillion dollars to train it. And that would be the place where I expect it to fail.

You can write down your other objections, and then we can talk about them later.

Crux 2: AGI is plausibly soonish, and the next big deal

All right, here’s my next crux. AGI is plausibly soon-ish, as in, less than 50 years, and the next big deal. Okay, so in this crux I want to argue that AGI might happen relatively soon, and also, it might happen before one of the other crazy things happen that would mean we should only focus on that thing instead.

So a couple of things that people have already mentioned, or that I mentioned, as potentially crazy things that would change the world. There’s whole-brain emulation. Can other people name some other things that would make the world radically different if they happened?

- [Guest] Very widespread genetic engineering.

- Yeah, that seems right. By the way, the definition of “big deal” that I want you guys to use is “you basically should not make specific concrete plans which have steps that happen after that thing happens”. I in fact think that widespread and wildly powerful genetic engineering of humans is one, such that you should not have plans that go after when the widespread genetic engineering happens, or you shouldn’t have specific plans.

- [Guest] Nuclear war.

- Yeah, nuclear war. Maybe other global catastrophic risks. So anything which looks like it might just really screw up what the world looks like. Anything which might kill a billion people. If something’s going to kill a billion people, it seems plausible that that’s really important and you should work on that instead. It’s not like a total slam dunk that you should work on that instead, but it seems plausible at least. Yeah, can I get some more?

- [Guest] What about nuclear fusion? I read an article saying that if any government could get that kind of technology, it could potentially trigger a war, just because it breaks the balance of power that is currently in place in international politics.

- Yeah, I can imagine something like that happening, maybe. I want to put that somewhat under other x-risks, or nuclear war. It feels like an example of destabilization of power. But destabilization of various types mostly is a thing because it leads to x-risk.

- [Guest] Do you consider P = NP to be such for that?

- Depends on how good the algorithm is. [Linch: The proof might also not be constructive.]

- [Guest] Yeah, it depends. In public key cryptography, there’s...

- I don’t really care about public key cryptography breaking… If P = NP, and there’s just like a linear time algorithm for like… If you can solve SAT problems of linear size in linear time, apologies for the jargon, I think that’s just like pretty close to AGI. Or that’s just like… if you have that technology, you can just solve any machine learning problem you want, by saying, “Hey, can you tell me the program which does the best on this particular score?” And that’s just a SAT problem. I think that it is very unlikely that there’s just like a really fast, linear time, SAT solving algorithm. Yeah, that’s an interesting one. Any others?

- [Guest] Like a plague, or a famine. Or like, terrible effects of climate change, or like a super volcano.

- Okay.

Natural x-risks, things that would kill everyone, empirically don’t happen that often. You can look at the earth, and you can be like, “How often have things happened that would have killed everyone if they happened now?” And the answer’s like, a couple times. Natural disasters which would qualify as GCRs but not x-risks are probably also rare enough that I am not that worried about them. So I think it’s most likely that catastrophic disasters that happen soon will be a result of technologies which were either invented relatively recently (eg nukes) or haven’t been developed yet.

In the case of climate change, we can’t use that argument, because climate change is anthropogenic; however, my sense is that experts think that climate change is quite unlikely to cause enough damage to be considered a GCR.

Another one I want to include is sketchy dystopias. We have never had an evil empire which has immortal god emperors, and perfect surveillance, and mind reading and lie detection. There’s no particular technical reason why you can’t have all these things. They might all be a lot easier than AGI. I don’t know, this seems like another one.

If I had to rank these in how likely they seem to break this claim, I’d rank them from most to least likely as:

  • Various biosecurity risks

  • Stable dystopias, nuclear war or major power war, whole brain emulation

  • Climate change

  • Super volcanos, asteroids

I want to say why I think AI risk is more likely than these things. Or getting AGI is more likely earlier.

But before I say that, you see how I wrote less than 50 years here? Even if I thought the world in 100 years was going to just be like the world like it is now, except with mildly better iPhones (maybe mildly worse iPhones, I don’t know, it’s not clear what direction the trend is)… I don’t know. Affecting the world in 100 years seems really hard.

And it seems to me that the stories that I have for how my work ends up making a difference to the world, most of those just look really unlikely to work if AGI is more than 50 years off. It’s really hard to do research that impacts the world positively more than 50 years down the road. It’s particularly hard to do research that impacts a single event that happens 50 years in the future, positively. I just don’t think I can very likely do that. And if I learned that there was just no way we were going to have AGI in the next 50 years, I would then think, “Well, I should probably really rethink my life plans.”

AI timelines

Okay, so here’s a fun question. When are we going to get AGI? Here’s some ways of thinking about it.

One of them is Laplace’s Law of Succession. This one is: there is some random variable. It turns out that every year that people try to build an AGI, God draws a ball from an urn. And we see if it’s white or black. And if it’s white, he gives us an AGI. And if it’s black, he doesn’t give us an AGI. And we don’t know what proportion of balls in the urn are black. So we’re going to treat that as a random parameter between zero and one.

Now, the first year, your prior on this parameter theta (the proportion of years in which God gives you an AGI) is uniform. The second year, you’re like, “Well, it sure seems like God doesn’t give us an AGI every year, because he didn’t give us one last year.” And you end up with a posterior where you’ve updated totally against the “AGI every year” hypothesis, and not at all against the “AGI never” hypothesis. And the next year, when you don’t get an AGI you update against, and against, and against.

So this is one way to derive Laplace’s Law of Succession. And if you use Laplace’s Law of Succession, then it means that after 60 years of trying to build an AGI, there is now a 1 in 62 chance that you get an AGI next year. So you can say, “Okay. Let’s just use Laplace’s Law of Succession to estimate time until AGI.” And this suggests that the probability of AGI in the next 50 years is around 40%. This is not the best argument in the world, but if you’re just trying to make arguments that are at least kind of vaguely connected to things, then Laplace’s Law of Succession says 40%.
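The urn argument has a simple closed form, sketched here in Python. It assumes the setup described in the talk: a uniform prior on the per-year chance of AGI, and 60 consecutive years without it. The function names are illustrative. With a uniform prior and n failures, the chance of success on the next draw is 1/(n+2), and the chance of m further failures in a row is (n+1)/(n+m+1).

```python
from fractions import Fraction


def p_success_next_trial(failures: int) -> Fraction:
    """Laplace's rule with zero successes so far: (0 + 1) / (failures + 2)."""
    return Fraction(1, failures + 2)


def p_success_within(failures: int, horizon: int) -> Fraction:
    """Chance of at least one success in the next `horizon` trials.

    Under a uniform prior, P(all `horizon` trials fail | `failures` past failures)
    equals (failures + 1) / (failures + horizon + 1).
    """
    return 1 - Fraction(failures + 1, failures + horizon + 1)


years_of_trying = 60  # assumption from the talk: counting from 1960
print("Next year:", p_success_next_trial(years_of_trying))
print("Next 50 years:", float(p_success_within(years_of_trying, 50)))
```

Under these assumptions the 50-year figure is exactly 50/111, or about 45%, the same ballpark as the "around 40%" quoted in the talk.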

- [Guest] What’s your threshold for even including such an argument in your overall thought process? I’m guessing there are a lot of arguments at that level of… I don’t know.

- I think there are fewer than 10 arguments that are that simple and that good.

- [Guest] This really depends on the size of the step you chose. You chose “one year” arbitrarily. It could have been one second ⁠— God draws a ball a second.

- No, that’s not it. There’s a limit, because in that case, if I choose my shorter time steps, then it’s less likely that God draws me a ball in the next time step. But I also get to check more time steps over the next year.

- [Guest] I see.

- [Guest 2] “Poisson process” is the word you’re looking for, I think.

- Yes, this is a Poisson process.
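The step-size point can be checked numerically. This sketch keeps the uniform prior on the per-step probability and changes only how many draws happen per year. That prior, the 60-year and 50-year figures, and the function name are assumptions made for illustration.

```python
def p_within(years_observed: int, years_ahead: int, steps_per_year: int) -> float:
    """Chance of at least one success in the next `years_ahead` years.

    Each year is split into `steps_per_year` draws, with a uniform prior on the
    per-draw success probability and no successes observed so far.
    """
    n = years_observed * steps_per_year  # failed draws so far
    m = years_ahead * steps_per_year     # draws still to come
    return m / (n + m + 1)


# One draw per year, per month, per day, per second.
for steps in (1, 12, 365, 31_536_000):
    print(f"{steps:>10} draws/year -> {p_within(60, 50, steps):.4f}")
```

As the steps get finer, the answer settles near 50/110, which is only slightly different from the one-draw-per-year figure. Each draw is less likely to succeed, but there are proportionally more draws.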

- [Guest] How is this argument different for anything else, really? Is the input parameter..

- So you might say, what does this say about the risk of us summoning a demon next year? I’m going to say, “Well, we’ve been trying to summon demons for a long, long while, like 5,000 years.” I don’t know… I agree.

Here’s another way you can do the Laplace’s Law of Succession argument. I gave the previous argument based on years of research since 1960, because that’s when the first conference on AI was. You could also do it on researcher years. As in: God draws from the urn every time a researcher finishes their year of thinking about AI. And in this model, I think that you get a 50% chance in 10 years or something insane like that, maybe less. Because there are so many more researchers now than there used to be. So, quoting medians: the calendar-years version gives you around 60 years, because Laplace’s Law of Succession always says you should expect to wait about as long as it’s been so far. On researcher years, you get like 10 years or less.
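Here is a rough sketch of why counting researcher years shortens the median so much. It assumes, purely for illustration, that the number of AI researchers has grown exponentially since 1960 with a fixed doubling time and keeps doing so. That growth model and the doubling times below are hypothetical, not data from the talk. The sketch uses the continuous approximation that the median wait, measured in researcher years, equals the researcher years accumulated so far.

```python
import math


def median_wait_years(years_so_far: float, doubling_time: float) -> float:
    """Calendar years until future researcher-years equal past researcher-years.

    Assumes headcount grows as exp(g * t) with g = ln(2) / doubling_time.
    Past effort is proportional to (1 - exp(-g * years_so_far)); future effort
    over t years is proportional to (exp(g * t) - 1). Setting them equal and
    solving gives t = ln(2 - exp(-g * years_so_far)) / g.
    """
    g = math.log(2) / doubling_time
    return math.log(2 - math.exp(-g * years_so_far)) / g


# Hypothetical doubling times for the size of the field, in years.
for doubling in (5, 10, 20):
    print(f"doubling every {doubling} years -> median wait ~{median_wait_years(60, doubling):.1f} years")
```

In this toy model the median wait is always a little under the field's doubling time, so a median of "10 years or less" corresponds to the field doubling at least about once a decade.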

All right, here are some other models you can use. I’m just going to name some quickly. One thing you can do is, you can ask, “Look, how big is a human brain? Now, let’s pretend AGI will be a neural net. How much compute is required to train a policy that is that big? When will we have that amount of compute?” And you can do these kind of things. Another approach is, “How big is the human genome? How long does it take to train a policy that big?” Whatever, you do a lot of shit like this.

Honestly, the argument that’s compelling to me right now is the following. Maybe to build an AGI, you need to have pretty good machine learning, in the kind of way that you have today. Like, you have to have machine learning that’s good enough to learn pretty complex patterns, and then you have to have a bunch of smart people who from when they were 18, decided they were going to try and do really cool machine learning research in college. And then the smart people decide they’re going to try and build AGIs. And if this is the thing that you think is the important input to the AGI creation process, then I think that you notice the number of smart 18 year olds who decided they wanted to go into AGI is way higher than it used to be. It’s probably 10 times higher than it was 10 years ago.

And if you have Laplace’s Law of Succession over how many smart 18 year olds who turn into researchers are required before you get the AGI, then that also gives you pretty reasonable probabilities of AGI pretty soon. It ends up with me having… today, I’m feeling ~70% confident of AGI in the next 50 years.

Why do I think it’s more likely than one of these other things? Basically, because it seems like it’s pretty soon.

It seems like whole-brain emulation isn’t going to happen that soon. Genetic engineering, I don’t know, and I don’t want to talk about it right now. Bio risk: there are a lot of people whose job is making really powerful smart ML systems. There are not very many people whose job is trying to figure out how to kill everyone using bioweapons. This just feels like the main argument for why AI is more urgent; it’s just really hard for me to imagine a world where people don’t try to build really smart ML systems. It’s not that hard for me to imagine a world where no very smart person ever dedicates their life to trying really hard to figure out how to kill everyone using synthetic biology. Like, there aren’t that many really smart people who want to kill everyone.

- [Guest] Why aren’t you worried about nuclear war? Like, people killing the U.S. and having nuclear war and a bunch of places where there are AI researchers, and then it just slows it down for a while. Why think this is not that concerning?

- Ah, seems reasonably unlikely to happen. Laplace’s Law of Succession. We’ve had nuclear weapons for 80 years. (laughs)

Okay, you were like, “Why are you using this Laplace’s Law of Succession argument?” And I’m like, look. When you’re an idiot, if you have Laplace’s Law of Succession arguments, you’re at least limiting how much of an idiot you can be. I think there are just really bad predictors out there. There are people who are just like, “I think we’ll get into a nuclear war with China in the next three years, with a 50% probability.” And the thing is, I think that it actually is pretty healthy to be like, “Laplace’s Law of Succession. Is your current situation really all that different from all the other three-year periods since we’ve had nuclear weapons?”

[Linch notes: Anthropics seems like a nontrivial concern, especially if we’re conditioning on observer moments (or “smart observer moments”) rather than literally “years at least one human is alive”.]

- [Guest] Strictly, it places no limit on how much of an idiot you can be. Because you can modify your prior to get any posterior, using Laplace’s Law of Succession, if you’re careful. Basically. So, if you can justify using a uniform prior, then maybe it limits how much of an idiot you can be, but if a uniform prior yields idiocy, then I’m not sure it does place a limit.

- For some reason, I feel like people who do this end up being less of an idiot, empirically.

- [Guest] Okay, that’s fine.
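The guest’s point about priors can be made concrete. A minimal sketch, assuming a Beta(a, b) prior over the per-period chance of nuclear war, and treating the roughly 80 years mentioned above as 26 three-year periods with no such war (both are simplifying assumptions). The uniform prior is Beta(1, 1); the other two priors are made up to show how far the answer can be pushed. The function name is hypothetical.

```python
from fractions import Fraction


def prob_event_next_period(quiet_periods, prior_a=1, prior_b=1):
    """Posterior mean chance of the event in the next period, after
    `quiet_periods` periods without it, under a Beta(prior_a, prior_b) prior.
    With the uniform prior Beta(1, 1) this is Laplace's 1 / (n + 2)."""
    return Fraction(prior_a, prior_a + prior_b + quiet_periods)


QUIET_PERIODS = 80 // 3  # ASSUMPTION: 26 complete three-year periods

uniform = prob_event_next_period(QUIET_PERIODS)              # Beta(1, 1)
alarmist = prob_event_next_period(QUIET_PERIODS, 30, 2)      # prior leans "war soon"
complacent = prob_event_next_period(QUIET_PERIODS, 1, 500)   # prior leans "never"

for name, p in [("uniform", uniform), ("alarmist", alarmist), ("complacent", complacent)]:
    print(f"{name}: {float(p):.3f}")
```

The uniform prior gives 1/28, a few percent per three-year period, which is a long way from 50%. The other two rows show the guest’s point: a sufficiently opinionated prior can still produce almost any answer, so the discipline comes from committing to something like the uniform prior, not from the formula itself.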

- All right, we’re going to stand up, and jump up and down five times. And then we’re going to sit down again and we’re going to hear some more of this.

Crux 3: You can do good by thinking ahead on AGI

Okay, number three. You can do good by thinking ahead on AGI. Can one do good by thinking ahead on particular technical problems? The specific version of this is that the kind of AI safety research that I do is predicated on the assumption that there are technical questions which we can ask now such that if we answer them now, AI will then go better.

I think this is actually kind of sketchy as a claim and I think that I don’t see people push back on it quite enough and that meant that I was very happy about the people today who I talked to who pushed back on it, so bonus points to them.

So here’s two arguments that we can’t make progress now.

Problems solve themselves

One is that, in general, problems solve themselves.

Imagine if I said to you: “One day humans are going to try and take humans to Mars. And it turns out that most designs of a spaceship to Mars don’t have enough food on them for humans to not starve over the course of their three-month-long trip to Mars. We need to work on this problem. We need to work on the problem of making sure that when people build spaceships to Mars they have enough food in them for the people who are in the spaceships.”

I think this is a stupid argument. Because people are just not going to fuck this one up. I would just be very surprised if all these people got on their spaceship and then they realized after a week, “Oh geez, we forgot to pack enough food.” Because people don’t want to die of starvation on a spaceship and people would prefer to buy things that aren’t going to kill them. And I think this is actually a really good default argument.

Another one is: “Most people have cars. It would be a tremendous disaster if everyone bought cars which had guns in the steering wheels such that if you turn on the accelerator, they shoot you in the face. That could kill billions of people.” And I’m like, yep. But people are not going to buy those cars because they don’t want to get shot in the face. So I think that if you want to argue for AI safety being important you have to argue for a disanalogy between those two examples and the AI safety case.

Thinking ahead is real hard

The other one is: thinking ahead is real hard. I don’t actually know of any examples ever where someone said, “It will be good if we solve this technical problem, because of this problem which is going to come up in 20 years.” I guess the only one I know of is those goddamn quantum computers again, where people decided to start coming up with quantum-resistant security ages ago, such that as soon as we get powerful quantum computers, even though they can break your RSA, you just use one of these other things. But I don’t think they did this because they thought it was helpful. I think they did it because they’re crypto nerds who like solving random theoretical problems. So I can’t name an example of anyone thinking ahead about a technical problem in a useful way.

- [Student] But even there, there’s a somewhat more precise definition of what a quantum computer even is. It’s not clear to me that there’s anything close for what AGI is going to look like. So even that example strikes me as weird.

- You’re saying it’s easier for them to solve their problem than it would be for us to do useful work on AI?

- At least there’s some definition. I actually don’t know what’s going on in their field at all. But I don’t know that there’s any definition of what AGI will look like.

- Yeah. I’m taking that as an argument for why even that situation is an easier case for thinking ahead than the AI safety case.

- Yeah, yeah, like here, what kind of assumption are we very sure about? And I think in our previous conversation you were saying the fact that some objective is going to be optimized or something.

Arguments for thinking ahead

Okay, so I want to argue for the claim that it’s not totally crazy to think about the AI alignment problem right now.

So here are some arguments I want to make, about why I think we can maybe do good stuff now.

By the way, another phrasing of this is, if you could trade one year of safety research now for x years of safety research the year that AGI is developed or five years before AGI is developed, what is the value of x at which you’re indifferent? And I think that this is just a question that you can ask people. And I think a lot of AI safety researchers think that the research that is done the year of building the AGI is just five times or 10 times more important. And I’m going to provide some arguments for why thinking ahead actually might be helpful.


One is relaxations of the problem. By “relaxation”, I mean you take some problem and instead of trying to solve it, you try to solve a different, easier problem.

Here’s what I mean by this: There are a variety of questions whose answer I don’t know, which seem like easier versions of the AI safety problem.

Here’s an example. Suppose someone gave me an infinitely fast computer on a USB drive and I want to do good in the world using my infinitely fast computer on a USB drive. How would I do this? I think this has many features in common with the AI safety problem, but it’s just strictly easier because all I’m trying to do is to figure out how to use this incredibly smart, powerful thing that can do lots of stuff, and anything which you can do with machine learning you can also do with this thing. You can either just run your normal machine learning algorithms or you can do this crazy optimizing over parameter space for whatever architecture you like, or optimizing over all programs for something.
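As a toy illustration of “optimizing over parameter space”: with unlimited compute you could swap gradient descent for exhaustive search. This sketch does that for a two-parameter linear model on a made-up dataset. The grid, the data, and the names are all assumptions for illustration, and a real infinitely fast computer would be searching a vastly larger space.

```python
from itertools import product

# Made-up training data generated by y = 2x + 1.
DATA = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

# Candidate values for each parameter: -3.0, -2.5, ..., 3.0.
GRID = [i * 0.5 for i in range(-6, 7)]


def squared_error(params):
    """Total squared error of the model y = w * x + b on DATA."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in DATA)


# Try every (w, b) pair and keep the one with the lowest error.
best_params = min(product(GRID, GRID), key=squared_error)
print(best_params)
```

The search recovers the parameters that generated the data. Nothing about it tells you whether the objective you searched over was the one you actually wanted, which is the part that stays hard even with infinite compute.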

This is just easier than machine learning, but I still don’t know how to use this to make a drug that helps with a particular disease. I’m not even quite sure how to use this safely to make a million dollars on the stock market, though I am relatively optimistic I’d be able to figure that one out. There’s a bunch of considerations.

If I had one of these infinitely fast computers, I don’t think I know how to do safe, useful things with it. If we don’t know how to answer this question now, then no matter how easy it is to align ML systems, it’s never going to get easier than this question. And therefore, maybe I should consider trying to solve this now.

Because if I can solve this now, maybe I can apply that solution partially to the ML thing. And if I can’t solve this now, then that’s really good to know, because it means that I’m going to be pretty screwed when the ML thing comes along.

Another relaxation you can do is you can pretend you have an amazing function approximator, where by “function approximator” I just mean an idealized neural net. If you have a bunch of labeled training data, you can put it in your magical function approximator and it’ll be a really good function approximator on this. Or if you want to do reinforcement learning, you can do this and it’ll be great. I think that we don’t know how to do safe, aligned things using an amazing function approximator, and I think that machine learning is just strictly more annoying to align than this. So that’s the kind of work that I think we can do now, and I think that the work that we do on that might either just be applicable or it might share some problems in common with the actual AI alignment problem. Thoughts, questions, objections?

- [Student] For the halting oracle thing, are we assuming away the “what if using it for anything is inherently unsafe for spooky universal prior reasons” thing?

- That’s a really great question. I think that you are not allowed to assume away the spooky universal prior problems.

- [Student 2] So what was the question? I didn’t understand the meaning of the question.

- The question is… all right, there’s some crazy shit about the universal prior. It’s a really long story. But basically if you try to use the Solomonoff prior, it’s… sorry, never mind. Ask me later. It was a technicality. Other questions or objections?

So all right, I think this claim is pretty strong and I think a lot of you probably disagree with it. The claim is, you can do research on AI safety now, even though we don’t know what the AGI looks like, because there are easier versions of the problem that we don’t know how to solve now, so we can just try and solve them. Fight me.

- [Student] You could technically make the problem worse by accidentally arriving at some conclusions that help actual AI research: not safety research, but capabilities research.

- Seems right. Yeah, maybe you should not publish all the stuff that you come up with.

When you’re doing safety research, a lot of the time you’re implicitly trying to answer the question of what early AGI systems will look like. I think there’s a way in which safety research is particularly likely to run into dangerous questions for this reason.

- [Student] So if we say that AGI is at least as good as a human, couldn’t you just relax it to a human? But if you do relax it to just, say, “I’m going to try to make this human or this brain as safe as possible,” wouldn’t that be similar to operations research? In business school, where they design systems of redundancies in nuclear plants and stuff like that?

- So, a relaxation where you just pretend that this thing is literally a human — I think that this makes it too easy. Because I think humans are not going to try and kill you, most of the time. You can imagine having a box which can just do all the things that a human does at 10 cents an hour. I think that it’d be less powerful than an AGI in some ways, but I think it’s pretty useful. Like, if I could buy arbitrary IQ-100 human labor for 10 cents an hour, I would probably become a reseller of cheap human labor.

- [Student] I got a question from Discord. How interpretable is the function approximator? Do you think that we couldn’t align a function approximator with, say, the opacity of a linear model?

- Yes. I mean, in this case, if you have an opaque function approximator, then other problems are harder. I’m assuming away inner alignment problems (apologies for the jargon). Even linear models still have the outer alignment problem.

Analogy to security

Here’s another argument I want to make. I’m going to use security as an analogy. Imagine you want to make a secure operating system, which has literally zero security bugs, because you’re about to use it as the control system for your autonomous nuclear weapons satellite that’s going to be in space and then it’s going to have all these nuclear weapons in it.

So you really need to make sure that no one’s going to be able to hack it and you’re not able to change it and you expect it to be in the sky for 40 years. It turns out that in this scenario you’re a lot better off if you’ve thought about security at the start of the project than if you only try to think about security at the end of the project. Specifically it turns out that there are decisions about how to write software which make it drastically easier or harder to prove security. And you really want to make these decisions right.

And in this kind of a world, it’s really important that you know how one goes about building a secure system before you get down to the tricky engineering research of how to actually build the system. I think this is another situation which suggests that work done early might be useful.

Another way of saying this is to think of operating systems. I want to make an operating system which has certain properties, and currently no one knows how to make an operating system with these properties, but it’s going to need to be built on top of some other properties that we already understand about operating systems and we should figure out how to do those securely first.

This is an argument that people at MIRI feel good about and often emphasize. It’s easier to put security in from the start. Overall I think this is the majority of my reason for why I think that you can do useful safety work starting right now.

I want to give some lame reasons too, like lame meta reasons. Maybe it’s useful for field building. Maybe you think that AI safety research that happens today is just 10 times less useful than AI safety research that happens in the five years before the AGI is built. But if you want to have as much of that as possible it’s really helpful if you get the field built up now. And you have to do something with your researchers and if you have them do the best AI safety research they can, maybe that’s not crazy.

- [Student] Maybe if you jump the gun and you try to start a movement before it’s actually there and then it fizzles out, then it’s going to be harder to start it when it’s really important.

- Yep. So here’s an example of something kinda like that. There are people who think that MIRI, where I work, completely screwed up AI safety for everyone by being crazy on the internet for a long time. And they’re like, “Look, you did no good. You got a bunch of weird nerds on the internet to think AI safety is important, but those people aren’t very competent or capable, and now you’ve just poisoned the field, and now when I try to talk to my prestigious, legit machine learning friends they think that this is stupid because of the one time they met some annoying rationalist.” I think that’s kind of a related concern that is real. Yeah, I think it’s a strong consideration against doing this.

- [Student] I agree with the security argument, but it brings up another objection, which is: even if you “make progress”, people have to actually make use of the things you discovered. That means they have to be aware of it, it has to be cost effective. They have to decide if they want to do it.

- Yeah, all right, I’m happy to call this crux four.

Crux 4: good alignment solutions will be put to use

Good alignment solutions will be put to use, or might be put to use. So I in fact think that it’s pretty likely… So there are these terms like “competitiveness” and “safety tax” (or “alignment tax”) which are used to refer to the extent to which it’s easier to make an unaligned AI than an aligned AI. I think that if it costs you only 10% more to build an aligned AI, and if the explanation of why this AI is aligned is not that hard, as in you can understand it if you spend a day thinking about it, I would put more than 50% probability on the people who try to build this AGI using that solution.

The reason I believe this is that when I talk to people who are trying to build AGIs, like people at DeepMind or OpenAI, I’m like, “Yep, they say the right things, like ‘I would like to build my AI to be aligned, because I don’t want to kill everyone’”. And I honestly believe them. I think it’s just a really common desire to not be the one who killed all of humanity. That’s where I’m at.

- [Student] I mean, as a counterargument, you could walk into almost any software company and they’ll pay tons of lip service to good security and then not do it, right?

- Yep, that’s right. And that’s how we might all die. And what I said is in the case where it’s really easy, in the case where it’s really cheap, and it costs you only 10% more to build the AGI that’s aligned, I think we’re fine. I am a lot more worried about worlds where it would have cost you $10 billion to build the subtly unaligned AI but it costs you $100 billion to build the aligned AI, and both of these prices fall by a factor of two every year.

And then we just have to wonder whether someone spends the $100 billion for the aligned AI before someone spends the $10 billion for the unaligned AI; and actually all these figures are falling, maintaining a constant ratio. I think thinking about this is a good exercise.
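Here’s a minimal sketch of that exercise, using the figures above ($10 billion versus $100 billion, both halving every year) plus one made-up input: a fixed budget of $1 billion that any given project is able to spend. The function name and the budget are assumptions for illustration.

```python
import math


def years_until_affordable(cost_now, budget, halving_time_years=1.0):
    """Years until a price that halves every `halving_time_years`
    falls to `budget` (zero if it is already affordable)."""
    if cost_now <= budget:
        return 0.0
    return halving_time_years * math.log2(cost_now / budget)


BUDGET = 1e9  # ASSUMPTION: illustrative budget, not from the talk

unaligned_wait = years_until_affordable(10e9, BUDGET)
aligned_wait = years_until_affordable(100e9, BUDGET)
lead_needed = aligned_wait - unaligned_wait

print(f"unaligned affordable in {unaligned_wait:.2f} years")
print(f"aligned affordable in   {aligned_wait:.2f} years")
print(f"gap:                    {lead_needed:.2f} years")
```

Because the ratio stays constant, the gap is log2(10), a bit over three years, whatever budget you plug in (as long as neither system is affordable yet). So for equal budgets, a 10x cost penalty behaves like a delay of roughly three years, and the question becomes whether the careful project has that much of a lead or that much more money.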

And even scarier is the thing that I think is actually likely, which is that building the aligned AI takes an extra three years or something. And the question will be, “How much of a lead time would the people who are trying to build the aligned one actually have? Is it actually three years? I don’t think it is...”

Wouldn’t someone eventually kill everyone?

- [Student] Even if most people would not want to destroy the human race, isn’t there still that risk there will just be one really dangerous or crazy person who does deliberately want to cause havoc? And how do we deal with that?

- Yeah. I think that long-term, it’s not acceptable to have there be people who have the ability to kill everyone. It so happens that so far no one has been able to kill everyone. This seems good. I think long-term we’re either going to have to fix the problem where some portion of humans want to kill everyone or fix the problem where humans are able to kill everyone.

And I think that you could probably do this through regulating really dangerous technology or modifying how humans work so they aren’t going to kill everyone.

This isn’t a ridiculous change from the status quo. The U.S. government employs people who will come to your house and arrest you if you are trying to make smallpox. And this seems good, because I don’t think it would be good if anyone who wanted to could make smallpox.

Long-term, humanity is not going to let people kill everyone. Maybe it turns out that if you want to build an AGI that can kill everyone, you’d have to have at least three million super GPUs, or maybe you need three TPU pods. Either way, people are going to be like, “Well, you’re not allowed to have three TPU pods unless you’ve got the official licence.” There’ll be regulation and surveillance. Maybe the government runs all the TPU pods, a bit like how governments run all the plutonium and hopefully all of the bioweapons.

So that’s the answer to the question, “Wouldn’t someone always try to kill everyone?”. The answer is yes, unless you make all the humans so they aren’t going to do that by modifying them. But long-term we need to get the risk to zero by making it impossible, and it seems possible to imagine us succeeding at this.

- [Student] Do you think that the solution is better achieved through some sort of public policy thing like that or by something that’s a private tool that people can use? Like, should we go through government or should it be something crowdsourced?

- I don’t like the term crowdsourced very much.

- [Student] I don’t really know why I used that, but something that comes from some open source tool or something like that, or something private.

- I don’t have a strong opinion. It seems like it’s really hard to get governments to do complicated things correctly. Like their $1.2 billion quantum computing grant. (laughs) And so it seems like we’re a bit safer in worlds where we don’t need complicated government action. Like, yeah, I just feel pretty screwed if I need the government to understand why and how to regulate TPU pods because otherwise people will make really dangerous AI. This would be really rough. Imagine trying to explain this to various politicians. Not going to be a good time.

- [Student] (unintelligible)

- Yeah. Most humans aren’t super evil. When I occasionally talk to senior people who work on general AI research, I’m like, “This person, they’re not a saint, but they’re a solid person”.

Here’s a related question — what would happen if you gave some billionaire 10 squillion dollars? If you gave most billionaires in America 10 squillion dollars and they could just rule the world now, I think there’s like at least a 70% chance that this goes really solidly well, especially if they know that one of the things they can do with their AGI is ask it what they should do or whatever. I think that prevents some of the moral risk. That’s where I’m at.

[Post talk note: Some other justifications for this: I think that (like most people) billionaires want, all else equal, to do good things rather than bad things, and I think that powerful technologies might additionally be useful for helping people to do a better job of figuring out what actions are actually good or bad according to their values. And to be clear, hopefully something better happens than handing control of the future to a randomly selected billionaire. But I think it’s worth being realistic about how bad this would be, compared to other things that might happen.]

Sounds like there are some disagreements. Anything you want to say?

- [Student] Yeah. The world, and this country especially, is ruled by the 1%, and I don’t think they’re doing very good things. So I think when it comes to evil and alignment and how money is especially distributed in this country — they don’t have access to AGI just yet, but it would scare me if it was put in their hands. Say, Elon Musk for instance. I mean, I don’t think he’s an evil person — he’s very eccentric, but I don’t think he’s evil — but he’s probably one. Let’s say it was put in the hands of the Rockefellers or somebody like that, I don’t think they would use it for good.

- Yeah, I think this is a place where people...

- [Student] It’s a political argument, yeah.

- Yeah, I don’t know. My best guess is that the super rich people are reasonably good, yeah.

So the place where I’m most scared about this is I care a lot about animal welfare and an interesting fact about the world is that things like technology got a lot better and this meant that we successfully harmed farm animals in much greater numbers.

[Post talk note: If you include wild animal suffering, it’s not clear what the net effect of technology on animal welfare has been. Either way, technology has enabled a tremendous amount of animal suffering.]

And this is kind of a reason to worry about what happens when you take people and you make them wealthier. On the other hand, I kind of believe it’s a weird fluke about the world that animals have such a bad situation. Like, I kind of think that most humans actually do kind of have a preference against torturing animals. And if you made everyone a squillion babillionaire they would figure out the not-torturing-animals thing. These are some things where my intuition comes from.

Crux 5: My research is the good kind

My research is the good kind. My work, or the things that I do, are related to the argument that there are things that you have to figure out ahead of time if you want things to be good. I can’t talk about it in detail, because MIRI doesn’t by default disclose all the research that it does. But that’s what I do.


I’m going to give an estimate of how confident I am in each of these. Every time I do this I get confused over whether I want to give every step conditioned on the previous steps. We’re going to do that.

  1. AI would be a big deal if it showed up here. I’m ~95% sure that if AGI was really cheap and it showed up in a world like this, the world would suddenly look really different. I don’t think I’m allowed to use numbers larger than 95%, because of that one time I made that terrible error. And it’s very hard to calibration train enough, that you’re allowed to say numbers larger than 95%. But I feel really damn sure that the world would look really different if someone built AGI.

  2. AI is plausibly soonish and the next big deal. Given the previous one, not that the conditional matters that much for this one, I feel ~60% confident.

  3. You can do good by thinking ahead on AGI. It’s kind of rough, because the natural product of this isn’t like a probability, it’s like a weighting; it’s like how much worse is it than doing things. I’m going to give this 70%.

  4. Alignment solutions might be put to use by goodish people if you have good enough ones. 70%.

  5. My research is the good kind. Maybe 50%?

Okay, cool, those are the numbers. We can multiply them all together. 60% times 95% times 70% times 70% times 50%.

[Post-talk note: This turns out to be about 14%, which is somewhat lower than my actual intuition for how enthusiastic I am about my work.]
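The multiplication is easy to check. A minimal sketch, where the dictionary labels are shorthand and treating the five numbers as a simple chain of conditional probabilities is the same simplification the talk makes:

```python
import math

# Conditional confidence in each crux, as stated in the talk.
cruxes = {
    "AI would be a big deal": 0.95,
    "AI is plausibly soonish": 0.60,
    "Thinking ahead helps": 0.70,
    "Solutions get used": 0.70,
    "My research is the good kind": 0.50,
}

overall = math.prod(cruxes.values())
print(f"{overall:.1%}")
```

A product like this falls quickly as steps are added: five fairly confident steps already land well below any single one of them.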


I’m going to take some more questions for a bit.

- [Student] So, is this how MIRI actually chooses what to work on?

- No.

- [Student] So, how does MIRI choose where to allocate resources and then do research?

- I think MIRI is much more into particular mental motions.

- [Student] Mental motions?

- The thinking I’ve been describing is the kind of thinking that I do when I’m saying, “Should I, instead of my job, do a different job?” For instance, I could do EA movement-building work. (Like for example coming to Stanford EA and talking to Stanford students and giving talks.) And I think this is pretty good and I do it sometimes.

When I’m trying to think of what I should do for AI safety in terms of technical research, I would say mostly I just don’t use my own judgment. Mostly I’m just like, “Nate Soares, who runs MIRI, thinks that it would be helpful for him if I did this. And on the couple of domains where I feel like I can evaluate Nate, I think he’s really smart.”

- [Student] Smart in what way? Like, what’s your metric?

- I think that when I talk, Nate is just really, really good at taking very general questions about the world and figuring out how to think about them in ways that get new true answers.

E.g., I talk to him about physics (and I feel qualified to think about some areas of physics) and then he just has really smart thoughts and he thinks about them in a really clever way. And I think that whenever I argue with him about AI safety he says pretty smart things.

And then he might tell me he thinks this particular research direction is great. And then I update more based on my respect for Nate and based on his arguments about what type of technical problems would be good to solve, than I update based on my own judgment about the technical problems. This is particularly because there are worldview questions about what type of AI alignment research is helpful that I don’t know what I think of.

- [Student] Do you ever consider what you just enjoy doing in a completely outcome-independent way?

- I do occasionally ask the question, what do I enjoy doing? And when I’m considering potential projects, I give a bonus of like 2x or 3x to activities that I really enjoy.

- [Student 2] Maybe this is too broad, but why did you choose, or was it a choice, to place your trust on research directions in Nate Soares versus like Paul Christiano or somebody else?

- Well, once upon a time I was in a position where I could try to work for MIRI or I could try to work for Paul. I have a comparative advantage at working for MIRI. I have a comparative disadvantage at working for Paul, compared to the average software engineer. Because MIRI wanted some people who were good at screwing around with functional programming and type theory and stuff, and that’s me. And Paul wanted someone who was good at messing around with machine learning, and that’s not me. And I said, “Paul, how much worse do you think my work will be if I go to MIRI?” And he said, “Four times.” And then I crunched some numbers. And I was like, “Okay, how right are different people likely to be about what AI alignment work is important.” And I was like, “Well…”

I don’t... look, you asked. I’m going to tell you what I actually thought. I don’t think it makes me sound very virtuous. I thought, “Eliezer Yudkowsky from MIRI is way smarter than me. Nate Soares is way smarter than me. Paul Christiano is way smarter than me. That’s two to one.” And that’s how I’m at MIRI.

I would say, time has gone on and now I’ve updated towards Paul’s view of the world in a lot of ways. But the comparative advantage argument is keeping me at MIRI.

- [Student] So if you were to synthesize a human being through biotechnology and create an artificial human then does that count as AI or AGI?

- Eh, I mean I’m interested in defining words inasmuch as they help me reason about the future. And I think an important fact about making humans is that it will change the world if and only if you know how to use that to make really smart humans. In that case I would call that intelligence enhancement, which we didn’t really talk about but which does seem like it deserves to be on the list of things that would totally change the world. But if you can just make artificial humans — I don’t count IVF as AGI, even though there’s some really stupid definition of AGI such that it’s AGI. And that’s just because it’s more useful to have the word “AGI” refer to this computery thing where the prices might fall rapidly and the intelligence might increase rapidly.

- [Student] And what if it’s like some cyborg combination of human and computer and then those arguments do apply, with at least the computer part’s price falling rapidly?

- Yep, that’s a good question. My guess is that the world is not radically changed by human-computer interfaces, or brain interfaces, before it’s radically changed by one of the other things, but I could be wrong. One of the ways in which that seems most likely to change the world is by enabling really crazy mind control or lie detection things.

- [Student] How likely do you think it is that being aware of current research is important for long-term AGI safety work? Because I think a lot of the people from MIRI I talked to were kind of dismissive about knowing about current research because they think it’s so irrelevant that eventually it won’t really yield much benefit in the future. What’s your personal opinion?

- It seems like one really relevant thing that plays into this is whether the current machine learning stuff is similar in important ways to the AI systems that we’re going to build in the future. To the extent you believe that it will be similar, I think the answer is yes, obviously machine learning facts from now are more relevant.

Okay, the following is kind of subtle and I’m not quite sure I’m going to be able to say it properly. But remember when I was saying relaxations are one way you can think about AI safety? I think there’s a sense that if you don’t know how to solve a problem in the relaxed version — if I don’t even know how to do good things with my halting oracle on a USB drive — then I’m not going to be able to align ML systems.

Part of this is that I think facts about machine learning should never make the problem easier. You should never rely on specific facts about how machine learning works in your AI safety solutions, because you can’t rely on those to hold as your systems get smarter.

If empirical facts about machine learning systems should never be relied on in your AI safety solutions, and there are just not that many non-empirical facts about machine learning, then thinking of machine learning as magical function approximators captures most of the structure of machine learning that is safe to assume. So that’s an argument against caring about machine learning.

- [Student] Or any prior knowledge, I guess? The same argument could be made about any assumptions about a system that might not hold in the future.

- That’s right. That’s right, it does indeed hold there as well.

- [Student] Yeah.

- So the main reason to know about machine learning, from this perspective, is that it’s really nice to have concrete examples. If you’re studying abstract algebra and you’ve never heard of any concrete examples of a group, you should totally just go out and learn 10 examples of a group. And I think that if you have big theories about how intelligence works or whatever, or how function approximators work, it’s absolutely worth it to know how machine learning works in practice because then you might realize that you’re actually an idiot and this is totally false. So I think that it’s very worthwhile for AI safety researchers to know at least some stuff about machine learning. Feel free to quiz me and see whether you think I’m being virtuous by my own standards. I think it’s iffy. I think it’s like 50-50 on whether I should spend more or less time learning machine learning, which is why I spend the amount of time I spend on it.

- [Student] From a theoretical standpoint, like Marcus Hutter’s perspective, there’s a theory of general AI. So to make powerful AGI, it’s just a question of how to create a good architecture which can do Bayesian inference, and it’s a question of how to run it well in hardware. It’s not like you need to have great insights which one guy could have, it’s more about engineering. And then it’s not 10% which is added to cost to do safety; we need to have a whole team which would try to understand how to do safety. And it seems that people who don’t care about safety will build the AGI faster than that, significantly faster than people who care about safety. And I mean how bad is it?

- I heard many sentences and then, “How bad is it?”. And the sentences made sense.

How bad is it? I don’t know. Pretty bad?

In terms of the stuff about AIXI, my answer is kind of long and I kind of don’t want to give it. But I think it’s a pretty bad summary to say “we already know what the theoretical framework is and we’re just doing engineering work now”. That’s also true of literally every other technical subject. You can say all of chemistry is like: I already know how to write down the Schrödinger equation, it’s “just engineering work” to answer what chemicals you get. Also, all of biology is just the engineering work of figuring out how the Schrödinger equation explains ants or something. So I think that the engineering work is finding good algorithms to do the thing. But this is also work which involves however much theoretical structure. Happy to talk about this more later.

- [Student] Do you disagree with Paul Christiano on anything?

- Yes.

- [Student] Or with other smart people?

- So, Paul Christiano is really smart and it’s hard to disagree with him, because every time I try to disagree with him, I say something like, “But what about this?” And he’s like, “Oh, well I would respond to that with this rebuttal”. And then I’m like, “Oh geez, that was a good rebuttal”. And then he says something like, “But I think some similar arguments against my position which are stronger are the following” and then he rattles off four better arguments against his position and then he rebuts those and it’s really great. But the places where I most think Paul is wrong, I think Paul is maybe wrong about… I mean, obviously I’m betting on MIRI being better than he thinks. Paul would also probably think I should quit my job and work on meta stuff.

- [Student] Meta stuff?

- Like, work on AI safety movement building.

The biggest thing where I suspect Paul Christiano is wrong is, if I had to pick a thing which feels like the simplest short sweet story for a mistake, it’s that he thinks the world is metaphorically more made of liquids than solids.

So he thinks that if you want to think about research you can add up all the contributions to research done by all the individuals and each of these is a number and you add the numbers together. And he thinks things should be smooth. Before the year in which AGI is worth a trillion dollars, it should have been worth half a trillion dollars and you can look at the history of growth curves and you can look at different technological developments and see how fast they were and you can infer all these things from it. And I think that when I talk to him, I think he’s more smooth-curve-fitting oriented than I am.

- [Student] Sorry, I didn’t follow that last part.

- A thing that he thinks is really compelling is that world GDP doubles every 20 years, and has doubled every 20 years or so for the last 100 years, maybe 200 years, and before that doubled somewhat more slowly. And then before the Industrial Revolution it doubled every couple hundred years. And he’s like, “It would be really damn surprising if the time between doublings fell by a factor of two.” And he argues about AI by being like, “Well, this theory about AI can’t be true, because if that was true then the world would have doubling times that changed by more than this ratio.”

[Post-talk note: I believe the Industrial Revolution actually involved a fall of doubling times from 600 to 200 years, which is a doubling time reduction of 3x. Thanks to Daniel Kokotajlo for pointing this out to me once.]
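
For a rough sense of the numbers in this argument, a doubling time of T years corresponds to a constant annual growth rate of 2^(1/T) - 1. The following is a minimal sketch and not something from the talk: the function name `annual_growth_rate` is made up for illustration, and the doubling times are just the round figures quoted above (600 and 200 years from the post-talk note, 20 years for recent world GDP, and the four-year and one-year doublings that come up in this discussion).

```python
def annual_growth_rate(doubling_time_years: float) -> float:
    """Constant annual growth rate that doubles a quantity in the given number of years."""
    return 2 ** (1 / doubling_time_years) - 1


# Round doubling times (in years) quoted in the talk and the post-talk note.
# These are illustrative figures, not precise economic data.
doubling_times = {
    "before the Industrial Revolution (per the note)": 600,
    "after the Industrial Revolution (per the note)": 200,
    "recent world GDP (per the talk)": 20,
    "hypothetical four-year doubling": 4,
    "hypothetical one-year doubling": 1,
}

for label, years in doubling_times.items():
    rate = annual_growth_rate(years)
    print(f"{label}: doubles every {years} years, roughly {rate:.2%} per year")

# The size of the Industrial Revolution speed-up, measured as a ratio of doubling times.
reduction = 600 / 200
print(f"600 -> 200 years is a {reduction:.0f}x reduction in doubling time")
```

Measuring the change as a ratio of doubling times, as the note does, is what makes "fell by a factor of two" and "a reduction of 3x" directly comparable.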

- [Student] But I guess the most important things are things that are surprising. So all of these kind of, it just strikes me as sort of a—

- I mean, I think he thinks your plans are good according to the expected usefulness they have. And he’s like, “Look, the world is probably going to have a lot of smooth curves. There’s probably going to be a four-year period in which the economy doubles before there’s a one-year period in which the economy doubles.” And I’m less inclined to take that kind of argument as seriously.

We are at time. So I want to get dinner with people. So I’m going to stand somewhere and then if you stand close enough to me you might figure out where I’m getting dinner, if you want to get dinner with me afterwards. Anything else supposed to happen before we leave here? Great, thanks so much.