Optimism, AI risk, and EA blind spots

Preface

I’m going to start this post with a personal story, in part because people tend to enjoy writing that does that. If you don’t want the gossip, just skip section one, the takeaway of which is: “EA has a strong cultural bias in favor of believing arbitrary problems are solvable”.

The gossip—and this takeaway—are not the only insights I’m trying to communicate. I don’t mean for this post to be a “community” post overall, but rather one that is action-relevant to doing good on the object level.

N=1

I had a two-week work trial with a prominent EA org. There were some red flags. Nobody would tell me the projected salary, despite the job opportunity taking place across the country and in one of the most expensive cities on Earth. But whatever. I quit my job and flew over.

It didn’t work out. My best guess is that this was for cultural reasons. My would-be manager didn’t think I’d been making fast enough progress understanding a technical framework, but the jobs I’ve had since have involved that framework, and I’ve received overwhelmingly positive feedback, working on products dramatically more complicated than the job opportunity called for. C’est la vie.

Much later, I was told some of the things in my file for that organization. I was told by the organization’s leader in a totally open way—nothing sneaky or “here’s the dirt”, just some feedback to help me improve. I appreciate this, and welcomed it. But here’s the part relevant to the post:

One of the negative things in my file was that someone had said I was “a bit of a downer”. Much like with my technical competency, maybe so. But it’s worth mentioning that in my day to day life, my coworkers generally think I’m weirdly positive, and often comment that my outlook is shockingly sanguine.

I believe that both are true. I’m unusually optimistic. But professional EA culture is much, much more so.

That’s not a bad thing (he said, optimistically). But it’s also not all good.

(Why) is there an optimism bias?

If you want to complete an ambitious project, it’s extremely useful to presume that (almost) any challenge can be met. This is a big part of being “agentic”, a much-celebrated and indeed valuable virtue within the EA community. (And also within elite circles more generally.) The high-end professional world has lots of upside opportunities and relatively little downside risk (you will probably always find a pretty great job as a fallback), so it’s rational to make lots of bets on long odds and try to find holy grails.

Therefore, people who are flagged as “ambitious”, “impressive”, or “agentic” will be both selected for and encouraged to further cultivate a mindset where you never say a problem is insurmountable, merely challenging or, if you truly must, “not a top priority right now”.

But yeah. No odds are too long to be worth a shot!

How is this action relevant?

To avoid burying the lede, it’s a major part of my reasoning to donate my 10% pledge to the Against Malaria Foundation, rather than x-risk reduction efforts. I’ll trace out the argument, then pile on the caveats.

On the 80,000 Hours Podcast, Will MacAskill put the odds of a misaligned AI takeover around 3%. Many community figures put the odds much higher, but I feel pretty comfortable anchoring on a combination of Will and Katja Grace, who put the odds at 7% that AI destroys the world. Low to mid single digits. Okay.

So here’s a valid argument, given its premises:

Premise One: There is at least a 6% chance that AI destroys the world, or removes all humans from it.

Premise Two: There exist interventions that can reliably reduce the risk we face by at least 10% (of the risk, not of the total—so 6% would turn into 5.4%, not −4%/0%).

Premise Three: We can identify these interventions with at least 10% probability.

Premise Four: We can pursue these interventions, and have at least 10% odds of succeeding, provided we’ve found the right ones.

Premise Five: If the world ends, about 8 billion people die.

Conclusion One: Pursuing the basic plan entailed in premises 1-4 saves, in expectation, at least 480,000 lives (8,000,000,000 * 0.06 * 0.1 * 0.1 * 0.1).

Let’s take that as an anchor point and add two further premises.

Premise Six: The (next) best opportunity to save human lives is the Against Malaria Foundation, and saving lives through AMF costs approximately $4,000 per life.

Premise Seven: We want to save as many lives as possible in expectation.

Conclusion Two: We should pursue AI x-risk mitigation if strategies in line with the above premises cost $1.92B or less.
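The arithmetic behind the two conclusions is simple enough to write out as a quick sketch (the numbers are the ones from the premises above, not canonical estimates):

```python
# Expected lives saved under premises 1-5, multiplied straight through.
P_DOOM = 0.06           # Premise 1: chance AI destroys the world
RISK_REDUCTION = 0.10   # Premise 2: relative reduction in that risk
P_IDENTIFY = 0.10       # Premise 3: chance we find the right interventions
P_SUCCEED = 0.10        # Premise 4: chance we execute them successfully
LIVES_AT_STAKE = 8_000_000_000  # Premise 5: everyone dies if the world ends

expected_lives_saved = (
    LIVES_AT_STAKE * P_DOOM * RISK_REDUCTION * P_IDENTIFY * P_SUCCEED
)
print(expected_lives_saved)  # ≈ 480,000 (Conclusion One)

# Premise 6: AMF saves a life for roughly $4,000, so the x-risk plan
# beats AMF only if its total cost stays under:
AMF_COST_PER_LIFE = 4_000
breakeven_budget = expected_lives_saved * AMF_COST_PER_LIFE
print(breakeven_budget)  # ≈ $1.92 billion (Conclusion Two)
```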

This is simplified in many ways. I could see arguments to challenge every premise in every direction. And, of course, longtermist arguments massively change the calculus by counting all future generations on the ledger (and guessing that there’s some chance there will be a truly staggering number of such generations).

We’ll come back to longtermism, which deserves its own section. But first let’s focus on premises three and four.

Are we overoptimistic about key premises?

Yes, I think so.

Premise Three: We can identify these interventions with at least 10% probability.

I won’t dispute that there exist interventions that would reduce risk. There exist actions that achieve nearly anything. But can we find them?

Recall that there are decent reasons to think goal alignment is impossible—in other words, it’s not a priori obvious that there’s any way to declare a goal and have some other agent pursue that goal exactly as you mean it.

Recall that engineering ideas very, very rarely work on the first try, and that if we only have one chance at anything, failure is very likely.

Recall that interventions that slow technological development in the face of strong economic pressures seem extremely hard to find, to the degree that I’m not sure one has ever worked (and if one had, I’d guess it had been in a dictatorship rather than a liberal democracy).

10% seems significantly too high to me, even to reduce the relative risk by 1%. As it looks to me, there are broad swathes of possible worlds where the problem either basically never comes up or solves itself, and broad swathes of possible worlds where our trajectory is already set in stone and we’re going down.

The presumption that we live in the sweet spot and need merely roll up our sleeves strikes me as an example of powerful optimism bias.

Premise Four: We can pursue these interventions, and have at least 10% odds of succeeding, provided we’ve found the right ones.

Recall that getting “humanity” to agree on a good spec for ethical behavior is extremely difficult: some places are against gene drives to reduce mosquito populations, for example, despite this saving many lives in expectation.

Recall that there is a gigantic economic incentive to keep pushing AI capabilities up, and referenda to reduce animal suffering in exchange for more expensive meat tend to fail.

Recall that we have to implement any solution in a way that appeals to the cultural sensibilities of all major and technically savvy governments on the planet, plus major tech companies, plus, under certain circumstances, idiosyncratic ultra-talented individual hackers.

The we-only-get-one-shot idea applies at this stage too.

So again, 10% strikes me as really optimistic. It’s worth mentioning here, too, that I don’t tend to see these premises valued at 10% in most analyses, or even part of the calculation. Most often it’s taken as a given that levers exist to reduce risk by at least 1% (or much more), and that we’re competent to push those levers.

$1.92B to save 480,000 lives in expectation is a great deal. But it seems really, really rosy to think we can accomplish simultaneous, extremely difficult, and currently poorly specified political, philosophical, and technical tasks on a global scale at that price point. Heck, we’ve been working on figuring out how good deworming is for a decade. This stuff is hard.
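To make the sensitivity to optimism concrete, here is a rough sketch of how the breakeven budget collapses if premises three and four are more pessimistic. The alternative values below are purely illustrative, my own picks rather than anyone’s published estimates:

```python
def breakeven_budget(p_doom, risk_reduction, p_identify, p_succeed,
                     lives=8_000_000_000, amf_cost_per_life=4_000):
    """Max budget (in dollars) at which the x-risk plan still beats AMF."""
    expected_lives = lives * p_doom * risk_reduction * p_identify * p_succeed
    return expected_lives * amf_cost_per_life

# The post's numbers: 10% odds each of identifying and executing.
print(breakeven_budget(0.06, 0.10, 0.10, 0.10))  # ≈ $1.92B

# If each of those is really 1% rather than 10%, the breakeven
# budget falls by a factor of 100.
print(breakeven_budget(0.06, 0.10, 0.01, 0.01))  # ≈ $19.2M
```

The point of the sketch: because the premises multiply, being one order of magnitude too optimistic on two of them shrinks the budget at which x-risk work beats AMF by two orders of magnitude.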

So, presuming shorttermism, AMF looks like the better option. Of course, we shouldn’t presume shorttermism. Let’s get into that.

The promised longtermism section

Thank you for your characteristic patience, longtermists.

I am going to risk looking stupid here, because longtermist arguments are often strong and generally really complicated. Please presume I know what I’m talking about and will get to relevant objections, then judge me extra if I in fact leave a big one out.

Here’s a longtermist argument as I understand it:

Premise One: If you do the math on certain x-risk reduction initiatives with the proposed benefit being “save 8 billion lives”, it may or may not be cost effective.

Premise Two: However, extinction is a really big deal beyond that, because we lose not only the 8 billion lives, but also however many lives there would have been counterfactually.

These premises are both totally solid. I am not nearly so arrogant as to pick a fight with the legendary Derek Parfit.

Premise Three: In expectation, there are extremely many people in the long-term future, such that we should model x-risk reduction initiatives as saving orders of magnitude more lives than a mere 8 billion.

I am not as sold on premise three.

Some basic (and troubling) anchor points

Recall that until nuclear weapons were developed, humanity had no realistic shot at accidentally messing up the biome we live in so badly that our survival as a species was at risk.

Recall that bioweapons—including engineered pathogens that could kill almost everyone—are possible and protections against biological attack are spectacularly underfunded (for the record, I am very in favor of improving this state of affairs and think people are doing incredible work here).

Scenarios if we pass the (potential) AI sieve

Imagine that we create well-aligned AGI. There are several things this could mean.

Scenario One: The AGI does global surveillance good enough to prevent rogue actors from destroying the planet, no matter how powerful technology gets. Everyone’s cool with this level of surveillance, and it doesn’t cause permanent rule by some bad ideology or other. Also, no cosmic threats happen to manifest.

Scenario Two: Same as scenario one, but there’s a black hole/alien invasion/unstoppable asteroid/solar flare/some other astronomical event we don’t know about yet that unavoidably destroys the planet in the next millennium or two. (I don’t think this scenario is likely, but it is possible.)

Scenario Three: The AGI causes massive technological progress, but there are actually lots of AGIs basically at once. None of them is trying to kill us all, but none of them is given permission to surveil everyone all the time. We have many more “make sure the world doesn’t get destroyed” sieves to get through as a species, and over time it gets easier and easier for rogue actors or industrial accidents to kill us all. Eventually, and probably in under 1,000 years, one does.

Scenario Four: The AGI causes massive technological progress, but less than what we currently imagine a “singularity” to look like, because returns to intelligence top out somewhere far above what our species can reach, but far short of godlike powers. Same problems as scenario three, but slower.

And many others! The most likely case, to me, is that AI x-risk is solved or turns out not to be a serious issue, and we just keep facing x-risks in proportion to how strong our technology gets, forever. Eventually we draw a black ball and all die. If technology keeps improving really fast, that’s likely in the next 500 years or so. The second most likely case, to me, is that stuff just gets so weird as to be unrecognizable, but not straightforwardly catastrophic.

Key Longtermist Objection: Use expected value

Okay, a longtermist might say. Maybe the odds are really slim that we thread this needle, and then also the subsequent needles required to create an interstellar civilization spanning billions of years. But the value of that scenario is so high that if you shut up and multiply, it’s worth putting a lot of resources in that direction.

To which I say… man, I dunno, this starts to feel like Pascal’s Mugging. There are a lot of unknown unknowns, interstellar travel seems really hard and like no particular generation has a strong incentive to bear the huge sacrifices to make it happen, and it’s just very suspicious to suppose we’re at the precipice of the one sieve that matters most and all further ones are comfortably manageable by our descendants. We’re just getting into territory that feels roughly analogous to assigning some probability mass to each fundamentalist major religion being true. I can’t easily put into words why I don’t want to do that, but I really, really don’t, and I feel like digging deeper into it will make me less sane rather than more sane.
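The “shut up and multiply” move can be made explicit. Under the same premise chain as before, expected lives saved scale linearly with however many lives you put on the ledger; the future-population figure below is purely illustrative, since longtermist estimates vary by many orders of magnitude:

```python
# Premises 1-4 multiplied together, as in the earlier calculation.
P_CHAIN = 0.06 * 0.10 * 0.10 * 0.10

present_lives = 8e9
future_lives = 1e16  # illustrative only; not anyone's published estimate

print(P_CHAIN * present_lives)  # ≈ 4.8e5 expected lives saved
print(P_CHAIN * future_lives)   # ≈ 6e11, dwarfing the present-day figure
```

This is why premise three does all the work: grant it, and almost any shared probability chain makes x-risk reduction dominate; doubt it, and the comparison with AMF reopens.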

End of longtermist section

So okay, am I anti-longtermism? No, I don’t think so. I think Will MacAskill’s argument that we’re dramatically underfunding the long term future is just straightforwardly right, even if that future only extends 500 years in expectation. On net we should move the needle up.

But that doesn’t mean the most cost effective interventions are longtermist-motivated x-risk reduction, just that longtermist organizations and projects are way better than the baseline in terms of projects that exist and are funded/well regarded.

So, scrap AI x-risk projects?

No! This is awkward, actually, because I spend a good chunk of my own professional time working on exactly those projects. I find AI really interesting, and I think that much like with longtermism, more resources should go toward AI safety rather than fewer, and we as a planet should take AI risk more seriously.

But much like Tyler Cowen’s view on EA as a whole, I think something can be overrated locally and (badly) underrated globally. Buy your local AI safety researcher a coffee. Heck, buy your local AI safety research editor (that’s me) a coffee, while you’re at it.

But does it beat AMF? Is it clearly the highest value altruistic project available? Is it a slam dunk that reducing x-risk from AI should be our North Star?

I don’t think so. I think the assumptions behind that conclusion are biased severely—maybe irrecoverably—by a culture of intense optimism. When I put on my AI x-risk hat, I myself prefer to have that optimism. Conditional on having selected that as a project to work on, it’s probably good.

But when I’m deciding where to donate, and I do the math, I genuinely think the most effectively altruistic option is just saving little children from malaria.