AI Progress: The Game Show

I’ll say here at the top that this post is primarily inspired by the two most recent episodes of the Future of Life Institute podcast, which are both parts of an interview with Connor Leahy. I can’t pretend to have a strong grasp on all of Connor’s thoughts and feelings about AI risk, but I think it’s fair to say that he is largely pessimistic about the most likely outcomes of AI research, and that he believes the timelines for risk are shorter than many people anticipate.

Some rough priors

So first I want to ask: what do you believe the odds are of AI producing extremely bad outcomes within your lifetime? For the purposes of this post, you can be very fuzzy with your probabilities, with what “extremely bad” means, and with what “in your lifetime” means. For myself, I’m thinking about something like billions of deaths/civilizational collapse/permanent totalitarian state/human extinction/extinction of all life, sometime in the next 50 years.

Next, what are the odds of AI bringing about extremely good outcomes (again, you can be fuzzy with the definitions) in the same time period?

The thought experiment

Imagine you are in the audience for a television game show. The premise is simple: someone is going to be brought up onto the stage, where there are 100 identical briefcases. The contestant will choose one briefcase to open. 65 of the briefcases contain absolutely nothing: if one of them is selected, nothing happens and the game ends. Another 30 briefcases each hold a golden ticket indicating that the contestant and everyone in the audience have won an incredible sum of money, say $1 billion each. The final 5 briefcases hold canisters of poison gas that will kill everyone in the audience. Does it kill you quickly and painlessly, or slowly and painfully? Stay tuned and you might find out!
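If it helps to make the odds concrete, here is a minimal Python sketch of the game as described. The 65/30/5 split is just the toy distribution from the thought experiment, not an estimate of real AI outcomes, and you can edit the counts to match your own priors from the section above.

```python
import random

# Toy distribution from the thought experiment (adjust to taste):
# 65 empty briefcases, 30 jackpots, 5 canisters of poison gas.
OUTCOMES = ["nothing"] * 65 + ["jackpot"] * 30 + ["poison"] * 5


def play_once(rng: random.Random) -> str:
    """Open one briefcase at random and return the outcome."""
    return rng.choice(OUTCOMES)


def simulate(trials: int = 100_000, seed: int = 0) -> dict:
    """Estimate how often each outcome occurs over many plays of the game."""
    rng = random.Random(seed)
    counts = {"nothing": 0, "jackpot": 0, "poison": 0}
    for _ in range(trials):
        counts[play_once(rng)] += 1
    return {outcome: count / trials for outcome, count in counts.items()}


if __name__ == "__main__":
    # Prints frequencies close to {'nothing': 0.65, 'jackpot': 0.30, 'poison': 0.05}
    print(simulate())
```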

If you’re in the audience, you might be asking yourself if you even want to play this game at all. And you know, it kind of seems like most people in the audience don’t even understand what’s going on or what the potential “prizes” are. But you don’t have long to contemplate it, because just a few people in the audience have decided that it’s clearly rational to open a briefcase and they’re clambering over each other to open one as quickly as possible.

Would you want to open a briefcase? Would you even want to be in the audience? If you found yourself in the audience against your will and saw someone about to open a briefcase over your objections, would you try to stop them? By what means?

Conclusion

There are days when the metaphor above feels like an apt description of the AI research landscape. Obviously you can change the distribution of “prizes” in the briefcases to match your priors about possible outcomes. But how much does that distribution have to change for your conclusions about the correct course of action to change as well?

Even if the negative consequences are less severe (say, permanent unemployment and poverty for a significant fraction of my lifetime), I still don’t think I want to play the game. At the very least, I don’t want to rush into it. But nobody asked me. Nobody asked my six-year-old daughter. Nobody asked the vast majority of the audience, who aren’t even aware that they are the stakes in a game that somebody else is playing.

The EA community has tremendous overlap with the AI community. The 80,000 Hours job board is advertising jobs at OpenAI right now. Many AI researchers consider themselves to be EAs as well. Why are we pursuing this course of action as quickly as we can? It’s not fair, it’s not just, it’s not right. It’s like a trolley problem where there’s plenty of time to apply the brakes or dismantle the tracks, but instead we’re accelerating. It would only take the actions of a very small number of people to slow or halt advances in AI until we have a much clearer view of the safety of advanced systems. The first and most important step is to just STOP.
