[Linkpost] Scott Alexander reacts to OpenAI's latest post


Scott Alexander recently wrote a post about OpenAI’s “Planning for AGI and Beyond”. I found it thoughtful, and I think others here might want to read or discuss it.

Some highlights:

ExxonMobil analogy

Imagine ExxonMobil releases a statement on climate change. It’s a great statement! They talk about how preventing climate change is their core value. They say that they’ve talked to all the world’s top environmental activists at length, listened to what they had to say, and plan to follow exactly the path they recommend. So (they promise) in the future, when climate change starts to be a real threat, they’ll do everything environmentalists want, in the most careful and responsible way possible. They even put in firm commitments that people can hold them to.
An environmentalist, reading this statement, might have thoughts like:
Wow, this is so nice, they didn’t have to do this.
I feel really heard right now!
They clearly did their homework, talked to leading environmentalists, and absorbed a lot of what they had to say. What a nice gesture!
And they used all the right phrases and hit all the right beats!
The commitments seem well thought out, and make this extra trustworthy.
But what’s this part about “in the future, when climate change starts to be a real threat”?
Is there really a single, easily-noticed point where climate change “becomes a threat”?
If so, are we sure that point is still in the future?
Even if it is, shouldn’t we start being careful now?
Are they just going to keep doing normal oil company stuff until that point?
Do they feel bad about having done normal oil company stuff for decades? They don’t seem to be saying anything about that.
What possible world-model leads to not feeling bad about doing normal oil company stuff in the past, not planning to stop doing normal oil company stuff in the present, but also planning to do an amazing job getting everything right at some indefinite point in the future?
Are they maybe just lying?
Even if they’re trying to be honest, will their bottom line bias them towards waiting for some final apocalyptic proof that “now climate change is a crisis”, of a sort that will never happen, so they don’t have to stop pumping oil?
This is how I feel about OpenAI’s new statement, Planning For AGI And Beyond.

Doomer argument: Acceleration burns time

Recent AIs have tried lying to, blackmailing, threatening, and seducing users. AI companies freely admit they can’t really control their AIs, and it seems high-priority to solve that before we get superintelligence. If you think that’s 2043, the people who work on this question (“alignment researchers”) have twenty years to learn to control AI.
Then OpenAI poured money into AI, did ground-breaking research, and advanced the state of the art. That meant that AI progress would speed up, and AI would reach the danger level faster. Now Metaculus expects superintelligence in 2031, not 2043 (although this seems kind of like an over-update), which gives alignment researchers eight years, not twenty.

Response to OpenAI’s argument that gradual deployment helps society prepare for dangerous AI systems

You might notice that, as written, this argument doesn’t support full-speed-ahead AI research. If you really wanted this kind of gradual release that lets society adjust to less powerful AI, you would do something like this:
Release AI #1
Wait until society has fully adapted to it, and alignment researchers have learned everything they can from it.
Then release AI #2
Wait until society has fully adapted to it, and alignment researchers have learned everything they can from it.
And so on . . .
Meanwhile, in real life, OpenAI released ChatGPT in late November, helped Microsoft launch the Bing chatbot in February, and plans to announce GPT-4 in a few months. Nobody thinks society has even partially adapted to any of these, or that alignment researchers have done more than begin to study them.

Response to three other arguments in favor of acceleration (“we want safety-conscious actors to be ahead”, compute overhang, and “we want to demonstrate dangers as quickly as possible so the world takes AI safety more seriously”)

These three lines of reasoning argue that burning a lot of timeline now might give us a little more timeline later. This is a good deal if:
Burning timeline now actually buys us the extra timeline later. For example, it’s only worth burning timeline to establish a lead if you can actually get the lead and keep it.
A little bit of timeline later is worth a lot of timeline now.
Everybody between now and later plays their part in this complicated timeline-burning dance and doesn’t screw it up at the last second.
I’m skeptical of all of these.
DeepMind thought they were establishing a lead in 2008, but OpenAI has caught up to them. OpenAI thought they were establishing a lead the past two years, but a few months after they came out with GPT, at least Google, Facebook, and Anthropic had comparable large language models; a few months after they came out with DALL-E, random nobody startups came out with Stable Diffusion and Midjourney. None of this research has established a commanding lead, it’s just moved everyone forward together and burned timelines for no reason.
The alignment researchers I’ve talked to say they’ve already got their hands full with existing AIs. Probably they could do better work with more advanced models, but it’s not an overwhelming factor, and they would be happiest getting to really understand what’s going on now before the next generation comes out. One researcher I talked to said the arguments for acceleration made sense five years ago, when there was almost nothing worth experimenting on, but that they no longer think this is true.
Finally, all these arguments for burning timelines require that lots of things go right later. The same AI companies burning timelines now turn into model citizens when the stakes get higher, and convert their lead into improved safety instead of capitalizing on it to release lucrative products. The government responds to an AI crisis responsibly, rather than by ignoring it or making it worse.

OpenAI’s impact on timelines thus far

On the other hand, man, they sure have burned a lot of timeline. The big thing all the alignment people were trying to avoid in the early 2010s was an AI race. DeepMind was the first big AI company, so we should just let them do their thing, go slowly, get everything right, and avoid hype. Then Elon Musk founded OpenAI in 2015, murdered that plan, mutilated the corpse, and danced on its grave. Even after Musk left, the remaining team did everything to challenge everyone else to a race short of shooting a gun and waving a checkered flag.
OpenAI still hasn’t given a good explanation of why they did this. Absent anything else, I’m forced to wonder if it’s just “they’re just the kind of people who would do that sort of thing”—in which case basically any level of cynicism would be warranted.
I hate this conclusion. I’m trying to resist it. I want to think the best of everyone. Individual people at OpenAI have been very nice to me. I like them. They’ve done many good things for the world.

FTX fallout and analogy to OpenAI

Scott, suppose a guy named Sam, who you’re predisposed to like because he’s said nice things about your blog, founds a multibillion dollar company. It claims to be saving the world, and everyone in the company is personally very nice and says exactly the right stuff. On the other hand it’s aggressive, seems to cut some ethical corners, and some of your better-emotionally-attuned friends get bad vibes from it. Consider the possibility that either they’re lying and not as nice as they sound, or at the very least that they’re not as smart as they think they are and their master plan will spiral out of control before they’re able to get to the part where they do the good things.

Praise for commitment to independent evals, stop-and-assist clause, and hope for a more safety-conscious OpenAI moving forward

Realistically we’re going to thank them profusely for their extremely good statement, then cross our fingers really hard that they’re telling the truth.
OpenAI has unilaterally offered to destroy the world a bit less than they were doing before. They’ve voluntarily added things that look like commitments—some enforceable in the court of public opinion, others potentially in courts of law. Realistically we’ll say “thank you for doing that”, offer to help them turn those commitments into reality, and do our best to hold them to it. It doesn’t mean we have to like them period, or stop preparing for them to betray us. But on this particular sub-sub-topic we should take the W.

Where OpenAI goes, other labs might follow. The past eight years of OpenAI policy have been far from ideal. But this document represents a commitment to move from safety laggard to safety model, and I look forward to seeing how it works out.
