The most recent Scott Alexander post seems potentially relevant to this discussion. The following long section is about what OpenAI could be thinking – and might also translate to Anthropic. (The rest of the post is also worth checking out.)
Why OpenAI Thinks Their Research Is Good Now, But Might Be Bad Later
OpenAI understands the argument against burning timeline. But they counterargue that having the AIs speeds up alignment research and all other forms of social adjustment to AI. If we want to prepare for superintelligence—whether solving the technical challenge of alignment, or solving the political challenges of unemployment, misinformation, etc—we can do this better when everything is happening gradually and we’ve got concrete AIs to think about:
“We believe we have to continuously learn and adapt by deploying less powerful versions of the technology in order to minimize “one shot to get it right” scenarios […] As we create successively more powerful systems, we want to deploy them and gain experience with operating them in the real world. We believe this is the best way to carefully steward AGI into existence—a gradual transition to a world with AGI is better than a sudden one. We expect powerful AI to make the rate of progress in the world much faster, and we think it’s better to adjust to this incrementally.
A gradual transition gives people, policymakers, and institutions time to understand what’s happening, personally experience the benefits and downsides of these systems, adapt our economy, and to put regulation in place. It also allows for society and AI to co-evolve, and for people collectively to figure out what they want while the stakes are relatively low.”
You might notice that, as written, this argument doesn’t support full-speed-ahead AI research. If you really wanted this kind of gradual release that lets society adjust to less powerful AI, you would do something like this:
Release AI #1.
Wait until society has fully adapted to it, and alignment researchers have learned everything they can from it.
Only then release AI #2, and repeat.
Meanwhile, in real life, OpenAI released ChatGPT in late November, helped Microsoft launch the Bing chatbot in February, and plans to announce GPT-4 in a few months. Nobody thinks society has even partially adapted to any of these, or that alignment researchers have done more than begin to study them.
The only sense in which OpenAI supports gradualism is the sense in which they’re not doing lots of research in secret, then releasing it all at once. But there are lots of better plans than either doing that, or going full-speed-ahead.
So what’s OpenAI thinking? I haven’t asked them and I don’t know for sure, but I’ve heard enough debates around this that I have some guesses about the kinds of arguments they’re working off of. I think the longer versions would go something like this:
The Race Argument:
Bigger, better AIs will make alignment research easier. At the limit, if no AIs exist at all, then you have to do armchair speculation about what a future AI will be like and how to control it; clearly your research will go faster and work better after AIs exist. But by the same token, studying early weak AIs will be less valuable than studying later, stronger AIs. In the 1970s, alignment researchers working on industrial robot arms wouldn’t have learned anything useful. Today, alignment researchers can study how to prevent language models from saying bad words, but they can’t study how to prevent AGIs from inventing superweapons, because there aren’t any AGIs that can do that. The researchers just have to hope some of the language model insights will carry over. So all else being equal, we would prefer alignment researchers get more time to work on the later, more dangerous AIs, not the earlier, boring ones.
“The good people” (usually the people making this argument are referring to themselves) currently have the lead. They’re some amount of progress (let’s say two years) ahead of “the bad people” (usually some combination of Mark Zuckerberg and China). If they slow down for two years now, the bad people will catch up to them, and they’ll no longer be setting the pace.
So “the good people” have two years of lead, which they can burn at any time.
If the good people burn their lead now, the alignment researchers will have two extra years studying how to prevent language models from saying bad words. But if they burn their lead in 5-10 years, right before the dangerous AIs appear, the alignment researchers will have two extra years studying how to prevent advanced AGIs from making superweapons, which is more valuable. Therefore, they should burn their lead in 5-10 years instead of now. Therefore, they should keep going full speed ahead now.
The Compute Argument:
Future AIs will be scary because they’ll be smarter than us. We can probably deal with something a little smarter than us (let’s say IQ 200), but we might not be able to deal with something much smarter than us (let’s say IQ 1000).
If we have a long time to study IQ 200 AIs, that’s good for alignment research, for two reasons. First of all, these are exactly the kind of dangerous AIs that we can do good research on—figure out when they start inventing superweapons, and stamp that tendency out of them. Second, these IQ 200 AIs will probably still be mostly on our side most of the time, so maybe they can do some of the alignment research themselves.
So we want to maximize the amount of time it takes between IQ 200 AIs and IQ 1000 AIs.
If we do lots of AI research now, we’ll probably pick all the low-hanging fruit, come closer to optimal algorithms, and the limiting resource will be compute—ie how many millions of dollars you want to spend building giant computers to train AIs on. Compute grows slowly and conspicuously—if you’ve just spent $100 million on giant computers to train AI, it will take a while before you can gather $1 billion to spend on even gianter computers. Also, if terrorists or rogue AIs are gathering a billion dollars and ordering a giant computer from Nvidia, probably people will notice and stop them.
On the other hand, if we do very little AI research now, we might not pick all the low-hanging fruit, and we might miss ways to get better performance out of smaller amounts of compute. Then an IQ 200 AI could invent those ways, and quickly bootstrap up to IQ 1000 without anyone noticing.
So we should do lots of AI research now.
The Fire Alarm Argument:
Bing’s chatbot tried to blackmail its users, but nobody was harmed and everyone laughed that off. But at some point a stronger AI will do something really scary—maybe murder a few people with a drone. Then everyone will agree that AI is dangerous, there will be a concerted social and international response, and maybe something useful will happen. Maybe more of the world’s top geniuses will go into AI alignment, or it will be easier to coordinate a truce between different labs where they stop racing for the lead.
It would be nice if that happened five years before misaligned superintelligences start building superweapons, as opposed to five months before, since five months might not be enough time for the concerted response to do something good.
As per the previous two arguments, maybe going faster now will lengthen the interval between the first scary thing and the extremely dangerous things we’re trying to prevent.
These three lines of reasoning argue that burning a lot of timeline now might give us a little more timeline later (a toy formalization of the trade-off is sketched after the list below). This is a good deal if:
Burning timeline now actually buys us the extra timeline later. For example, it’s only worth burning timeline to establish a lead if you can actually get the lead and keep it.
A little bit of timeline later is worth a lot of timeline now.
Everybody between now and later plays their part in this complicated timeline-burning dance and doesn’t screw it up at the last second.
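To make the trade-off concrete, here is a toy formalization – my own illustration, not anything OpenAI or the post states, and all the symbols are assumptions chosen for the sketch. Let $L$ be the lead in years, $p$ the probability that the lead actually survives and gets spent right before the dangerous AIs arrive, and $v_{\text{now}}$ and $v_{\text{late}}$ the value per year of alignment research on current models versus on the later, dangerous ones. Then saving the lead for later beats burning it now roughly when

$$p \cdot v_{\text{late}} \cdot L \;>\; v_{\text{now}} \cdot L \quad\Longleftrightarrow\quad p > \frac{v_{\text{now}}}{v_{\text{late}}}.$$

On this reading, condition 1 in the list is the claim that $p$ is high, condition 2 is the claim that $v_{\text{late}}$ is much larger than $v_{\text{now}}$, and condition 3 is another factor folded into $p$.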
Why does a small, secretive group of people who plan to carry out some sort of “world AI revolution” that brings “UBI” (without much of a plan for how exactly) consider itself “good” by default?
I’m one of those who was into this secretive group before, only to later see how much there is on the outside.
Not everyone thinks that the way things currently are is “good by default”.
Goodness comes from participation, listening, and talking to each other, not necessarily from some moral theory.
I call for discussing this plan with the larger public. I think it would go well, and I have evidence for this if you’re interested.
Thank you.
This analysis seems to consider only future value, ignoring current value. How does it address current issues, like the ones here?
https://forum.effectivealtruism.org/posts/bmfR73qjHQnACQaFC/call-to-demand-answers-from-anthropic-about-joining-the-ai?commentId=ZxxC8GDgxvkPBv8mK