Timelines are short, p(doom) is high: a global stop to frontier AI development until x-safety consensus is our only reasonable hope
The people arguing against stopping (or pausing) either have long timelines or low p(doom).
The tl;dr is the title. Below I try to provide a succinct summary of why I think this is the case (read just the headings on the left for a shorter summary).
Timelines are short
The time until we reach superhuman AI is short.
“While superintelligence seems far off now, we believe it could arrive this decade.”—OpenAI
GPT-4
Artificial General Intelligence (AGI) is now in its ascendency. GPT-4 is already ~human-level at language and image recognition, and is showing sparks of AGI. This was surprising to most, and led to many people reducing their AI timelines.
GPT-4’s university exam [OpenAI, 2023].Massive investment
Following GPT-4, the race to superhuman AI has become frantic. Google DeepMind has heavily invested in a competitor (Gemini), and Amazon has recently invested $4B into Anthropic, following the success of its Claude models. We now have 7 of the world’s 10 largest companies heavily investing into frontier AI[1]. Inflection and xAI have also entered the race.
Scaling continues unabated
It’s looking highly likely that the current paradigm of AI architecture (Foundation models), basically just scales all the way to AGI. There is little evidence to suggest otherwise. Thus it’s likely that all that is required to get AGI is throwing more money (compute and data) at it. This is ready happening to the point where AGI is likely “in the post” already, with a delivery time of 2-5 years[2].
Graph adapted[3] from Extrapolating performance in language modeling benchmarks [Owen, 2023]. An MMLU score of 90% represents human expert level across many (57) subjects.
General cognition
Foundation model AIs are “General Cognition Engines”[4]. Large multimodal models – text-, image-, audio-, video-, VR/games-, robotics[5]-manipulation by a single AI – will arrive very soon and will be ~human-level at many things: physical as well as mental tasks; blue collar jobs[6] in addition to white collar jobs.
Takeoff to Artificial Superintelligence (ASI)
Regardless of whether an AI might be considered AGI (or TAI) or superhuman AI, what’s salient from an extinction risk (x-risk) perspective is whether it is able to speed up AI research. Once an AI becomes as good at AI engineering as a top AI engineer (reaches the “Ilya Threshold”), then it can be used to rapidly speed up further AI development. Or, an AI that is highly capable at STEM could pose an x-risk as is—a “teleport” to x-risk as opposed to a “foom”. Worrying developments pushing in these directions include larger context windows, plugins, planners and early attempts at recursive self-improvement.
There are no bottlenecks
Compute: Following Epoch, it looks like there is currently a 10.5x[7] growth of effective training FLOP every year. So if we were at 1E25 FLOP in 2022 (GPT-4), then that is 1E28 effective FLOP in 2025 and 1E31 in 2028. And 1E28-1E31 is probably AGI (near perfect performance on human-level tasks)[8].
This isn’t just a theoretical projection: Inflection are claiming to be on for having 100x GPT-4 (2E27 FLOP) in 18 months[9]. Google may well have a Zettaflop (1E21) cluster by 2025; 4 months of runtime (1E7s) gives 1E28 FLOP[10]. It’s plausible that an actor spending $10B on compute now could get a 1E28 FLOP model in months.
Projection of training FLOP for frontier AI over the next 4 years (see the above two paragraphs for a detailed explanation).
Data: There are billions of high-res video cameras (phones), and sensors in the world. There is YouTube (owned by Google). There is a fast-growing synthetic data industry[11]. For specific tasks that require a high skill level, there is training data provided by narrow AI specialised in the skill. Regarding the need for active learning for AGI, see DeepMind’s Adaptive Agent and the Voyager “lifelong learning” Minecraft agent, significant steps in this direction. Also, humans reach human level performance with a fraction of the data used by current models, so we know this is possible.
Ability to influence the real world: Error correction; design iteration; trial and error[12]. All are clearly not insurmountable, given the evidence we have of human success in the physical world. But can it happen quickly with AI? Yes. There are many ways that AI can easily break out into the physical world, involving hacking networked machinery and weapons systems, social persuasion, robotics, and biotech.
p(doom|AGI) is high
“I haven’t found anything. Usually any proposal I read about or hear about, I immediately see how it would backfire, horribly.” “My position is that lower level intelligence cannot indefinitely control higher level intelligence”—Roman Yampolskiy
The most likely (default) outcome of AGI is doom[13][14]. Superhuman AI is near (see above), and viable solutions for human safety from extinction (x-safety) are nowhere in sight[15]. If you apply a security mindset (Murphy’s Law) to the problem of securing x-safety for humans in a world with smarter-than-human AI, it should quickly become apparent that it is very difficult.
Slow alignment progress
The slow progress of the field to date is further evidence for this. The alignment paradigms used for today’s frontier models only appear to make them relatively safe because they are (currently) weaker than us[16]. If the “grandma’s bedtime story napalm recipe” prompt engineering hack actually led to the manufacture of napalm, it would be immediately obvious how poor today’s level of practical AI alignment is[17].
4 disjunctive paths to doom
There are 4 major unsolved disjunctive components to x-safety: outer alignment, inner alignment, misuse risk, multi-agent coordination. And a variety of corresponding threat models.
Any given approach that might show some promise on one or two of these still leaves the others unsolved. We are so far from being able to address all of them. And not only do we have to address them all, but we have to have totally watertight solutions, robust to AI being superhuman in its capabilities (in contrast to today’s “weak methods”). In the limit of superintelligent AI, alignment and other x-safety protections (those against misuse or conflict) need to be perfect. Else all the doom flows through the tiniest crack in our defence[18].
The nature of the threat
The Shoggoth meme.
X-risk-posing AI will be:
A General Cognition Engine (see above); an AGI.
Superhumanly capable at STEM.
Alien (merely very good at imitating humans).
Fast (we’ll be like plants to it).
Very difficult to stop if open sourced (or leaked).
Situationally aware (and deceptive).
Capable of goal mis-generalisation[19] (mesa-optimisation).
Misaligned (see above).
Uncontrollable (see below).
X-safety from ASI might be impossible
We don’t even know for sure at this stage whether x-safety from superhuman AI is even possible. A naive prior would actually suggest that it may not be. Since when has a more intelligent species indefinitely controlled a less intelligent species? And there is in fact currently ongoing work to establish impossibility proofs for ASI alignment[20]. These being accepted could go a long way to securing a moratorium on further frontier AI development.
A global stop to frontier AI development is our only reasonable hope
I had referred to this as an “indefinite global pause” in a draft, but following Scott Alexander’s taxonomy of pause debate positions, I realise that a stop (and restart when safe) is a cleaner way of putting it[21].
“There are only two times to react to an exponential: too early, or too late.”—Connor Leahy[22]
X-safety research needs time to catch up
X-safety research is far behind relative to AI capabilities and shortened timelines. This creates a dire need for extra time to be bought for it to catch up. There are 100s of PhDs that could be done just on getting closer to perfect alignment for current sota models. We could stop for at least 5 years before coming close to running out of useful x-safety research that needs doing.
The pause needs to be indefinite (i.e. a stop) and global
No serious, or (I think) realistic, pause is going to be temporary or local (confined to one country). Many of the posts arguing against a pause seem to be making unrealistic assumptions along these lines. A pause needs to be global and indefinite, until either there is a global consensus on x-safety (amongst ~all experts), or a global democratic mandate to resume scaling of AI (this could happen even if expert consensus on x-risk is 1%, or 10%, I guess, if there are other problems that require AGI to fix, or just a majority desire to make such a gamble for a shot at utopia). This is not to say that it may not start off as local, partial (e.g. regulation based on existing law) or finite, but the goal remains for it to be global and indefinite.
Restart slowly when there is a consensus on AI x-safety
Worries about overhangs in terms of compute and algorithms are also based on unreasonable assumptions: if there are strict limits on compute (and data), they aren’t suddenly going to be lifted. Assuming that there is a global consensus on a way forward that has a high chance of being existentially safe, ratcheting up slowly would be the way to go, doing testing and evals along the way.
Ideally, the consensus on x-safety should be extended to a global democratic mandate to proceed with AGI development, given that it affects everyone on the planet. A global referendum would be the aspiration here. But the initial agreement to stop should have baked into it a mechanism for determining consensus on x-safety and restarting. I hope that the upcoming UK AI Safety Summit can be a first step toward such an agreement.
Consider that maybe we should just not build AGI at all
The public doesn’t want it[23]. Uncertainty over extreme risks alone justifies a stop. It’s almost a cliche now for climate change and cancer cures to be trotted out as reasons to pursue AGI. EAs might also think about solutions to poverty and all disease. But we are already making good progress on these (including with existing narrow AI), and there’s no reason that we can’t solve them without AGI. We don’t need AGI for an amazing future.
Objections answered
We don’t need to keep scaling AI capabilities to do good alignment research.
The argument by the big AI labs that we need to advance capabilities to be able to better work on alignment is all too conveniently aligned with their money-making and power-seeking incentives. Responsible Scaling policies are deeply flawed; it’s basically an oxymoron when AGI is so close. The danger is already apparent enough (see above) to stop scaling now. The same applies to conditional pauses, trying to predict when we will get dangerous AI, or notice “near-dangerous” AI. Also, as mentioned above, there is plenty of AI Safety research that is possible during a stop[24] (e.g. mech. interp.).
We don’t need a global police state to enforce an indefinite moratorium.
The worry is about world-ending compute becoming more and more accessible as it continues to become cheaper, and algorithms keep advancing. But hardware development and research can also be controlled. And computers already report a lot of data—adding restrictions along the lines of “can’t be used for training dangerous frontier AI models” would not be such a big change. More importantly, a taboo can go a long way to reducing the need for enforcement. Take for example, human reproductive cloning. This is so morally abhorrent that it is not being practised anywhere in the world. There is no black market in need of a global police state to shut it down. AGI research could become equally uncool once the danger, and loss of sovereignty, it represents is sufficiently well appreciated. Nuclear non-proliferation is another example—there is a lot of international monitoring and control of enriched uranium without a global police state. The analogy here is GPUs being uranium and large GPU clusters being enriched uranium.
Scott Alexander has the caveat “absent much wider adoption of x-risk arguments” when talking about enforcement of a pause. I think that we can get that much wider adoption of x-risk arguments (indeed we are already seeing it), and a taboo on AGI / superhuman AI to go along with it.
Fears over AI leading to more centralised control and a global dictatorship are all the more reason to shut down AI now.
Even if your p(doom) is (still) low (despite considering the above), we need a global democratic mandate for something as momentous as AGI
Whilst I think AGI is 0-5 years away and p(doom|AGI) is ~90%[25], I want to remind people that a p(doom) of ~10% is ~1B human lives lost in expectation. I know many people working on AI, and EAs, and rationalists, would happily play a round of Russian Roulette for a shot at utopia; but it’s outrageous to force this on the whole world without a democratic mandate. Especially given the general public do not want the “utopia” anyway: they do not want smarter than human AI in the first place.
The benefits are speculative, so they don’t outweigh the risks.
Assumptions of wealth generation, economic abundance and public benefit from AGI are ungrounded without proof that they are even possible—and they aren’t (yet), given a lack of scalable-to-ASI solutions to alignment, misuse, and coordination. A stop isn’t taking away the ladder to heaven, for such a ladder does not exist, even in theory, as things stand[26]. Yes, it is a bit depressing realising that the world we live in may actually be closer to the Dune universe (no AI) than the Culture universe (friendly ASI running everything). But we can still have all the nice things (including a cure for ageing) without AGI; it might just take a bit longer than hoped. We don’t need to be risking life and limb driving through red lights just to be getting to our dream holiday a few minutes earlier.
There is precedent for global coordination at this level.
Examples include nuclear, biological, and chemical weapons treaties; the CFC ban; and taboos on human reproductive cloning and Eugenics. There are many other groups already in opposition to AI due to ethical concerns. A stop is a common solution to most concerns[27].
There aren’t any better options
They all seem far worse:
Superalignment—trying to solve a fiendishly difficult unsolved problem, with no known solution even in theory, to a deadline, sounds like a recipe for failure (and catastrophe).
Pivotal act—requires some minimal level of alignment to pull off. Again, pretty hubristic to expect this to not end in disaster.
Hope for the best—not going to cut it when extinction is on the line and there is nowhere to run or hide.
Hope for a catastrophe first—given human history and psychology (precedents being Covid, Chernobyl), it unfortunately seems likely that the bodies will have to start piling up before sufficient action is taken on regulation/lockdown. But this should not be an actual strategy. We really need to do better this time. Especially when there may not actually be a warning shot, and the first disaster could send us irrevocably toward extinction.
Acknowledgements: For helpful comments and suggestions that have improved the post, I thank Miles Tidmarsh, Jaeson Booker, Florent Berthet and Joep Meindertsma. They do not necessarily endorse everything written in the post, and all remaining shortcomings are my own.
- ^
Including NVidia, Meta, Tesla; and Apple.
- ^
See section below - “There are no bottlenecks”—for more details.
- ^
Rather than max and min estimates for GPT-4 compute, the best estimate from Epoch (2E25) is taken.
- ^
Perhaps if they were actually called this, rather than misnomered as Large Language Models (or Foundation models), then people would be taking this more seriously!
- ^
See also: Tesla’s Optimus, Fourier Intelligence’s GR-1.
- ^
Although scaling general-purpose robotics to automate most physical jobs would likely take a number of years, by which point it is likely to be eclipsed by ASI.
- ^
4.2x from growth in training compute, and 2.5x from algorithm improvements.
- ^
Note also that Ajeya Cotra in her “Bio Anchors” report has ~1E28 and 1E31 FLOP estimates for the Lifetime Anchor and Short horizon neural network hypotheses (p8). And see also the graph above from Owen (2023).
- ^
15 months from now if taken from Mustafa Suleyman’s first pronouncement of this.
- ^
Factoring a continued 2.5x/yr effective increase from algorithmic improvements makes this 1E29 effective FLOP in 2025.
- ^
Whilst Epoch have gathered some data showing that the stock high quality text data might soon be exhausted, their overall conclusion is that there is only a “20% chance that the scaling (as measured in training compute) of ML models will significantly slow down by 2040 due to a lack of training data.”
- ^
Examples along these lines already in use are prompt engineering to increase performance, and code generation to solve otherwise unsolvable problems.
- ^
p(doom|AGI) means probability of doom, given AGI; “doom” means existential catastrophe, or for our purposes here, human extinction.
- ^
I have yet to hear convincing arguments against this.
- ^
See “4 disjunctive paths to doom” below.
- ^
I’ve linked to some of my (and others’) criticisms of other posts in the AI Pause Debate throughout this post.
- ^
Note that the RepE approach only stops ~90% of the jailbreaks. The nature of neural networks suggests that progress in their alignment is statistical in nature and can only asymptote toward 100%. A terrorist intent on misuse isn’t going to be thwarted by having to try prompting their model 10 times.
- ^
One could argue that watertight solutions aren’t necessary, given the example of humans not value loading perfectly between generations, but I don’t think this holds, given massive power imbalances and large amounts of subjective time for accidents to happen, and the likelihood of a singleton emerging given first mover advantage. At the very least we would need to survive the initial foom in order for secondary safeguard systems to be put in place. And imperfect value loading could still lead to non-extinction existential catastrophes of the form of being only left a small amount of the cosmos (e.g. just Earth) by the AI; in which case the non-AGI counterfactual world would probably be better.
- ^
See also The Alignment Problem from a Deep Learning Perspective, p4.
- ^
Although the above caveat from fn. 16 about imperfect value loading (and astronomical waste) might still apply. As might a caveat about the possibility of limited ASI.
- ^
That also makes it easier to distinguish from the other positions. Yudkowsky’s TIME article articulates this view forcefully.
- ^
- ^
Note this is mostly based on a recent survey of the US public. Attitudes in other countries may be more in favour.
- ^
Scott Alexander: “a lot hinges on whether alignment research would be easier with better models. I’ve only talked to a handful of alignment researchers about this, but they say they still have their hands full with GPT-4.”
- ^
The remaining ~10% for “we’re fine” is nearly all exotic exogenous factors (related to the simulation hypothesis, moral realism being true—valence realism?, consciousness, DMT aliens being real etc), that I really don’t think we can rely on to save us!
- ^
- ^
Although we should beware the failure mode of a UBI/profit share being accepted as a solution to mass unemployment, as this doesn’t prevent x-risk!
- I bet Greg Colbourn 10 k€ that AI will not kill us all by the end of 2027 by 4 Jun 2024 16:37 UTC; 195 points) (
- Funding circle aimed at slowing down AI—looking for participants by 25 Jan 2024 23:58 UTC; 92 points) (
- 4 Mar 2024 22:40 UTC; 47 points) 's comment on Greg_Colbourn’s Quick takes by (
- Giving away copies of Uncontrollable by Darren McKee by 14 Dec 2023 17:00 UTC; 39 points) (
- 5 Mar 2024 17:52 UTC; 4 points) 's comment on Greg_Colbourn’s Quick takes by (
- 28 Oct 2023 18:20 UTC; 4 points) 's comment on We’re Not Ready: thoughts on “pausing” and responsible scaling policies by (
- 25 Oct 2023 10:47 UTC; 3 points) 's comment on Pause For Thought: The AI Pause Debate by (
- 15 Oct 2023 11:48 UTC; 3 points) 's comment on RSPs are pauses done right by (
- 20 Dec 2023 16:32 UTC; 0 points) 's comment on OpenAI, DeepMind, Anthropic, etc. should shut down. by (
- 25 Oct 2023 10:57 UTC; -1 points) 's comment on Pause For Thought: The AI Pause Debate by (
I tend to put P(doom) around 80%, so I think I’m on the pessimistic side, and I tend to think short timelines are at least a real and serious possibility that we should be planning for. Nevertheless, I disagree with a global stop or a pause being the “only reasonable hope”—global stops and pauses seem basically unworkable to me. I’m much more excited about governmentally enforced Responsible Scaling Policies, which seem like the “better option” that you’re missing here.
@evhub can you say more about what you envision a governmentally-enforced RSP world would look like? Is it similar to licensing? What happens when a dangerous capability eval goes off— does the government have the ability to implement a national pause?
Aside: IMO it’s pretty clear that the voluntary-commitment RSP regime is insufficient, since some companies simply won’t develop RSPs, and even if lots of folks adopted RSPs, the competitive pressures in favor of racing seem like they’d make it hard for anyone to pause for >a few months. I was surprised/disappointed that neither ARC nor Anthropic mentioned this. ARC says some stuff about how maybe in the future one day we might have some stuff from RSPs that could maybe inform government standards, but (in my opinion) their discussion of government involvement was quite weak, perhaps even to the point of being misleading (by making it seem like the voluntary commitments will be sufficient.)
I think some of the negative reaction to responsible scaling, at least among some people I know, is that it seems like an attempt for companies to say “trust us— we can scale responsibly, so we don’t need actual government regulation.” If the narrative is “hey, we agree that the government should force everyone to scale responsibly, and this means that the government would have the ability to tell people that they have to stop scaling if the government decides it’s too risky”, then I’d still probably prefer stopping right now, but I’d be much more sympathetic to the RSP position.
I think presumably the pause would just be for that company’s scaling—presumably other organizations that were still in compliance would still be fine.
That’s definitely my position, yeah—and I think it’s also ARC’s and Anthropic’s position. I think the key thing with the current advocacy around companies doing this is that one of the best ways to get a governmentally-enforced RSP regime is for companies to first voluntarily commit to the sort of RSPs that you want the government to later enforce.
Thanks! A few quick responses/questions:
I think this makes sense for certain types of dangerous capabilities (e.g., a company develops a system that has strong cyberoffensive capabilities. That company has to stop but other companies can keep going).
But what about dangerous capabilities that have more to do with AI takeover (e.g., a company develops a system that shows signs of autonomous replication, manipulation, power-seeking, deception) or scientific capabilities (e.g., the ability to develop better AI systems)?
Supposing that 3-10 other companies are within a few months of these systems, do you think at this point we need a coordinated pause, or would it be fine to just force company 1 to pause?
Do you know if ARC or Anthropic have publicly endorsed this position anywhere? (And if not, I’d be curious for your take on why, although that’s more speculative so feel free to pass).
I wrote up a bunch of my thoughts on this in more detail here.
What should happen there is that the leading lab is forced to stop and try to demonstrate that e.g. they understand their model sufficiently such that they can keep scaling. Then:
If they can’t do that, then the other labs catch up and they’re all blocked on the same spot, which if you’ve put your capabilities bars at the right spots, shouldn’t be dangerous.
If they can do that, then they get to keep going, ahead of other labs, until they hit another blocker and need to demonstrate safety/understanding/alignment to an even greater degree.
Hi Evan,
What is your median time from now until human extinction? If it is only a few years, I would be happy to set up a bet like this one.
I mention Responsible Scaling!
EDIT to add: I’m interested in a response from evhub (or anyone else) to the points raised against Responsible Scaling (see links for more details).
I guess I’m not really sure what your objection is to Responsible Scaling Policies? I see that there’s a bunch of links, but I don’t really see a consistent position being staked out by the various sources you’ve linked to. Do you want to describe what your objection is?
I guess the closest there is “the danger is already apparent enough” which, while true, doesn’t really seem like an objection. I agree that the danger is apparent, but I don’t think that advocating for a pause is a very good way to address that danger.
The consistent position is that further scaling is reckless at this stage; it can’t be done in a “responsible” way, unless you think subjecting the world to a 10-25% risk of extinction is a responsible thing to be doing!
What is a better way of addressing the danger? Waiting for it to get more intense and more apparent by scaling further!? Waiting until a disaster actually happens? Actually pausing, or stopping (and setting an example), rather than just advocating for a pause?
Perhaps the crux is related to how dangerous you think current models are? I’m quite confident that we have at least a couple additional orders of magnitude of scaling before the world ends, so I’m not too worried about stopping training of current models, or even next-generation models. But I do start to get worried with next-next-generation models.
So, in my view, the key is to make sure that we have a well-enforced Responsible Scaling Policy (RSP) regime that is capable of preventing scaling unless hard safety metrics are met (I favor understanding-based evals for this) before the next two scaling generations. That means we need to get good RSPs into law with solid enforcement behind them and—at least in very short timeline worlds—that needs to happen in the next few years. By far the best way to make that happen, in my opinion, is to pressure labs to put out good RSPs now that governments can build on.
I don’t think the current models are dangerous, but perhaps they could be if used for long enough on improving AI. A couple of orders of magnitude (or a couple of generations) is only a couple of years! This is soon enough to be pushing as hard as we can for a pause right now!
Why try and take it right down to the wire with RSPs? Why over-complicate things? The stakes couldn’t be bigger (extinction). It’s super reckless to not just be saying “It seems quite likely we’re getting to world-ending models in 2-5 years. Let’s not keep going any longer. Let’s just stop now.” The tradeoff [edit: for Anthropic] for a few tens of $Bs of extra profit really doesn’t seem worth it!
I mean, yes, obviously we should be doing everything we can right now. I just think that a RSP-gated pause is the right way to do a pause. I’m not even sure what it would mean to do a pause without an RSP-like resumption condition.
Because it’s more likely to succeed. RSPs provides very clear and legible risk-based criteria that are much more plausibly things that you could actually get a government to agree to.
This seems extremely disingenuous and bad faith. That’s obviously not the tradeoff and it confuses me why you would even claim that. Surely you know that I am not Sam Altman or Dario Amodei or whatever.
The actual tradeoff is the probability of success. If I thought e.g. just advocating for a six month pause right now was more effective at reducing existential risk, I would do it.
Have the resumption condition be a global consensus on an x-safety solution or a global democratic mandate for restarting (and remember there are more components of x-safety than just alignment—also misuse and multi-agent coordination).
I think if governments actually properly appreciated the risks, they could agree to an unconditional pause.
Sorry. I’m looking at it at the company level. Please don’t take my critiques as being directed at you personally. What is in it for Anthropic and OpenAI and DeepMind to keep going with scaling? Money and power, right? I think it’s pushing it a bit at this stage to say that they, as companies, are primarily concerned with reducing x-risk. If they were they would’ve stopped scaling already. Forget the (suicide) race. Set an example to everyone and just stop!
This seems basically unachievable and even if it was achievable it doesn’t even seem like the right thing to do—I don’t actually trust the global median voter to judge whether additional scaling is safe or not. I’d much rather have rigorous technical standards than nebulous democratic standards.
That’s why we should be pushing them to have good RSPs! I just think you should be pushing on the RSP angle rather than the pause angle.
Fair. And where I say “global consensus on an x-safety”, I mean expert opinion (as I say in the OP). I expect the public to remain generally a lot more conservative than the technical experts though, in terms of risk they are willing to tolerate.
The RSP angle is part of the corporate “big AI” “business as usual” agenda. To those of us playing the outside game it seems very close to safetywashing.
I’ve written up more about why I think this is not true here.
Thanks. I’m not convinced.
Why are people downvoting my reply without comment, and upvoting evhub’s comment? It’s the most upvoted comment, even though he clearly didn’t even ctrl-F for “Responsible Scaling” / notice that I’d addressed it in the OP!
Thanks for sharing your thoughts, Greg!
Assuming you believe there is a 75 % chance of AGI within the next 5 years, the above suggests your median time from now until doom is 3.70 years (= 0.5*(5 − 0)/0.75/0.9). Is your median time from now until human extinction also close to 3.70 years? If so, we can set up a bet similar to the one between Bryan Caplan and Eliezer Yudkowsky:
I send you 10 k 2023-$ in the next few days.
If humans do not go extinct in 3.70 years, or until the end of 2027 for simplicity, you send me 19 k 2023-$.
The expected profit is:
For you, 500 2023-$ (= 10*10^3 − 0.5*19*10^3).
For me, 9.00 k 2023-$ (= −10*10^3 + 19*10^3), as I think the chance of humans going extinct until the end of 2027 is basically negligible. I would guess around 10^-7 per year.
The expected profit is quite positive for both of us, so we would agree on the bet as long as my (your) marginal earnings after loosing 10 k 2023-$ (19 k 2023-$) would still go towards donations, which arguably do not have much diminishing returns.
I guess my marginal earnings after loosing 10 k 2023-$ would still go towards donations, so I am happy to take the bet.
Hi Vasco, sorry for the delay getting back to you. I have actually had a similar bet offer up on X for nearly a year (offering to go up to $250k) with only one taker for ~$30 so far! My one is you give x now and I give 2x in 5 years, which is pretty similar. Anyway, happy to go ahead with what you’ve suggested.
I would donate the $10k to PauseAI (I would say $10k to PauseAI in 2024 is much greater EV than $19k to PauseAI at end of 2027).
[BTW, I have tried to get Bryan Caplan interested too, to no avail—if anyone is in contact with him, please ask him about it.]
As much as I may appreciate a good wager, I would feel remiss not to ask if you could get a better result for amount of home equity at risk by getting a HELOC and having a bank be the counterparty? Maybe not at lower dollar amounts due to fixed costs/fees, but likely so nearer the $250K point—especially with the expectation that interest rates will go down later in the year.
I don’t have a stable income so I can’t get bank loans (I have tried to get a mortgage for the property before and failed—they don’t care if you have millions in assets, all they care about is your income[1], and I just have a relatively small, irregular rental income (Airbnb). But I can get crypto-backed smart contract loans, and do have one out already on Aave, which I could extend.).
Also, the signalling value of the wager is pretty important too imo. I want people to put their money where their mouth is if they are so sure that AI x-risk isn’t a near term problem. And I want to put my money where my mouth is too, to show how serious I am about this.
I think this is probably because they don’t want to go through the hassle of actually having to repossess your house, so if this seems at all likely they won’t bother with the loan in the first place.
Thanks for following up, Greg! Strongly upvoted. I will try to understand how I can set up a contract describing the bet with your house as collateral.
Could you link to the post on X you mentioned?
I will send you a private message with Bryan’s email.
Definitely seek legal advice in the country and subdivision (e.g., US state) where Greg lives!
You may think of this as a bet, but I’ll propose an alternative possible paradigm: it’s may be a plain old promissory note backed by a mortgage. That is, a home-equity loan with an unconditional balloon payment in five years. Don’t all contracts in which one party must perform in the future include a necessarily implied clause that performance is not necessary in the event that the human race goes extinct by that time? At least, I don’t plan on performing any of my future contractual obligations if that happens . . . .
So even assuming this wouldn’t be unenforceable as gambling, it might run afoul of the rules for mortgage lending (e.g., because the implied interest rate [~14.4%?] is seen as usurious, or because it didn’t comply with local or national laws regulating mortgage lending). That is a pretty regulated industry in general. It would definitely need to follow all the formalities for secured lending against real property: we require those formalities to make sure the borrower knows what he is getting into, and to give notice to other would-be lenders that they would be further back in line on repayment.
I should also note that it is pretty difficult in many places to force a sale on someone’s primary residence if you hold certain types of security interests (as opposed to, e.g., a primary mortgage). So you might be holding a lien that doesn’t have much practical value unless/until Greg decides to sell and there is value after paying off whoever is ahead in line on payment. Again, I can only advise seeking legal counsel in the right jurisdiction.
The off-the-wall thought I have is that Greg might be able to get around some difficulties by delivering a promissory note backed by a recorded security interest to an unrelated charity. But at the risk of sounding like a broken record, everyone would need legal advice from someone licensed in the jurisdiction before embarking on any approach in this rather unusual and interesting scenario.
Thanks for sharing your thoughts, Jason!
Cool, thanks. I link to one post in the comment above. But see also.
Thanks! Could you also clarify where is your house, whether you live there or elsewhere, and how much cash you expect to have by the end of 2027 (feel free to share the 5th percentile, median and 95th percentile)?
It’s in Manchester, UK. I live elsewhere—renting currently, but shortly moving into another owned house that is currently being renovated (I’ve got a company managing the would-be-collateral house as an Airbnb, so no long term tenants either). Will send you more details via DM.
Cash is a tricky one, because I rarely hold much of it. I’m nearly always fully invested. But that includes plenty of liquid assets like crypto. Net worth wise, in 2027, assuming no AI-related craziness, I would be expect it to be in the 7-8 figure range, 5-95% maybe $500k-$100M).
Update. I bet Greg Colbourn 10 k€ that AI will not kill us all by the end of 2027.
Greg can presumably also just take out a loan? I think this will likely dominate the bet you proposed given that your implied interest rates are very high.
The bet might be nice symbolism though.
As I say above, I’ve been offering a similar bet for a while already. The symbolism is a big part of it.
I can currently only take out crypto-backed loans, which have been quite high interest lately (don’t have a stable income so can’t get bank loans or mortgages), and have considered this but not done it yet.
Thanks for the suggestion, Ryan. As I side note, I would be curious to know how my comment could be improved, as I see it was downvoted. I guess it is too adversarial.
I feel like there is a nice point there, but I am not sure I got it. By taking a loan, Greg would loose purchasing power in expectation (meanwhile, I have replaced “$” by “2023-$” in my comment), but he would gain it by taking the bet. People still take loans because they could value additional purchasing power now more than in the future, but this is why I said the bet would only make sense if my and Greg’s marginal earnings would continue to go towards donations if we lost the bet. To ensure this, I would consider the bet a risky investment, and move some of my investments from stocks to bonds to offset at least part of the increase in risk. Even then, I would want to set up an agreement with signatures from both of us, and a 3rd party before going ahead with the bet.
Yes, I think symbolism would plausibly dominate the benefits for Greg.
The key thing is that you don’t have to pay off loans if we’re all dead. So all loans are implicitly bets about whether society will collapse by some point.
Re risk, as per my offer on X, I’m happy to put my house up as collateral if you can be bothered to get the paperwork done. Otherwise happy to just trade on reputation (you can trash mine publicly if I don’t pay up).
(I didn’t downvote it.)
Would be interested to see your reasoning for this, if you have it laid out somewhere. Is it mainly because you think it’s ~impossible for AGI/ASI to happen in that time? Or because it’s ~impossible for AGI/ASI to cause human extinction?
I have not engaged so much with AI risk, but my views about it are informed by considerations in the 2 comments in this thread. Mammal species usually last 1 M years, and I am not convinced by arguments for extinction risk being much higher (I would like to see a detailed quantitative model), so I start from a prior of 10^-6 extinction risk per year. Then I guess the risk is around 10 % as high as that because humans currently have tight control of AI development.
To be consistent with 10^-7 extinction risk, I would guess 0.1 % chance of gross world product growing at least 30 % in 1 year until 2027, due to bottlenecks whose effects are not well modelled in Tom Davidson’s model, and 0.01 % chance of human extinction conditional on that.
Interesting. Obviously I don’t want to discourage you from the bet, but I’m surprised you are so confident based on this! I don’t think the prior of mammal species duration is really relevant at all, when for 99.99% of the last 1M years there hasn’t been any significant technology. Perhaps more relevant is homo sapiens wiping out all the less intelligent hominids (and many other species).
On the question of priors, I liked AGI Catastrophe and Takeover: Some Reference Class-Based Priors. It is unclear to me whether extinction risk has increased in the last 100 years. I estimated an annual nuclear extinction risk of 5.93*10^-12, which is way lower than the prior for wild mammals of 10^-6.
I see in your comment on that post, you say “human extinction would not necessarily be an existential catastrophe” and “So, if advanced AI, as the most powerful entity on Earth, were to cause human extinction, I guess existential risk would be negligible on priors?”. To be clear: what I’m interested in here is human extinction (not any broader conception of “existential catastrophe”), and the bet is about that.
Agreed.
See my comment on that post for why I don’t agree. I agree nuclear extinction risk is low (but probably not that low)[1]. ASI is really the only thing that is likely to kill every last human (and I think it is quite likely to do that given it will be way more powerful than anything else[2]).
But too be clear, global catastrophic / civilisational collapse risk from nuclear is relatively high (these often get conflated with “extinction”).
Not only do I think it will kill every last human, I think it’s quite likely it will wipe out all known carbon-based life.
I upvoted this offer. I have an alert for bet proposals on the forum, and this is the first genuine one I’ve seen in a while.
“I think AGI is 0-5 years away” != “I am certain AGI will happen in within five years.” I think it is best read as implying somewhere between 51 and 100% confidence, at least standing alone. Depending on where it is set, you probably should offer another ~12-18 months.
Nice point, Jason! I have adjusted the numbers above to account for that. I have also replaced “$” by “2023-$” to account for inflation.
Re p(doom) being high, I don’t think you need to commit to the view that the most likely outcome of AGI is doom. Surveys of AI researchers put the risk from rogue AI at 5%. In the XPT survey professional forecasters put the risk of extinction from AI at 0.03% by 2050 and 0.38% by 2100, whereas domain experts put the chance at 1.1% and 3%, respectively. Though they had longer timelines than you seem to endorse here.
I think your argument goes through even if the risk is 1% conditional on AGI and that seems like an estimate unlikely to upset too many people, so I would just go with that
I still don’t understand where the 95% for non-doom is coming from. I think it’s useful to look at actual mechanisms for why people think this (and so far I’ve found them lacking). The qualifications of the “professional forecasters” in the XPT survey are in doubt (and again, it was pre-GPT-4).
The argument might go through even if the risk is 1%, but people sure aren’t acting like that. At least in EA, broadly speaking (where I imagine the average p(doom|AGI) is closer to 10%). Also, I’d rather just say what I actually believe, even if it sounds “alarmist”. At least I’ve tried to argue for it in some detail. The main reason I am prioritising this so much is because I think it’s the most likely reason I, and everyone I know and love, will die. Unless we stop it. Forget longtermism and EA: people need to understand that this is a threat to their own personal near-term survival.
I think you come across as over-confident, not alarmist, and I think it hurts how you come across quite a lot. (We’ve talked a bit about the object level before.) I’d agree with John’s suggested approach.
Relatedly, I also think that your arguments for “p(doom|AGI)” being high aren’t convincing to people that don’t share your intuitions, and it looks like you’re relying on those (imo weak) arguments, when actually you don’t need to
I’m crying out for convincing gears-level arguments against (even have $1000 bounty on it), please provide some.
The issue is that both sides of the debate lack gears-level arguments. The ones you give in this post (like “all the doom flows through the tiniest crack in our defence”) are more like vague intuitions; equally, on the other side, there are vague intuitions like “AGIs will be helping us on a lot of tasks” and “collusion is hard” and “people will get more scared over time” and so on.
I’d say it’s more than a vague intuition. It follows from alignment/control/misuse/coordination not being (close to) solved and ASI being much more powerful than humanity. I think it should be possible to formalise it, even. “AGIs will be helping us on a lot of tasks”, “collusion is hard” and “people will get more scared over time” aren’t anywhere close to overcoming it imo.
These are what I mean by the vague intuitions.
Nobody has come anywhere near doing this satisfactorily. The most obvious explanation is that they can’t.
To be fair, I think I’m partly making wrong assumptions about what exactly you’re arguing for here.
On a slightly closer read, you don’t actually argue in this piece that it’s as high as 90% - I assumed that because I think you’ve argued for that previously, and I think that’s what “high” p(doom) normally means.
I do think it is basically ~90%, but I’m arguing here for doom being the default outcome of AGI; I think “high” can reasonably be interpreted as >50%.
I feel like this is a case of death by epistemic modesty, especially when it isn’t clear how these low p(doom) estimates are arrived at in a technical sense (and a lot seems to me like a kind of “respectability heuristic” cascade). We didn’t do very well with Covid as a society in the UK (and many other countries), following this kind of thinking.
What part of Greg writing comes across as over confident?
I suppose one solution might be to say that your personal view is that pdoom is >50%, but a range of estimates suggest >1% is plausible
Thanks for this post Greg.
Re your point about scaling, the Michael et al survey of NLP researchers suggests that researchers don’t think scaling will take us all the way there.
Figure 4.
Based on my limited understanding I agree with you that it seems pretty plausible that scaling does take us to human level AI and beyond, but the experts seem to disagree and I’m not sure why
Interesting. I’ll note that this survey was pre-GPT-4 (or even before GPT3.5 was in widespread use? May-June 2022) when (I think) people were still sceptical of LLMs being able to do well on university exams, amongst many other things. Would be interesting to see a similar survey that is post-GPT-4 (I’ve not been able to find anything). I predict that it will show a significantly higher % agreeing.
In general I think any survey on AI that was conducted in the pre-GPT-4 era is now woefully out of date.
I would agree, relying on pre-GPT4 estimates seems flawed.
Hmm… that isn’t exactly the question I’d like the answer too, which is more scaling + minor incremental improvements + creative prompting.
I agree with most of the points in this post (AI timelines might be quite short; probability of doom given AGI in a world that looks like our current one is high; there isn’t much hope for good outcomes for humanity unless AI progress is slowed down somehow). I will focus on one of the parts where I think I disagree and which feels like a crux for me on whether advocating AI pause (in current form) is a good idea.
You write:
I think framings like these do a misleading thing where they use the word “we” to ambiguously refer to both “humanity as a whole” and “us humans who are currently alive”. The “we” that decides how much risk to take is the humans currently alive, but the “we” that enjoys the dream holiday might be humans millions of years in the future.
I worry that “AI pause” is not being marketed honestly to the public. If people like Wei Dai are right (and I currently think they are), then AI development may need to be paused for millions of years potentially, and it’s unclear how long it will take unaugmented or only mildly augmented humans to reach longevity escape velocity.
So to a first approximation, the choice available to humans currently alive is something like:
Option A: 10% chance utopia within our lifetime (if alignment turns out to be easy) and 90% human extinction
Option B: ~100% chance death but then our descendants probably get to live in a utopia
For philosophy nerds with low time preference and altruistic tendencies (into which I classify many EA people and also myself), Option B may seem obvious. But I think many humans existing today would rather risk it and just try to build AGI now, rather than doing any AI pause, and to the extent that they say they prefer pause, I think they are being deceived by the marketing or acting under Caplanian Principle of Normality, or else they are somehow better philosophers than I expected they would be.
(Note: if you are so pessimistic about aligning AI without a pause that your probability on that is lower than the probability of unaugmented present-day humans reaching longevity escape velocity, then Option B does seem like a strictly better choice. But the older and more unhealthy you are, the less this applies to you personally.)
Are you simplifying here, or do you actually believe that “utopia in our lifetime” or “extinction” are the only two possible outcomes given AGI? Do you assign a 0% chance that we survive AGI, but don’t have a utopia in the next 80 years?
What if AGI stalls out at human level, or is incredibly expensive, or is buggy and unreliable like humans are? What if the technology required for utopia turns out to be ridiculously hard even for AGI, or substantially bottlenecked by available resources? What if technology alone can’t create a utopia, and the extra tech just exacerbates existing conflicts? What if AGI access is restricted to world leaders, who use it for their own purposes?
What if we build an unaligned AGI, but catch it early and manage to defeat it in battle? What if early, shitty AGI screws up in a way that causes a worldwide ban on further AGI development? What if we build an AGI, but we keep it confined to a box and can only get limited functionality out of it? What if we build an aligned AGI, but people hate it so much that it voluntary shuts off? What if the AGI that gets built is aligned to the values of people with awful views, like religious fundamentalists? What if AGI wants nothing to do with us and flees the galaxy? What if [insert X thing I didn’t think of here]?.
IMO, extinction and utopia are both unlikely outcomes. The bulk of the probability lies somewhere in the middle.
I was indeed simplifying, and e.g. probably should have said “global catastrophe” instead of “human extinction” to cover cases like permanent totalitarian regimes. I think some of the scenarios you mention could happen, but also think a bunch of them are pretty unlikely, and also disagree with your conclusion that “The bulk of the probability lies somewhere in the middle”. I might be up for discussing more specifics, but also I don’t get the sense that disagreement here is a crux for either of us, so I’m also not sure how much value there would be in continuing down this thread.
I would agree that “utopia in our lifetime” or “extinction” seems like a false dichotomy. What makes you say that you predict the bulk of the probability lies somewhere in the middle?
How about an Option A.1: pause for a few years or a decade to give alignment a chance to catch up? At least stop at the red lights for a bit to check whether anyone is coming, even if you are speeding!
I think this easily goes through, even for 1-10% p(doom|AGI), as it seems like ageing is basically already a solved problem or will be within a decade or so (see the video I linked to—David Sinclair; and there are many other people working in the space with promising research too).
I’m not an expert on most of the evidence in this post, but I’m extremely suspicious of the claim that GPT-4 represents AI that is “~ human level at language”, unless you mean something by this that is very different from what most people would expect.
Technically, GPT-4 is superhuman at language because whatever task you are giving it is in English, and the median human’s English proficiency is roughly nil. But a more commonsense interpretation of this statement is that a prompt-engineered AI and a trained human can do the task roughly as well.
What you link to shows the results of how GPT-4 performs on a bunch of different exams. This doesn’t really show how language is used in the real world, especially since the exams very closely match past exams that were in the training data. It’s good at some of them, but also extremely bad at others (AP English Literature and Codeforces in particular), which is an issue if you’re making a claim that it’s roughly human level.
Furthermore, language isn’t just putting words together in the right order and with the right inflection. It also includes semantic information (what the actual meaning of the sentences is) and pragmatic information (is the language conveying what it is trying to convey, not just the literal meaning). I’m not sure whether pragmatics in particular would be relevant for AI risk, but the fact that anecdotally even GPT-4 is pretty bad at pragmatics prevents a literal interpretation of your statement.
In my opinion, the best evidence for GPT-4 not being human level at language is that, in the real world, GPT-4 is much cheaper than a human but consistently unable to outcompete humans. News organizations have a strong incentive to overhype GPT-caused automation, but the examples that they’ve found are mostly of people saying that either GPT-4 or GPT-3 (it’s not always clear which) did their job much worse than them, but good enough for clients. Take https://www.washingtonpost.com/technology/2023/06/02/ai-taking-jobs/ as a typical story.
Exams aren’t exactly the real world, but the popular example of GPT-4 doing well on exams is https://www.slowboring.com/p/chatgpt-goes-to-harvard. This both ignores citations (which is a very important part of college writing, and one that GPT-3 couldn’t do whatsoever and which GPT-4 still is significantly below what I would expect from a human) and relies on the false belief that Harvard is a hard school to do well at (grade inflation!)
I still agree with two big takeaways of your post, that an AI pause would be good and that we don’t necessarily need AGI for a good future, but that’s more because it’s robust to a lot of different beliefs about AI than because I agree with the evidence provided. Again, a lot of the evidence is stuff that I don’t feel particularly knowledgeable about, I picked this claim because I’ve had to think about it before and because it just feels false from my experience using GPT-4.
GPT-4 is also proficient at many other languages, so I don’t think English is the appropriate benchmark! Is GPT-4 as good as the median human at language in general? I think yes. In fact it’s probably quite a lot better.
Can you link to examples? Most examples I’ve seen on X are people criticising chatGPT-3.5 (or other models) and then someone coming along showing chatGPT-4 getting it right!
It’s nearly always GPT-3 (or 3.5). We only need to be concerned about the best AI models, not the lower tiers! I’ve heard anecdotes, in real life, of people who are using GPT-4 to do parts of their jobs—e.g. writing long emails that their boss was impressed with (they didn’t tell them it was chatGPT!)
Harvard is one of the best schools in the world. The average human is quite far from being smart enough to get in to it. I don’t think saying this is helping the credibility of your argument! Seems a lot like goalpost moving.
Thanks, good to know :)
GPT4 is clearly above the median human when it comes to a range of exams. Do we have examples of GPT4′s comparison to the median human in non-exam like conditions?
I agree with most of your conclusions in this post. I feel uncomfortable. I’ll write more once I have processed some of my emotions and can think in a more clear manner.
Hope everything is okay.
PS. I’m doing AI Safety movement building in Australia and New Zealand, so if you need someone to talk to, feel free to reach out.
Hi Chris thanks for reaching out. Obviously things with the world aren’t ok, it seems insane that every country is staring down a massive national security risk and they haven’t done much about it.
How is movement building going?
I’ll reply via PM.
I know William is already aware, but for others who aren’t, there are groups focused on getting a pause: PauseAI (Discord) and the AGI-Moratorium-HQ Slack are two of them. And there are now quite a lot of people on X with ⏸️ or ⏹️ in their name. I find that it makes me feel better being pro-active about doing something about it.
I spoke with William at length yesterday. The situation is dire, but I don’t think it’s impossible.
Yea, thanks for the talk Greg, it was informative.
Strong upvote.
GPT-1 was released 2018. GPT-4 has shown sparks of AGI.
We have early evidence of self-improvement—or conservatively—positive feedback loops are evident.
Open AI intends to build ~AGI to automate alignment research. Sam Altman is attempting to raise $7T USD to build more GPUs.
Anthropic CEO estimates 2-3 years until AGI.
Meta has gone public about their goal of open-sourcing AGI.
Superalignment might even be impossible.
It seems to be difficult to defend a world against rouge AGIs, it seems difficult for aligned AIs to defend us.
I’m skeptical about the tractability of making AGI development taboo within a very few years. It seems like this plan would require moderate timelines in order to be viable.
That said, I’m starting to wonder whether we should be trying to gain support for a pause in theory: specifically, people to agree that in an ideal world, we would pause AI development where we are now.
That could help open up the Overton window.
AGI development is already taboo outside of tech circles. Per the September poll by the AIPI, only 12% disagree that “Preventing AI from quickly reaching superhuman capabilities” should be an important AI policy goal. (56% strongly agree, 20% somewhat agree, 8% somewhat disagree, 4% strongly disagree, 12% not sure.) Despite the fact that world leaders are themselves influenced by tech circles’ positions, leaders around the world are quite clear that they take the risk seriously.
The only reason AGI development hasn’t been halted already is that the general public does not yet know that big tech is both trying to build AGI, and actually making real progress towards it.
The taboo only really needs to kick in on moderate timelines, so we’re in luck :) On short timelines, only massive data centres and the leading AI labs need to be regulated.
I’ve now made post this into an X thread (with some slight edits and some condensing).
Upvoted your post because you made some good points, but I think your analogy between human cloning and AI training is totally wrong.
There is no black market in human cloning, and no police state trying to stop it, because no one benefits very much from cloning. Cloning is just not that useful. Whereas if we stop corporate AI development for 40 years but computer hardware keeps improving, anyone can get rich by training an AGI on their gaming laptop. It would be like trying to confiscate all the drug imports in a world where everyone with a cell phone is a drug addict.
Thanks. I also address the “get rich” point though! People can’t get rich from it AGI because they lose control of it (/the world ends) before they get rich. AGI is not that useful either, because it’s uncontrollable and has negative externalities that will come back and swamp any hoped for benefits, even for the producer (i.e. x-risk).