Cutting AI Safety down to size
This is more of a personal how-I-think post than an invitation to argue the details.
This post discusses two sources of bloat, mental gymnastics, and panic in AI Safety:
p(doom)/timeline creep
LessWrong dogma accommodates embedded libertarian assumptions
I think the stakes of AI Safety have been inflating unchecked for a long time. I’ve felt it myself since I switched into AI Safety. When you’re thinking about AI capabilities every day, it’s easy to become more and more scared and convinced that doom is almost certain, SOON.
My general take on timelines and p(doom) is that the threshold for trying to get AI paused should be pretty low (a 5% chance of extinction or catastrophe seems more than enough to me) and that, above that threshold, differences in p(doom) are worth keeping track of but generally aren’t action-relevant. But the subjective sense of doom component of my p(doom) has risen in the last year. In 2023 my p(doom) was something like 20-40% for the worst outcomes (extinction, societal collapse, s-risk), with another 40-50% probability mass on very bad outcomes short of x-risk, such as society or sentient life becoming bad through mass job loss and instability, AI-enabled authoritarianism, or solipsistic little AI worlds built around people that strangle sentient-sentient connection. Now I feel my p(x-risk) going up, more like 60% by my feels, somewhat due to actual AI advances and political outcomes, but mainly because I have felt scared and powerless about the development of AI every day in the intervening time. I’ve generally refused to participate in timeline talk because I think it’s too reductive and requires too much explanation to be useful for conveying your worldview, but the same creep has happened to my subjective internal sense of timelines. The scope of the problem has already reached its physical limit— the lightcone— in the minds of many here. How could this not also be happening for others who have been living on this IV drip of fear longer than I have?
This p(doom) creep is not harmless. I wrote last week that freaking out about AI x-risk doesn’t help, and it sure doesn’t. It sabotages the efforts most likely to improve the AI risk situation.
Many people think that they need the sense of urgency to take effective action: that if the problem were smaller or we had more time, they and others would just put it off or bounce off of it. But 1) people are bouncing off of the insane and false-sounding levels of urgency you are presenting (at some point you’re just a street preacher yelling about the apocalypse), and 2) maybe you can scare yourself into working a little harder and a little faster in the short term, but inflated urgency is a recipe for burnout. The reason I hate timeline talk is that we’re not just approaching one event— there are lots of harms and risks it would be good to mitigate, they dawn at different speeds, and some can be undone while others can’t. Only in the very worst scenarios does everything happen in an instant, but even in that case thinking of FOOM as some kind of deadline is a really unhelpful guide to action. Action should be guided by when we can intervene, not by when we’re fucked.
So, here’s my vision of the stakes in AI Safety, cut down to size:
The problem with powerful AI is that our world is fragile, depending on many delicate equilibria, many of which we are unaware of because our capabilities level is too low to easily upset them. Introducing a more powerful intelligence, one without our many biological constraints, will disrupt many of those equilibria in ways we can’t control or constrain, or can’t even anticipate because we take those equilibria completely for granted.
The harms of this disruption will not come all at once, and shades of many of them are already here (algorithmic bias and deepfakes now, with job loss likely soon). It’s not that there could never be *better* equilibria established for a lot of these things with the aid of AI, like a world where people don’t have to work to have what they need, but disrupting the equilibrium we have causes chaos, and chaos causes suffering, and we do not have a plan for getting to the better equilibrium. (Up until recently, the AI Safety community’s deadass plan was that the aligned AI would figure it out.)
Given the capabilities of current models, there could be a bad accident at any time: models pursuing goals we didn’t anticipate they would have, or that they hid during training and testing; models following human instructions that are antisocial (like war or terrorism); or models following instructions that were just unwise (the operator doesn’t realize they’re instructing the model to destroy the atmosphere or something). The nature and magnitude of possible accidents grow as model training scales, because the model becomes more and more able to plow through the evolved societal ecosystem we depend on.
The scale of the danger really could cripple civilization or cause extinction, and that possibility alone is reason enough to pursue pausing frontier AI development. Furthermore, the people of the world (I’m judging mostly by US polls, but everything I’ve seen is consistent) don’t want their world radically disrupted, so it’s not okay for people who want to build AGI and take a crazy chance at improving the world to do that to everyone else.
One way my take above differs from the standard LessWrong debate positions is that I don’t think we need extraordinary circumstances to justify telling people they can’t build something dangerous or sufficiently disruptive. I’m not that much of a libertarian, so I’m allowed to care about the softer harms leading up to the possibility of extinction. I think the fact that I’m not articulating AI danger in a way that accommodates strong libertarianism makes my version less tortured and way easier to understand.
My headcanon is that one reason the traditional AI Safety crowd has such high (90+%) p(doom) estimates is that they need AI to be a uniquely dangerous technology to justify intervening to control it, rather than feeling free to acknowledge that advancing technological capabilities almost by definition carries risk to society that society might want to mitigate. Or, rather, they need a bright red line to distinguish AI from the rest of the reference class of technology, which they regard as always good. The idea that building more advanced technology is always “progress” and that “progress” is always good just doesn’t stand up to the reality of society’s vulnerability to major disruptions, and I believe the libertarian AI Safety wing really struggles to reconcile the danger they see and want to prevent with their political and economic ideology. Free of the libertarian identity and ideology, I have no trouble acting on a lower and, I believe, more realistic p(doom).
I also feel free to share that I put my remaining 10-15% probability mass on us just missing something entirely in our models and nothing that bad happening with AGI development– say, LLMs are aligned by default or something and then scaled-up LLMs protect us from other architectures that could be misaligned, or we miraculously get lucky and do “hit a wall” with machine learning in a way that buys us crucial time. I think there is a real possibility that things with advanced AI will accidentally turn out fine even without intervention. But, Holly, you run PauseAI US. Doesn’t that undermine your message? A lot of people in AI Safety seem to think you need a guarantee of catastrophic AI harm for it to be worth trying to pause or regulate it. Why would you need that? Imo it’s because they are strong libertarians and think that, with anything less than a guarantee of harm, the developers’ rights to do whatever they want take precedence. They don’t want to look like “Luddites”. They may also have Pollyannaish beliefs about the market sorting everything out as long as not literally everyone dies, and callous “price of progress” thinking toward everyone who does suffer and die in that scenario.
I’m not saying you shouldn’t be a libertarian. I am saying that you need to realize that libertarianism is a separate set of values that you bring to the facts and probabilities of AI danger. The two have been so tightly intertwined for decades in AI Safety that the ideological assumptions of libertarianism embedded in so much AI Safety dogma have been obscured and are not properly owned up to as political views. You can be someone who thinks that you can’t interfere with the development of AI if it is merely very harmful to society and doesn’t kill everyone, but that’s not a fact about AI; that’s a fact about your values. And can you imagine if we applied this standard to any other danger in our lives? It’s been pointed out that there are more regulations on selling a sandwich than on making AGI. Any given sandwich made in unsanitary conditions probably won’t poison you. There are extreme libertarians who think the government shouldn’t be able to control how hygienically people make the sandwiches they sell. So? That doesn’t make it a fact that it’s immoral to demand safety guarantees. I think it’s extremely moral to demand safety on something we have as much reason to believe is dangerous and deadly as advanced AI. Rather than (I allege) ratcheting up your p(doom) to a near certainty of doom in order to be allowed to want regulations on AI, you could just have a less extreme view on when it is okay to regulate potential dangers.