How to think about slowing AI
Slowing AI[1] is many-dimensional. This post presents variables for determining whether a particular kind of slowing improves safety. Then it applies those variables to evaluate some often-discussed scenarios.
Variables
Many variables affect whether an intervention improves AI safety.[2] Here are four crucial variables at stake when slowing AI progress:[3]
1. Time until critical systems are deployed.[4] More time seems good for alignment research, governance, and demonstrating risks of powerful AI.
2. Length of crunch time. In this post, “crunch time” means the time near critical systems before they are deployed.[5] More time until critical systems are deployed is good; more such time near critical systems is especially good. A lab is more likely to (be able to) pay an alignment tax for a critical system if it has more time to pay the tax for that system. Time near critical systems also seems especially good for alignment research and potentially for demonstrating risks of powerful AI and doing governance.
3. Safety level of labs that develop critical systems.[6] This can be improved both by making labs safer and by differentially slowing unsafe labs.
4. Propensity to coordinate or avoid racing.[7] This is associated with many factors, but plausible factors relevant to slowing AI include there being few leading labs, those labs liking and trusting each other, and those labs all being in the same country (or at least allied countries), in part because regulation is one possible cause of not racing.
One lab’s progress, especially on the frontier, tends to boost other labs. Labs leak their research both intentionally (publishing research and deploying models) and unintentionally.
Some interventions would differentially slow relatively safe labs (relevant to 3). Some interventions (especially policies that put a ceiling on AI capabilities or inputs) would differentially slow leading labs (relevant to 4). Both outcomes are worse than uniform slowing and potentially net-negative.
If something slows progress temporarily, then after it ends, progress may gradually and partially catch up to the pre-slowing trend, such that powerful AI is delayed but crunch time is shortened (relevant to 1 and 2).[8]
Coordination may facilitate more coordination later (relevant to 4).
Current leading labs (Google DeepMind, OpenAI, and maybe Anthropic) seem luckily safety-conscious (relevant to 3). Current leading labs seem luckily concentrated in America (relevant to 4).[9]
Some endogeneities in AI progress may give rise to considerations about the timing of slowing. For example, the speed at which the supply of (ML training) compute responds to (expected) demand determines the effect of slowing soon on future supply. Or perhaps slowing affects the distribution of talent between dangerous AI paths, safe AI paths, and non-AI stuff. Additionally, some kinds of slowing increase or decrease the probability of similar slowing later.
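The catch-up consideration above (a temporary slowdown followed by partial catch-up can delay powerful AI while shortening crunch time) can be made concrete with a toy model. This is only an illustrative sketch, not a forecast: it assumes capability grows linearly along a trend of one unit per year, that "near critical" and "critical" are fixed thresholds (9 and 10 here), and that the pause timing, catch-up rate, and catch-up fraction are all made-up parameters. The names `Segment`, `time_to_reach`, `paused_trajectory`, and `summarize` are hypothetical helpers introduced for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    """A stretch of time with a constant rate of capability progress."""
    duration: float  # years (math.inf for the final, open-ended segment)
    rate: float      # capability units gained per year


def time_to_reach(level, segments):
    """Return the year at which capability (starting at 0) first reaches `level`."""
    t = 0.0
    cap = 0.0
    for seg in segments:
        if seg.rate > 0:
            needed = (level - cap) / seg.rate
            if needed <= seg.duration:
                return t + needed
        t += seg.duration
        cap += seg.rate * seg.duration
    raise ValueError("level is never reached")


def paused_trajectory(pause_start, pause_length, catchup_rate, catchup_fraction):
    """Trend growth, then a full pause, then faster-than-trend growth that
    closes `catchup_fraction` of the gap to the old trend, then trend growth.
    All parameters are illustrative assumptions."""
    gap = pause_length * 1.0  # trend progress forgone during the pause
    # The gap to the old trend closes at (catchup_rate - 1) units per year.
    catchup_duration = catchup_fraction * gap / (catchup_rate - 1.0)
    return [
        Segment(pause_start, 1.0),
        Segment(pause_length, 0.0),
        Segment(catchup_duration, catchup_rate),
        Segment(math.inf, 1.0),
    ]


def summarize(segments, near=9.0, critical=10.0):
    """Return (year critical systems arrive, length of crunch time in years)."""
    t_near = time_to_reach(near, segments)
    t_critical = time_to_reach(critical, segments)
    return t_critical, t_critical - t_near


if __name__ == "__main__":
    baseline = summarize([Segment(math.inf, 1.0)])
    slowed = summarize(paused_trajectory(7.0, 2.0, 1.5, 0.5))
    print("baseline: arrival year %.2f, crunch time %.2f years" % baseline)
    print("slowed:   arrival year %.2f, crunch time %.2f years" % slowed)
```

Under these made-up numbers, a two-year pause starting in year 7 moves the arrival of critical systems from year 10 to year 11, but crunch time shrinks from 1 year to about 0.67 years, because the faster-than-trend catch-up period overlaps the near-critical window. Whether real progress behaves anything like this depends on the endogeneities discussed above.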
Scenarios
Magic uniform slowing of all dangerous AI: great. This delays dangerous AI and lengthens crunch time. It has negligible downside.
A leading safety-conscious lab slows now, unilaterally: bad. This delays dangerous AI slightly. But it makes the lab irrelevant, thus making the labs that develop critical systems less safe and making the lab unable to extend crunch time by staying at the frontier for now and slowing later.
All leading labs coordinate to slow during crunch time: great. This delays dangerous AI and lengthens crunch time. Ideally the leading labs slow until risk of inaction is as great as risk of action on the margin, then deploy critical systems.
All leading labs coordinate to slow now: bad. This delays dangerous AI. But it burns leading labs’ lead time, making them less able to slow progress later (because further slowing would cause them to fall behind, such that other labs would drive AI progress and the slowed labs’ safety practices would be irrelevant).
Strong global treaty: great. A strong global agreement to stop dangerous AI, with good operationalization of ‘dangerous AI’ and strong verification, would seem to stop labs from acting unsafely[10] and thus eliminate AI risk. The downside is the risk of the treaty collapsing and progress being faster and distributed among more labs and jurisdictions than otherwise.
Strong US regulation:[11] good. Like “strong global treaty,” this stops labs from acting unsafely—but not in all jurisdictions. Insofar as this differentially slows US AI progress, it could eventually cause AI progress to be driven by labs outside the regulation’s reach.[12] If so, the regulation—and the labs it slowed—would cease to be relevant, and it would likely have been net-negative: it would cause critical systems to be created by labs other than the relatively-safety-conscious currently-leading ones and cause leading labs to be more globally diffuse.
US moratorium now: bad. A short moratorium (unless succeeded by a strong policy regime) would slightly delay dangerous AI on net, but also cause progress to be faster for a while after it ends (when AI is stronger and so time is more important), increase the number of leading labs (especially by adding leading labs outside the US), and result in less-safe leading labs (because current leading labs are relatively safety-conscious). A long moratorium would delay dangerous AI, but as in “strong US regulation,” the frontier of AI progress would eventually be surpassed by labs outside the moratorium’s reach.
Which scenarios are realistic; what interventions are tractable? These questions are vital for determining optimal actions, but I will not consider them here.
Thanks to Rose Hadshar, Harlan Stewart, and David Manheim for comments on a draft.
This post is part of AI Pause Debate Week. Please see this sequence for other posts in the debate.
- ^
That is, slowing progress toward dangerous AI, or AI that would cause an existential catastrophe. Many kinds of AI seem safe, such as vision, robotics, image generation, medical imaging, narrow game-playing, and prosaic data analysis—maybe everything except large language models, some bio/chem stuff, and some reinforcement learning. Note that in this post, I assume that AI safety is sufficiently hard that marginal changes in my variables are very important.
- ^
This post is written from the perspective that powerful AI will eventually appear and AI safety is mostly about increasing the probability that it will be aligned. Note that insofar as other threats arise before powerful AI or intermediate AI systems pose threats, it’s better for powerful AI to arrive faster—but I ignore this here.
- ^
See my Slowing AI: Foundations for more.
- ^
In this post, a critical system is one whose deployment would cause an existential catastrophe if misaligned or be able to execute a pivotal act if aligned. This concept is a simplification: capabilities that could cause catastrophe are not identical to capabilities that could execute a pivotal act, ‘cause catastrophe’ and ‘execute a pivotal act’ depend on not just the system but also the world, ‘catastrophe or not’ and ‘pivotal act or not’ aren’t really binary, and deployment is not binary. Nevertheless, it is a useful concept.
- ^
This concept is a simplification insofar as “near critical systems” is not binary. Separately, note that some interventions could lengthen total time to critical systems but reduce crunch time or vice versa. For example, slowing now in a way that causes progress to partially catch up to the old trend later would lengthen total time but reduce crunch time.
Separately, I believe we are not currently in crunch time. I expect we will be able to predict crunch time decently well (say) a year in advance by noticing AI systems’ near-dangerous capabilities.
- ^
This concept is a simplification: non-lab actors may be central to safety, especially the creators of tools/plugins/scaffolding/apps to integrate with ML models.
- ^
The other variables implicitly describe the default case, without much coordination.
- ^
See my Cruxes for overhang.
- ^
Coordination seems easier if leading labs are concentrated in a single state, in part because it can be caused by regulation. (Additionally, the AI safety community has relatively more influence over government in the US, so US regulatory effectiveness and thus US lead is good, all else equal.)
Observations about current leads are relevant insofar as (1) those leads will be sustained over time and (2) dangerous AI is sufficiently close that current leaders are likely to be leaders in crunch time by default.
On the risk of differentially slowing US labs, see my Cruxes on US lead for some domestic AI regulation.
- ^
Or in terms of the above variables, a strong global treaty would delay dangerous AI, cause labs to be safer, and (insofar as it discriminates between safe and unsafe labs) differentially slow unsafe labs.
- ^
I imagine “strong global treaty” and “strong US regulation” as including miscellaneous safety standards/regulations but focusing on oversight of large training runs, enforcing a ceiling on training compute and/or doing model evals during large training runs and stopping runs that fail an eval until the lab can ensure the model is safe.
- ^
Labs outside US regulation’s reach could eventually dominate AI progress due to some combination of the following (overlapping):
The US fails to get a large coalition to join it
Labs in coalition states can effectively move to non-coalition states to escape the regulation
Labs in non-coalition states can quickly catch up to the frontier given slowed progress in the coalition
Coalition export controls fail to deny compute to labs in non-coalition states
Other attempted extraterritorialization of the regulation fails
(Also just there being a substantial tradeoff between speed and (legible) safety, such that the regulation substantially slows the labs it affects)
(Also just powerful AI being far off, such that outside labs have longer to catch up to the slowed coalition labs)
I would be more inclined to agree with this if there was a set of criteria we had that indicated we were in “crunch time” which we are very likely to meet before dangerous systems and haven’t met now. Have people generated such a set? Without that, how do we know when “crunch time” is, or for that matter, if we’re already here?
The problem I have with the scenarios is that they are end-state scenarios without considering who does anything or how negotiations proceed. But unlike in idealized thought experiments, in social and geopolitical systems, the process by which the goal is pursued, not the stated goal state, actually determines what the end state looks like.
(totally agree thinking about end-states is insufficient, but I think it’s a necessary first step and this kind of thinking reveals big cruxes and some real disagreements)
We are already in crunch time, doubly so post GPT-4. What predictors are you using that aren’t yet being triggered?
I also agree with David Manheim that the path matters; and therefore incremental steps such as a US moratorium are likely net positive, especially considering that it is crunch time, now. International treaties can be built from such a precedent, and the US is probably at least 1-2 years ahead of the rest of the world currently.
“Crunch time” has many meanings, but in this post it mostly means a time shortly before critical systems in which alignment research is much more productive. We don’t seem to be in that crunch time yet.
I agree that US domestic policy can lead to international law; that should be a consideration.
That makes sense. But like Greg_Colbourn says, it seems like a non-trivial assumption that alignment research will become significantly more productive with newer systems.
Also, different researchers may expect very different degrees of “more productive.” It seems plausible to me that we could learn more about the motivations of AI models once we move to a paradigm that isn’t just “training next-token prediction on everything on the internet.” At the same time, it seems outlandish to me that there’d ever come a point where new systems could help us with the harder parts of alignment (due to the expert delegation problem, where delegating well in an environment where the assistants may not all be competent and well-intentioned becomes impossible if you don’t already have the expertise yourself).
Thanks. I don’t share the expectation that alignment research will be much more productive shortly before critical systems. At least not to a degree where it reduces relative risk. We should only have systems more advanced than those we’ve already got once we’ve solved mechanistic interpretability for the current ones (and we’re so far off that—the frontier of interpretability research is looking at GPT-2 sized models and smaller!). Also, I think there is a non-zero chance that the next generation of models will be critical, so we’re basically at crunch time now in terms of having a good shot at averting extinction.
I am actually interested in answers to my question; it wasn’t rhetorical (and I’m not sure why my comment was downvoted; disagreement votes are fine).
There’s also a lot of overlap between disagreeing with someone and liking a post. If you disagree with something, you are more likely to not like it. I don’t love this about the voting system but I don’t really have a better alternative to suggest.