One of the three major threads in this post (I think) is noticing pause downsides: in reality, an “AI pause” would have various predictable downsides.
Part of this is your central overhang concerns, which I discuss in another comment (I have some relevant ideas in Cruxes on US lead for some domestic AI regulation and Cruxes for overhang). The rest is:
Illegal AI labs develop inside pause countries, remotely using training hardware outsourced to non-pause countries to evade detection. Illegal labs would presumably put much less emphasis on safety than legal ones.
There is a brain drain of the least safety-conscious AI researchers to labs headquartered in non-pause countries. Because of remote work, they wouldn’t necessarily need to leave the comfort of their Western home.
Non-pause governments make opportunistic moves to encourage AI investment and R&D, in an attempt to leap ahead of pause countries while they have a chance. Again, these countries would be less safety-conscious than pause countries.
Safety research becomes subject to government approval to assess its potential capabilities externalities. This slows down progress in safety substantially, just as the FDA slows down medical research.
Legal labs exploit loopholes in the definition of a “frontier” model. Many projects are allowed on a technicality; e.g. they have fewer parameters than GPT-4, but use them more efficiently. This distorts the research landscape in hard-to-predict ways.
It becomes harder and harder to enforce the pause as time passes, since training hardware is increasingly cheap and miniaturized.
Whether, when, and how to lift the pause becomes a highly politicized culture war issue, almost totally divorced from the actual state of safety research. The public does not understand the key arguments on either side.
Relations between pause and non-pause countries are generally hostile. If domestic support for the pause is strong, there will be a temptation to wage war against non-pause countries before their research advances too far:
“If intelligence says that a country outside the agreement is building a GPU cluster, be less scared of a shooting conflict between nations than of the moratorium being violated; be willing to destroy a rogue datacenter by airstrike.” — Eliezer Yudkowsky
There is intense conflict among pause countries about when the pause should be lifted, which may also lead to violent conflict.
AI progress in non-pause countries sets a deadline after which the pause must end, if it is to have its desired effect.[8] As non-pause countries start to catch up, political pressure mounts to lift the pause as soon as possible. This makes it hard to lift the pause gradually, increasing the risk of dangerous fast takeoff scenarios (see below).
My high-level take: suppose for illustration that “powerful AI” is binary and would appear by default (i.e. with no pause) in 2030 via a 1e30 FLOP training run. (GPT-4 used about 2e25 FLOP.) Several of these concerns would apply to a 1e23 FLOP ceiling but not to a 1e28 FLOP ceiling: a 1e28 ceiling would delay powerful AI (which would still eventually be reached via inference-time algorithmic progress and compute increases), but it would likely not become evadable, let other countries surpass the US, etc. I mostly agree with Nora that low-ceiling pauses are misguided, but the upshot of that for me is not “pause is bad” but “pauses should have a high ceiling.”
Unfortunately, it’s pretty uncertain when powerful AI would appear by default, and even if you knew the optimal threshold for regulation, you couldn’t automatically cause it to be adopted. But some policy regimes would be more robust to mistakes than “aim 2–3 OOMs below the level at which powerful AI would appear and pause there”, e.g. a ceiling starting around 1e26 today and doubling every year.
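To make the numbers concrete, here is a minimal Python sketch of the arithmetic implied by the illustration above (1e30 FLOP as the toy “powerful AI” threshold, a ceiling starting at 1e26 FLOP and doubling yearly). The 2023 start year and the exact schedule are assumptions for illustration, not a proposal or a forecast.

```python
# Illustrative arithmetic only, using the toy numbers above: a 1e30 FLOP
# "powerful AI" training run by default, and a ceiling starting at 1e26 FLOP
# that doubles every year. Start year and schedule are assumptions, not a forecast.
import math

DEFAULT_THRESHOLD_FLOP = 1e30   # toy "powerful AI" training-run size from the example above
START_YEAR = 2023               # assumed start of the policy
START_CEILING_FLOP = 1e26       # "starting around 1e26 today"
DOUBLINGS_PER_YEAR = 1          # "doubling every year"

def ceiling(year: int) -> float:
    """Permitted training-run FLOP under the doubling schedule in a given year."""
    return START_CEILING_FLOP * 2 ** (DOUBLINGS_PER_YEAR * (year - START_YEAR))

# Years until the rising ceiling itself reaches the toy 1e30 threshold:
years = math.log2(DEFAULT_THRESHOLD_FLOP / START_CEILING_FLOP) / DOUBLINGS_PER_YEAR
print(f"Ceiling reaches 1e30 FLOP after ~{years:.1f} years (around {START_YEAR + round(years)}).")

for y in (2023, 2026, 2030):
    print(y, f"ceiling ~{ceiling(y):.1e} FLOP")
```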
Specific takes:
Yeah, policy regimes should have enforcement to prevent evasion. This is a force pushing toward higher ceilings. Maybe you think evasion is inevitable? I don’t, at least for reasonably high ceilings, although I don’t know much about it.
Idk, depends on the details of the policy. I think some experts think US regulation on training runs would largely apply beyond US borders (extraterritoriality); my impression is that the US can at least disallow foreign companies from using US nationals’ labor.
I agree that if the US pauses and loses its lead, that’s a big downside. I don’t think that’s inevitable, although it is a force pushing toward less ambitious pauses / higher ceilings. See Cruxes on US lead for some domestic AI regulation.
Hmm, I don’t see how most pause proposals I’ve heard of would require government approval for safety research. The exception is proposals that would require government approval for fine-tuning frontier LLMs. Is that right? Would it be fine if the regulation only hit base-model training runs (or fine-tuning with absurd amounts of compute, like >1e23 FLOP)?
My impression is that pause regulation would probably use the metric training compute, which is hard to game.
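For reference, the usual rough way that metric is estimated for dense transformer training is about 6 × parameters × training tokens. A minimal sketch, with hypothetical numbers rather than any real model’s:

```python
# Standard rough estimate for dense-transformer training compute:
# training FLOP ~ 6 * (parameter count) * (training tokens).
# The example numbers below are hypothetical, not any real model's.

def training_flop(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens

print(f"{training_flop(1e12, 1.5e13):.1e} FLOP")  # hypothetical 1T-param model on 15T tokens -> 9.0e+25
```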
Yeah :( so you have to raise the ceiling over time, or set it sufficiently high that you get powerful AI from inference-time progress [algorithmic progress + hardware progress + increased spending] before the policy becomes unenforceable. Go for a less-ambitious pause like 1e27 training FLOP or something.
This isn’t directly bad, but it means the pause might be suddenly reversed, which is directly bad. This is a major risk from a pause, I think.
To some extent I think it would be good for a liberal pause alliance to impose its will on defectors. To some extent the US / the liberal pause alliance should set the ceiling sufficiently high that they can pause without losing their lead and should attempt to slow defectors e.g. via export controls.
Maybe sorta. Violence sounds implausible.
Partially agree. So (a) go for a less-ambitious pause like 1e27 training FLOP or something, (b) try to slow other countries, and (c) note they partially slow when the US slows because US progress largely causes foreign progress (via publishing research and sharing models).
Super cruxy for the effects of various possible pauses is how long it would take e.g. China to catch up if US/UK/Europe paused, and how much US/UK/Europe could slow China. I really wish we knew this.
Just something that jumped out at me. Suppose a pause is on 1e28+ training runs.
The human brain is made of modules organized in a way we don’t understand. But we do know that the frontal lobes, associated with executive function, are a small part of the total tissue.
This means an AI system could be a collection of a few dozen specialized 1e28-FLOP models connected by API calls, hosted in a common data center for low-latency interconnects.
If “a few dozen” grows to 100+ modules, the total compute used would be 1e30 FLOP, and it might be possible to make such a system an AGI by training it on tasks difficult enough to force that level of cognitive development through feedback.
Especially with “meta” system architectures, where new modules can be added automatically to improve the score on tasks where the system is deficient and where further training of the existing weights leads to regressions.
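For concreteness, here is a minimal sketch of the kind of composite system described above: many separately trained specialized models composed behind API calls in one data center. The module names, the fixed pipeline, and the counts are hypothetical illustrations, not a claim about how such a system would actually be built.

```python
# Minimal sketch of the composite-system idea above: many specialized models, each
# trained within the per-run cap, composed behind API calls in one data center.
# Module names, the fixed pipeline, and the counts are hypothetical.

PER_MODULE_TRAINING_FLOP = 1e28   # each module individually respects a 1e28 cap
NUM_MODULES = 100                 # "a few dozen" grown to 100+

total_training_flop = PER_MODULE_TRAINING_FLOP * NUM_MODULES
print(f"{NUM_MODULES} modules x 1e28 FLOP each ~ {total_training_flop:.0e} FLOP total")  # 1e+30

def call_module(name: str, request: str) -> str:
    """Stand-in for an API call to a separately trained specialized model."""
    return f"[{name} output for: {request}]"

def answer(request: str) -> str:
    # A trivial fixed pipeline for illustration; a real system might route dynamically
    # and add new modules where existing ones regress (the "meta" architecture above).
    plan = call_module("planner", request)
    draft = call_module("specialist", plan)
    return call_module("critic", draft)

print(answer("design an experiment"))
```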
Interesting—something to watch out for! Perhaps it could be caught by limiting the number of training runs any individual actor can do that are close to / at the FLOP limit (to 1/year?). Of course then actors intent on it could try and use a maze of shell companies or something, but that could be addressed by requiring complete financial records and audits.
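A rough sketch of what that check might look like over a registry of reported training runs. The registry fields, the “near-ceiling” threshold, and the owner attribution are all assumptions for illustration; the owner field is the piece that the financial records and audits would have to supply to defeat shell-company structures.

```python
# Sketch of the "at most one near-ceiling run per actor per year" check suggested above.
# Registry format, thresholds, and owner attribution are hypothetical.
from collections import defaultdict

FLOP_LIMIT = 1e28
NEAR_LIMIT_FRACTION = 0.5   # treat runs within 2x of the cap as "near-ceiling" (assumption)
MAX_NEAR_LIMIT_RUNS_PER_YEAR = 1

def flag_violations(reported_runs):
    """reported_runs: iterable of dicts with 'owner', 'year', 'flop'."""
    counts = defaultdict(int)
    for run in reported_runs:
        if run["flop"] >= NEAR_LIMIT_FRACTION * FLOP_LIMIT:
            counts[(run["owner"], run["year"])] += 1
    return [key for key, n in counts.items() if n > MAX_NEAR_LIMIT_RUNS_PER_YEAR]

runs = [
    {"owner": "LabA", "year": 2026, "flop": 9e27},
    {"owner": "LabA", "year": 2026, "flop": 8e27},   # second near-ceiling run -> flagged
    {"owner": "LabB", "year": 2026, "flop": 1e26},   # well under the cap -> ignored
]
print(flag_violations(runs))  # [('LabA', 2026)]
```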
Sure. In practice there’s the national sovereignty angle, though. This just devolves into each party “complying” with the agreement while violating it in various ways. Too much incentive to defect.
The US government just never audits its secret national labs, China just never checks anything, Israel just openly decides it can’t afford to comply at all, etc. Everyone claims to be in compliance.
Really depends on how much of a taboo develops around AGI. If it’s driven underground it becomes much less likely to happen given the resources required.
So my thought on this: I think of flamethrowers, gas shells, and the worst WW1 battlefields. I am not sure what taboo humans won’t violate in order to win.
This isn’t war though. What are some peace-time examples of taboo violations (especially state-sanctioned ones)? I can only really think of North Korea and a handful of other pariah states (none of which would be capable of developing AGI).
This can be avoided with a treaty that requires full access to be given to international inspectors. This already happens with the IAEA, which was set up even amid the far greater tensions of the Cold War. If someone like Iran tries to kick out the inspectors, everyone assumes they’re trying to develop nuclear weapons and takes serious action (harsh sanctions, airstrikes, even the threat of war).
If governments think of this as an existential threat, they should agree to it for the same reasons they did with the IAEA. And while there are big incentives to defect (unless they have a very high p(doom)), there is also the knowledge that kicking out inspectors will lead to potential war and to their rivals defecting too.
If this turns out to be feasible, one solution would be to have people on-site (or make TSMC put hardware-level controls in place) to randomly sample from the training data several times a day and verify that outside data isn’t involved in the training run.
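As a sketch of how that sampling could work in practice, one option (an assumption on my part, not something specified above) is for the lab to pre-commit to hashes of its declared training data and for inspectors to hash randomly sampled live batches against that manifest:

```python
# One way the on-site sampling idea could work: the lab pre-commits to a manifest
# of hashes of its declared training data, and inspectors hash randomly sampled
# batches drawn from the live training pipeline and check them against it.
# The manifest scheme is an assumption, not something specified in the discussion above.
import hashlib
import random

def sha256(example: bytes) -> str:
    return hashlib.sha256(example).hexdigest()

def build_manifest(declared_dataset) -> set:
    """Hashes of every example the lab declares in advance."""
    return {sha256(x) for x in declared_dataset}

def spot_check(live_batch, manifest, sample_size=32) -> list:
    """Return any sampled examples whose hashes are not in the declared manifest."""
    sample = random.sample(list(live_batch), min(sample_size, len(live_batch)))
    return [x for x in sample if sha256(x) not in manifest]

declared = [f"declared example {i}".encode() for i in range(1000)]
manifest = build_manifest(declared)
batch = declared[:100] + [b"undeclared outside data"]   # one smuggled-in example
print(spot_check(batch, manifest, sample_size=101))     # caught here because the whole batch is sampled
```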