I’ll admit to a perhaps overly mean-spirited or exasperated tone in that section, but I think the content itself is good actually(tm)?
I agree with you that LLM tech might not scale to AGI, and thus AGI might not arrive as soon as many hope/fear. But this doesn’t really change the underlying concern?? It seems pretty plausible that, if not in five years, we might get something like AGI within our lifetime via some improved, post-LLM paradigm. (Consider the literal trillions of dollars, and thousands of brilliant researchers, now devoting their utmost efforts towards this goal!) If this happens, it does not take some kind of galaxy-brained rube-goldberg argument to make an observation like “if we invent a technology that can replace a lot of human labor, that might lead to extreme power concentration of whoever controls the technology / disempowerment of many people who currently work for a living”, either via “stable-totalitarianism” style takeovers (people with power use powerful AI to maintain and grow this power very effectively) or via “gradual disempowerment” style concerns (once society no longer depends on a broad base of productive, laboring citizens, there is less incentive to respect those citizens’ rights and interests).
Misalignment / AI takeover scenarios are indeed more complicated and rube-goldberg-y IMO. But the situation here is very different from what it was ten years ago—instead of just doing Yudkowsky-style theorycrafting based on abstract philosophical principles, we can do experiments to study and demonstrate the types of misalignment we’re worried about (see papers by Anthropic and others about sleeper agents, alignment faking, chain-of-thought unfaithfulness, emergent misalignment, and more). IMO the detailed science being done here is more grounded than the impression you’d get by just reading people slinging takes on twitter (or, indeed, by reading comments like mine here!). Of course if real AGI turns out to be in a totally new post-LLM paradigm, that might invalidate many of the most concrete safety techniques we’ve developed so far—but IMO that makes the situation worse, not better!
In general, the whole concept of dealing with existential risks is that the stakes are so high that we should start thinking ahead and preparing to fight them, even if it’s not yet certain they’ll occur. I agree it’s not certain that LLMs will scale to AGI, or that humanity will ever invent AGI. But it certainly seems plausible! (Many experts do believe this, even if they are in the minority on that survey. Plus, the entire US stock market these days is basically obsessed with figuring out whether AI will turn out to be a huge deal, a nothingburger, or something in between, so the market doesn’t consider it an obvious guaranteed nothingburger. And of course all the labs are racing to get as close to AGI as fast as possible, since the closer you get to AGI, the more money you can make by automating more and more types of labor!) So we should probably start worrying now, just like we worry about nuclear war even though it hopefully seems unlikely that Putin or Xi Jinping or the USA would really launch a major nuclear attack, even in an extreme situation like an invasion of Taiwan. New technologies sometimes have risks; AI might (not certain, but definitely might) become an EXTREMELY powerful new technology, so the risks might be large!
If the argument were merely that there’s something like a 1 in 10 million chance of a global catastrophic event caused by AGI over the next 100 years and we should devote a small amount of resources to this problem, then you could accept flimsy, hand-wavy arguments. But Eliezer Yudkowsky forecasts a 99.5% chance of human extinction from AGI “well before 2050”, unless we implement his aggressive global moratorium on AI R&D. The flimsy support that can justify a small allocation of resources can’t justify a global moratorium on AI R&D, enforced by militaries. (Yudkowsky says that AI datacentres that violate the moratorium should be blown up.)
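To make the asymmetry concrete, here’s a toy expected-value calculation. Every number in it is an illustrative assumption (including the stipulated dollar “stakes” figure); the only point is how wildly the justified level of precaution swings with the probability you plug in:

```python
# Back-of-the-envelope expected-loss comparison.
# All numbers here are illustrative assumptions, not anyone's actual estimates.

STAKES = 1e17  # stipulated dollar value of a global catastrophe (pure assumption)

def justified_spend(p_catastrophe: float) -> float:
    """A risk-neutral planner would pay up to p * stakes to eliminate the risk."""
    return p_catastrophe * STAKES

low = justified_spend(1e-7)    # "1 in 10 million over the next 100 years"
high = justified_spend(0.995)  # Yudkowsky's forecast

print(f"low-probability case:  ${low:,.0f}")   # on the order of $10 billion
print(f"high-probability case: ${high:,.0f}")  # on the order of $100 quadrillion
```

The evidence that suffices to justify the first allocation is nowhere near sufficient to justify the second; the stronger the demanded response, the stronger the supporting argument needs to be.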
Yudkowsky is on the extreme end, but not by much. Some people want to go to extreme lengths to stop AI: Pause AI calls for a moratorium similar to what Yudkowsky recommends. And the share of EA funding and attention devoted to AI existential risk is not a small percentage but a very large one. So, whatever evidence would support a modest, highly precautionary stance toward AI risk does not support what is actually happening.
I’ll take your concern about AI concentrating wealth and power if there is widespread labour automation as an example of what I mean with regard to flimsy evidentiary/argumentative support. Okay, let’s imagine that in, say, 2050, we have humanoid robots that are capable of automating most paid human work that currently exists, both knowledge work and work that has a physical dimension. Let’s suppose the robots are:
- Built with commodity hardware that’s roughly as expensive as a Vespa or a used car
- Sold directly to consumers
- Running free, open source software and free, open source/open weights AI models
- Programmed to follow the orders of their owner, locked with a password and/or biometric security
Would your concern about wealth and power concentration apply in such a scenario? It’s hard to see how it would. In this scenario, humanoid robots with advanced AI would be akin to personal computers or smartphones. Powerful but so affordable and widely distributed that the rich and powerful hardly have any technological edge over the poor and powerless. (A billionaire uses an iPhone, the President of the United States uses an iPhone, and the cashier at the grocery store uses an iPhone.)
You could also construct a scenario where humanoid robots are extremely expensive, jealously kept by the companies that manufacture them and not sold, run proprietary software and closed models, and obey only the manufacturer’s directives. In that case, power/wealth concentration would be a concern.
So, which scenario is more likely to be true? What is more likely to be the nature of these advanced humanoid robots in 2050?
We have no idea. There is simply no way for us to know, and as much as we might want to know, be desperate to know, twist ourselves in knots trying to work it out, we won’t get any closer to the truth than when we started. The uncertainty is irreducible.
Okay, so let’s accept we don’t know. Shouldn’t we prepare for the latter scenario, just in case? Maybe. How?
Coming up with plausible interventions or preparations at this early stage is hopeless. We don’t know which robot parts need to be made cheaper. The companies that will make the robots probably don’t exist yet. Promoting open source software or open AI models in general today won’t stop any company in the future from using proprietary software and closed models. Even if we passed a law now mandating that all AI and robotics companies use open source software and open models — would we really want to do that, based on a hunch? — that law could easily be repealed in the future.
Plus, I made the possibility space artificially small. I made things really simple by presenting a binary choice between two scenarios. In reality, there is a combinatorial explosion of possible permutations of different technical, design, and business factors, most likely including ones we can’t imagine now: ones that, even if we were shown a Wikipedia article from the future describing them, we still wouldn’t understand. So, there is a vast space of possibilities based on what we can already imagine, and there will probably be even more possibilities based on new technology and new science that we can’t yet grasp.
Saying “prepare now” sounds sensible and precautionary, but it’s not actionable.
Also, fundamental question: why is preparing earlier better? Let’s say in 2025 humanoid robots account for 0% of GDP (this seems true), in 2030 they’ll account for 1%, in 2040 for 25%, and in 2050 for 50%. What do we gain by trying to prepare while humanoid robots are at 0% of GDP? Once they’re at 1% of GDP, or even 0.5%, or 0.25%, we’ll have a lot more information than we do now. I imagine that 6 months spent studying the problem while the robots are at 1% of GDP will be worth much more than 5 years of research at the 0% level.
Perhaps a good analogy is scientific experiments. The value of doing theory or generating hypotheses in the absence of any experiments or observations — in the absence of any data, in other words — is minimal. For the sake of illustration, let’s imagine you’re curious about how new synthetic drugs analogous to LSD but chemically unlike any existing drugs — not like any known molecules at all — might affect the human mind. Could they make people smarter temporarily? Or perhaps cognitively impaired? Could they make people more altruistic and cooperative? Or perhaps paranoid and distrustful?
You have no data in this scenario: you can’t synthesize molecules, you can’t run simulations, there are no existing comparisons, natural or synthetic, and you certainly can’t test anything on humans or animals. All you can do is think about it.
Would time spent in this pre-empirical state of science (if it can be called science) have any value? Let’s say you were in that state for… 50 years… 100 years… 500 years… 1,000 years… would you learn anything? Would you gain any understanding? Would you get any closer to truth? I think you wouldn’t, or you would so marginally that it wouldn’t matter.
Then if you suddenly had data, if you could synthesize molecules, run simulations, and test drugs on live subjects, in a very short amount of time you would outstrip, many times over, whatever little knowledge you might have gained from just theorizing and hypothesizing about it. A year of experiments would be worth more than a century of thought. So, if for some reason, you knew you couldn’t start experiments for another hundred years, there would be very little value in thinking about the topic before then.
The whole AGI safety/alignment and AGI preparedness conversation seems to rely on the premise that non-empirical/pre-empirical science is possible, realistic, and valuable, and that if we, say, spend $10 million of grant money on it, it will have higher expected value than giving it to GiveWell’s top charities, or pandemic preparedness, or asteroid defense, or cancer research, or ice cream cones for kids at county fairs, or whatever else. I don’t see how this could be true. I don’t see how this can be justified. It seems like you basically might as well light the money on fire.
Empirical safety/alignment research on LLMs might have value if LLMs scale to AGI, but that’s a pretty big ‘if’. For over 15 years, up until — I’m not sure, maybe around 2016? — Yudkowsky and MIRI still thought symbolic AI would lead to AGI in the not-too-distant future. In retrospect, that looks extremely silly. (Actually, I thought it looked extremely silly at the time, and said so, and got pushback from people in EA way back then too. Plus ça change! Maybe in 2035 we’ll be back here again.) The idea that symbolic AI could ever lead to AGI, even in 1,000 years, looks unbelievably quaint when you compare symbolic AI systems to a system like AlphaGo, AlphaStar, or ChatGPT. Deep learning/deep RL-based systems still have quite rudimentary capabilities compared to the average human being, or, in some important ways, even compared to, say, a cat; yet symbolic AI systems are so much simpler and so much less capable than these deep neural network-based systems that the comparison is ridiculous. Symbolic AI is not too different from conventional software, and the claim that symbolic AI would someday soon ascend to AGI feels not too different from the claim that, in the not-too-distant future, Microsoft Windows will learn how to think. The connection between symbolic AI and human general intelligence seems to boil down to, essentially, a loose metaphorical comparison between software/computers and human brains.
I don’t think the conflation of LLMs with human general intelligence is quite as ridiculous as it was with symbolic AI, but it is still quite ridiculous. Particularly when people make absurd and plainly false claims that GPT-4 is AGI (as Leopold Aschenbrenner did) or o3 is AGI (as Tyler Cowen did), or that GPT-4 is a “very weak AGI” (as Will MacAskill did). This seems akin to saying a hot air balloon is a spaceship, or a dog is a bicycle. It’s hard to even know what to say.
As for explicitly, substantively arguing why LLMs won’t scale to AGI, there are two distinct and independent arguments. The first points out the limits to LLM scaling. The second points out the fundamental research problems that scaling can’t solve.
I used to assume that people who care a lot about AGI alignment/safety as an urgent priority must have thoughtful replies to these sorts of arguments. Increasingly, I get the impression that most of those people have simply never thought about them before, and weren’t even aware such arguments existed.