I worry there’s a negative example bias in the section about working with AI companies/accumulating power and influence, vs. working outside the system.
You point to cases where something bad happened, and say that some of the people complicit in the bad thing didn’t protest because they wanted to accumulate power/influence within the system.
But these should be matched by looking for cases where something good happened because people tried to accumulate power/influence within a system.
I think this is a significant percent of all good things that have ever happened. Just to give a trivial example, slavery ended because people like Abraham Lincoln successfully accumulated power within the federal government, which at the time was pro-slavery and an enforcer of slavery. If abolitionists had tried to “stay pure” by refusing to run for office, they probably would have gotten nowhere.
Or an even clearer example: Jimmy Carter ended segregation in Georgia by pretending to be extremely racist, winning the gubernatorial election on the strength of the segregationist vote, and then showing his true colors once in office.
(is it cheating to use government as an example? I don’t think so—you mentioned Vietnam)
You also mention academia and say that maybe the desire of academics to “work within the system” prevents intellectual change. I would argue that any time an academic consensus changes—which historically has been pretty common—it’s been because someone worked within the system, got their PhD, and used their prestige to advocate for the new better position. If nobody who disagreed with an academic consensus ever did that, paradigms would never change, and academia would be much worse.
(here I think a good example is wokeness—the colleges are full of people who said they decided that such-and-such a field was racist and it was their duty to change it from within. Those people won, and they’ll keep winning until people with alternate ideologies are equally dedicated)
I also think there’s a bias in this space towards thinking that the current AI situation is maximally cursed compared to all counterfactuals. Suppose nobody who cared about alignment had founded an AI company. We’d still have Moore’s Law, and compute costs would still go down. Using modern chips, it costs $20 to train a GPT-2 equivalent (this estimate may partly bake in chip or algorithmic progress spurred by OpenAI, but I think it’s a useful comparison point). If OpenAI hadn’t done it, eventually someone else would have. So maybe in this world, since OpenAI/Anthropic/DeepMind don’t exist, the top AI companies are Google (not DeepMind), Meta, and Baidu; they’re 1-5 years behind where they are now on algorithmic progress and getting scaling running; and they all have the Yann LeCun approach to alignment (or, in Baidu’s case, have never even heard the term). Is subtracting 1-5 years from timelines, in exchange for most big AI companies having alignment teams and at-least-mildly-concerned CEOs, a good trade? I can’t really say, but I don’t understand everyone else’s strong conviction that it isn’t. What would we have done with 1-5 years of extra timeline? MIRI-style agent foundations research? Try to lobby politicians to pause a thing that wasn’t happening?
(in fact, for this counterfactual to be fair, there can’t be any alignment discussion at all—if there’s alignment discussion, it inspires Sam Altman and the rest. So I think we would just let those 1-5 years pass by without using them strategically, unless we can somehow do the alignment research in secret with no public community to speak of.)
I don’t want to argue that working within the system is definitely better—I’m on the fence, because of a combination of your considerations and the ones above. My cruxes are:
1. What is the chance that PauseAI activism will work?
2. If it does work, is there a plan for what to do with the pause?
3. Does pro-pause activism now complement or substitute for pro-pause activism later? (e.g. mid-intelligence-explosion, when the case will hopefully be more obvious)
4. How much goodwill do we burn with AI companies per percent likelihood of an actually-useful AI pause that we gain? Are there different framings / forms of activism / target laws that would buy us better chances of a useful pause per unit of goodwill burnt?
5. If pro-pause activism burns goodwill, how effectively can we pull off a good cop / bad cop strategy as opposed to having the Unilateralist’s Curse poison the whole movement?
6. What’s the difference in p(doom) between a world where AI companies have 75th percentile vs. 25th percentile levels of goodwill towards the concept of alignment / friendly professional relationship with the alignment community?
Of these, I find myself thinking most about the last. My gut feeling is that nothing we do is going to matter, and the biggest difference between good outcomes and bad outcomes is how much work the big AI labs put into alignment during the middle of the intelligence explosion when progress moves fastest. The fulcrum for human extinction might look like a meeting between Sam Altman + top OpenAI executives where they decide whether to allocate 10% vs. 20% of their GPT-6 instances to alignment research. And the fulcrum for that fulcrum might be whether the executives think “Oh yeah, alignment, that thing that all the cool Silicon Valley people whose status hierarchy we want to climb agree is really important” vs. “Oh, the ideology of the hated socialist decel outgroup who it would be social suicide to associate ourselves with”. If we get five SB 1047 style bills at the cost of shifting from the first perspective to the second, I’m not sure we’re winning here (even if those bills don’t get vetoed). And the more you think that all past EA interventions have made things worse, the more concerned you should be about this (arguably—I admit it depends how you generalize).
Right now I lean towards trying to chart a muddy middle course, something like “support activism that seems especially efficient in getting things done per unit of AI company goodwill burnt”. But I am most optimistic about laying the groundwork for a pause campaign that might come later, in the middle of the intelligence explosion, when it will become obvious that something crazy is happening, and when Sam Altman will have all those spare GPT-6 instances which—if paused from doing capabilities research—can be turned to alignment.
But these should be matched by looking for cases where something good happened because people tried to accumulate power/influence within a system.
I think this is a significant percent of all good things that have ever happened.
I think you are right about this; you’ve changed my mind (toward greater uncertainty).
My gut feeling is that [...] the biggest difference between good outcomes and bad outcomes is how much work the big AI labs put into alignment during the middle of the intelligence explosion when progress moves fastest.
This seems to depend on a conjunction of several strong assumptions: (1) AI alignment is basically easy; (2) there will be a slow takeoff; (3) the people running AI companies are open to persuasion, and “make AI safety seem cool” is the best kind of persuasion.
But then again, I don’t think pause protests are going to work; I’m just trying to pick whichever bad plan seems the least bad.
2. I agree I’m assuming there will be a slow takeoff (operationalized as, let’s say, a ~one-year period where GPT-integer-increment-level changes happen on a scale of months, before any such period where they happen on a scale of days).
3. AI companies being open to persuasion seems kind of trivial to me. They already have alignment teams. They already (I assume) have budget meetings where they discuss how many resources these teams should get. I’m just imagining inputs into this regular process. I agree that issues around politics could be a lesser vs. greater input.
1. I wouldn’t frame this as alignment is easy/hard, so much as “alignment is more refractory to 10,000 copies of GPT-6 working for a subjective century” vs. “alignment is more refractory to one genius, not working at a lab, coming up with a new paradigm using only current or slightly-above-current AIs as model organisms, in a sense where we get one roll at this per calendar year”.
Not to tell you what to do, but I’d love to see a longer ACX post making these arguments, Scott :). It seems like it could be a rich seed for discussion; almost all the writing I’ve seen in rationalist spaces around these issues has been anti-build-goodwill/influence-from-within, and I think your counterpoints here are strong.
Great comment. However, lots of people concerned with safety have quit OpenAI. Wouldn’t you expect them to have stayed at OpenAI if they thought your argument was correct?
BTW, there might be ways to achieve a slowdown through methods other than traditional activism. This could buy valuable time for alignment work without hurting the “AI alignment” brand. For example:
Commoditize LLMs to reduce the incentive for large training runs
Point out to Nvidia that AI is on track to kill their company
(I encourage people to signal boost these ideas if they seem good. No attribution necessary.)