I feel somewhat concerned that after reading your repeated writing saying “use your AGI to (metaphorically) burn all GPUs”, someone might actually do so, but of course their AGI isn’t actually aligned or powerful enough to do so without causing catastrophic collateral damage. At least the suggestion encourages AI race dynamics – because if you don’t make AGI first, someone else will try to burn all your GPUs! – and makes the AI safety community seem thoroughly supervillain-y.
Points 5 and 6 suggest that soon after someone develops AGI for the first time, they must use it to perform a pivotal act as powerful as “melt all GPUs”, or else we are doomed. I agree that figuring out how to align such a system seems extremely hard, especially if this is your first AGI. But aiming for such a pivotal act with your first AGI isn’t our only option, and this strategy seems much riskier than taking some more time to use our AGI to solve alignment further before attempting any pivotal acts. I think it’s plausible that all major AGI companies could stick to only developing AGIs that are (probably) not power-seeking for a decent number of years. Remember, even Yann LeCun of Facebook AI Research thinks that AGI should have strong safety measures. Further, we could have compute governance and monitoring to prevent rogue actors from developing AGI, at least until we solve alignment enough to entrust more capable AGIs to develop strong guarantees against random people developing misaligned superintelligences. (There are also similar comments and responses on LessWrong.)
Perhaps a crux here is that I’m more optimistic than you about things like slow takeoffs, AGI likely being at least 20 years out, the possibility of using weaker AGI to help supervise stronger AGI, and AI safety becoming mainstream. Still, I don’t think it’s helpful to claim that we must or even should aim to try to “burn all GPUs” with our first AGI, instead of considering alternative strategies.
Quoting Scott Alexander here:
I agree it’s not necessarily a good idea to go around founding the Let’s Commit A Pivotal Act AI Company.
But I think there’s room for subtlety somewhere like “Conditional on you being in a situation where you could take a pivotal act, which is a small and unusual fraction of world-branches, maybe you should take a pivotal act.”
That is, if you are in a position where you have the option to build an AI capable of destroying all competing AI projects, the moment you notice this you should update heavily in favor of short timelines (zero in your case, but everyone else should be close behind) and fast takeoff speeds (since your AI has these impressive capabilities). You should also update on existing AI regulation being insufficient (since it was insufficient to prevent you).
Somewhere halfway between “found the Let’s Commit A Pivotal Act Company” and “if you happen to stumble into a pivotal act, take it”, there’s an intervention to spread a norm of “if a good person who cares about the world happens to stumble into a pivotal-act-capable AI, take the opportunity”. I don’t think this norm would necessarily accelerate a race. After all, bad people who want to seize power can take pivotal acts whether we want them to or not. The only people who are bound by norms are good people who care about the future of humanity. I, as someone with no loyalty to any individual AI team, would prefer that (good, norm-following) teams take pivotal acts if they happen to end up with the first superintelligence, rather than not doing that.
Another way to think about this is that all good people should be equally happy with any other good person creating a pivotal AGI, so they won’t need to race among themselves. They might be less happy with a bad person creating a pivotal AGI, but in that case you should race and you have no other option. I realize “good” and “bad” are very simplistic but I don’t think adding real moral complexity changes the calculation much.
I am more concerned about your point where someone rushes into a pivotal act without being sure their own AI is aligned. I agree this would be very dangerous, but it seems like a job for normal cost-benefit calculation: what’s the risk of your AI being unaligned if you act now, vs. someone else creating an unaligned AI if you wait X amount of time? Do we have any reason to think teams would be systematically biased when making this calculation?
I’m more confident than Scott that the first AGI systems will be capable enough to execute a pivotal act (though alignability is another matter!). And, unlike Scott, I think AGI orgs should take the option more seriously at an earlier date, and center more of their strategic thinking around this scenario class. But if you don’t agree with me there, I think you should endorse a position more like Scott’s.
The alternative seems to just amount to writing off futures where early AGI systems are highly capable or impactful — giving up in advance, effectively deciding that endorsing a strategy that sounds weirdly extreme is a larger price to pay than human extinction. Phrased in those terms, this seems obviously absurd. (More absurd if you agree with me that this would mean writing off most possible futures.)
Nuclear weapons were an extreme technological development in their day, and MAD was an extreme and novel strategy developed in response to the novel properties of nuclear weapons. Strategically novel technologies force us to revise our strategies in counter-intuitive ways. The responsible way to handle this is to seriously analyze the new strategic landscape, have conversations about it, and engage in dialogue between major players until we collectively have a clear-sighted picture of what strategy makes sense, even if that strategy sounds weirdly extreme relative to other strategic landscapes.
If there’s some alternative to intervening on AGI proliferation, then that seems important to know as well. But we should discover that, if so, via investigation, argument, and analysis of the strategic situation, rather than encouraging a mindset under which most of the relevant strategy space is taboo or evil (and then just hoping that this part of the strategy space doesn’t end up being relevant).
If someone manages to create a powerful AGI, and the only cost for most humans is that it burns their GPUs, this seems like an easy tradeoff for me. It’s not great, but it’s mostly a negligible problem for our species. But I do agree using governance and monitoring is a possible option. I’m normally a hardline libertarian/anarchist, but I’m fine going full Orwellian in this domain.
Strongly agreed. Somehow taking over the world and preventing anybody else from building AI seems like a core part of the plan for Yudkowsky and others. (When I asked about this on LW, somebody said they expected the first aligned AGI to implement global surveillance to prevent unaligned AGIs.) That sounds absolutely terrible—see risks from stable totalitarianism.
If Yudkowsky is right and the only way to save the world is by global domination, then I think we’re already doomed. But there are lots of cruxes to his worldview: short timelines, fast takeoff speeds, the difficulty of the alignment problem, the idea that AGI will be a single entity rather than many different systems in different domains. Most people in AI safety are not nearly as pessimistic. I’d much rather bet on the wide range of scenarios where his dire predictions are incorrect.
But this wouldn’t be global domination in any conventional sense. When humans implement such things, their methods are extremely harsh and inhibit freedoms on all levels of society. A human-run domination would need to enforce such measures with harsh prison time, executions, fear and intimidation, etc. But this is mostly because humans are not very smart, so they don’t know any other way to stop human y from doing x. A powerful AGI wouldn’t have this problem. I don’t think it would even have to be as crude as “burn all GPUs”. It could probably monitor and enforce things so efficiently that trying to create another AGI would be like trying to fight gravity. For a human, it would simply be that you can’t achieve it, no matter how many times you try, almost a new rule interwoven into the fabric of reality. This could probably be made less severe with an implementation such as “can’t achieve AGI that is above intelligence threshold X” or “poses X amount of risk to population”. In this less severe form, humans would still be free to develop AIs that could solve aging, cancer, space travel, etc., but couldn’t develop anything too powerful or dangerous.