I think the blunt MIRI-statement you’re wanting is here:
Capabilities work is currently a bad idea
Nate’s top-level view is that ideally, Earth should take a break on doing work that might move us closer to AGI, until we understand alignment better.
That move isn’t available to us, but individual researchers and organizations who choose not to burn the timeline are helping the world, even if other researchers and orgs don’t reciprocate. You can unilaterally lengthen timelines, and give humanity more chances of success, by choosing not to personally shorten them.
Nate thinks capabilities work is currently a bad idea for a few reasons:
He doesn’t buy that current capabilities work is a likely path to ultimately solving alignment.
Insofar as current capabilities work does seem helpful for alignment, it strikes him as helping with parallelizable research goals, whereas our bottleneck is serial research goals. (See A note about differential technological development.)
Nate doesn’t buy that we need more capabilities progress before we can start finding a better path.
[...]
On Nate’s view, the field should do experiments with ML systems, not just abstract theory. But if he were magically in charge of the world’s collective ML efforts, he would put a pause on further capabilities work until we’ve had more time to orient to the problem, consider the option space, and think our way to some sort of plan-that-will-actually-probably-work. It’s not as though we’re hurting for ML systems to study today, and our understanding already lags far behind today’s systems’ capabilities.
[...]
Nate thinks that DeepMind, OpenAI, Anthropic, FAIR, Google Brain, etc. should hit the pause button on capabilities work (or failing that, at least halt publishing). (And he thinks any one actor can unilaterally do good in the process, even if others aren’t reciprocating.)
Tangentially, I’ll note that you might not want MIRI to say “that move isn’t available to us”, if you think that it’s realistic to get the entire world to take a break on AGI work, and if you think that saying pessimistic things about this might make it harder to coordinate. (Because, e.g., this might require a bunch of actors to all put a lot of sustained work into building some special institution or law, that isn’t useful if you only half-succeed; and Alice might not put in this special work if she thinks Bob is unconditionally unwilling to coordinate, or if she’s confident that Carol is confident that Dan won’t coordinate.)
But this seems like a very unlikely possibility to me, so I currently see more value in just saying MIRI’s actual take; marginal timeline-lengthening actions can be useful even if we can’t actually put the whole world on pause for 20 years.
This is good, but I don’t think it goes far enough. And I agree with your comments re “might not want MIRI to say ‘that move isn’t available to us’”. It might not be realistic to get the entire world to take a break on AGI work, but it’s certainly conceivable, and I think maybe at this point more realistic than expecting alignment to be solved in time (or at all?). It seems reasonable to direct marginal resources toward pushing for a moratorium on AGI rather than more alignment work (although I still think alignment should at least be tried too!)
Your and Nate’s statement still implicitly assumes that AGI capabilities orgs are “on our side”. The evidence is that they are clearly not. Demis is voicing caution at the same time that Google leadership have started a race with OpenAI (Microsoft). It’s out of Demis’ (and his seemingly toothless ethics board’s) hands. Less acceptance of what has been tantamount to “existential safety washing”, and more realpolitik, is needed. A better approach now might be to appeal directly to the public and policymakers, or to find a way to strategise with those with power. For example, should the UN Security Council be approached somehow? This isn’t “defection”.
I’m saying all this because I’m not afraid of treading on any toes. I don’t depend on EA money (or anyone’s money) for my livelihood or career[1]. I’m financially independent. In fact, my life is pretty good, all apart from facing impending doom from this! I mean, I don’t need to work to survive[2], and I’ve got an amazing partner and a supportive family. All that is missing is existential security! I’d be happy to have “completed it mate” (i.e. I’ve basically done this with the normal life of house, car, spouse, family, financial security, etc.); but I haven’t: remaining is this small issue of surviving for a normal lifespan, having my children survive and thrive, and ensuring the continuation of the sentient universe as we know it...
I think maybe at this point more realistic than expecting alignment to be solved in time (or at all?).
I think it’s a lot more realistic to solve alignment than to delay AGI by 50 years. I’d guess that delaying AGI by 10 years is maybe easier than alignment, but it also doesn’t solve anything unless we can use those 10 years to figure out alignment as well. For that matter, delaying by 50 years also requires that we solve alignment in that timeframe, unless we’re trying to buy time to do some third other thing.
The difficulty of alignment is also a lot more uncertain than the difficulty of delaying AGI: it depends more on technical questions that are completely unknown from our current perspective. Delaying AGI by decades is definitely very hard, whereas the difficulty of alignment is mostly a question mark.
All of that suggests to me that alignment is far more important as a way to spend marginal resources today, but we should try to do both if there are sane ways to pursue both options today.
If you want MIRI to update from “both seem good, but alignment is the top priority” to your view, you should probably be arguing (or gathering evidence) against one or more of these claims:
AGI alignment is a solvable problem.
Absent aligned AGI, there isn’t a known clearly-viable way for humanity to achieve a sufficiently-long reflection (including centuries of delaying AGI, if that turned out to be needed, without permanently damaging or crippling humanity).
(There are alternatives to aligned AGI that strike me as promising enough to be worth pursuing. E.g., maybe humans can build Drexlerian nanofactories without help from AGI, and can leverage this for a pivotal act. But these all currently seem to me like even bigger longshots than the alignment problem, so I’m not currently eager to direct resources away from (relatively well-aimed, non-capabilities-synergistic) alignment research for this purpose.)
Humanity has never succeeded in any political task remotely as difficult as the political challenge of creating an enforced and effective 50+ year global moratorium on AGI. (Taking into account that we have no litmus test for what counts as an “AGI” and we don’t know what range of algorithms or what amounts of compute you’d need to exclude in order to be sure you’ve blocked AGI. So a regulation that blocks AGI for fifty years would probably need to block a ton of other things.)
EAs have not demonstrated the ability to succeed in political tasks that are way harder than any political task any past humans have succeeded on.
Even a 10 year delay is worth a huge amount (in expectation). We may well have a very different view of alignment by then (including perhaps being pretty solid on its impossibility? Or perhaps a detailed plan for implementing it? (Or even the seemingly very unlikely “...there’s nothing to worry about”)), which would allow us to iterate on a better strategy (we shouldn’t assume that our outlook will be the same after 10 years!)
but we should try to do both if there are sane ways to pursue both options today.
Yes! (And I think there are sane ways).
If you want MIRI to update from “both seem good, but alignment is the top priority” to your view, you should probably be arguing (or gathering evidence) against one or more of these claims:
AGI alignment is a solvable problem.
There are people working on arguing against this claim (e.g. Yampolskiy, Landry & Ellen), and this is definitely something I want to spend more time on (note that their writings so far could definitely do with a more accessible distillation).
Absent aligned AGI, there isn’t a known clearly-viable way for humanity to achieve a sufficiently-long reflection
I really don’t think we need to worry about this now. AGI x-risk is an emergency: we need to deal with that emergency first (e.g. kick the can down the road 10 years with a moratorium on AGI research); then, when we can relax a little, we can have the luxury of thinking about long-term flourishing.
Humanity has never succeeded in any political task remotely as difficult as the political challenge of creating an enforced and effective 50+ year global moratorium on AGI.
I think this can definitely be argued against (and I will try to write more as/when I make a more fleshed-out post calling for a global AGI moratorium). For a start, without all the work on nuclear non-proliferation and risk reduction, we may well not be here today. Yes, there has been proliferation, but there hasn’t been an all-out nuclear exchange yet! It’s now 77 years since a nuclear weapon was used in anger. That’s a pretty big result, I think! Also, global taboos around bio topics such as human genetic engineering are well established. If such a taboo is established, enforcement becomes a lesser concern, as you are then only fighting against isolated rogue elements rather than established megacorporations. Katja Grace discusses such taboos in her post on slowing down AI.
EAs have not demonstrated the ability to succeed in political tasks that are way harder than any political task any past humans have succeeded on.
Fair point. I think we should be thinking much more broadly than EA here. This needs to become mainstream, and fast.
Also, I should say that I don’t think MIRI should necessarily be diverting resources to work on a moratorium. Alignment is your comparative advantage so you should probably stick to that. What I’m saying is that you should be publicly and loudly calling for a moratorium. That would be very easy for you to do (a quick blog post/press release). But it could have a huge effect in terms of shifting the Overton Window on this. As I’ve said, it doesn’t make sense for this not to be part of any “Death with Dignity” strategy. The sensible thing when faced with ~0% survival odds is to say “FOR FUCK’S SAKE CAN WE AT LEAST TRY AND PULL THE PLUG ON HUMANS DOING AGI RESEARCH!?!”, or even “STOP BUILDING AGI YOU FUCKS!” [Sorry for the language, but I think it’s appropriate given the gravity of the situation, as assumed by talk of 100% chance of death etc.]
Even a 10 year delay is worth a huge amount (in expectation). We may well have a very different view of alignment by then (including perhaps being pretty solid on its impossibility?
Agreed on all counts! Though as someone who’s been working in this area for 10 years, I have a newfound appreciation for how little intellectual progress can easily end up happening in a 10-year period...
(Or even the seemingly very unlikely “...there’s nothing to worry about”)
I have a lot of hopes that seem possible enough to me to be worth thinking about, but this specific hope isn’t one of them. Alignment may turn out to be easier than expected, but I think we can mostly rule out “AGI is just friendly by default”.
But it could have a huge effect in terms of shifting the Overton Window on this.
In which direction?
:P
I’m joking, though I do take seriously that there are proposals that might be better signal-boosted by groups other than MIRI. But if you come up with a fuller proposal you want lots of sane people to signal-boost, do send it to MIRI so we can decide if we like it; and if we like it as a sufficiently-realistic way to lengthen timelines, I predict that we’ll be happy to signal-boost it and say as much.
As I’ve said, it doesn’t make sense for this not to be part of any “Death with Dignity” strategy. The sensible thing when faced with ~0% survival odds is to say “FOR FUCK’S SAKE CAN WE AT LEAST TRY AND PULL THE PLUG ON HUMANS DOING AGI RESEARCH!?!”, or even “STOP BUILDING AGI YOU FUCKS!” [Sorry for the language, but I think it’s appropriate given the gravity of the situation, as assumed by talk of 100% chance of death etc.]
I strongly agree and think it’s right that people… like, put some human feeling into their words, if they agree about how fucked up this situation is? (At least if they find it natural to do so.)
[1] Although I still care about my reputation in EA, to be fair (can’t really avoid this as a human).
[2] All my EA work is voluntary.