Re existential security, what are your AGI timelines and p(doom|AGI) like, and do you support efforts calling for a global moratorium on AGI (to allow time for alignment research to catch up / establish the possibility of alignment of superintelligent AI)?
As for existential risk, my current very tentative forecast is that the state of the world at the end of 2100 will look something like this:
73% - the world in 2100 looks broadly like it does now (in 2023), in the same sense that the current 2023 world looks broadly like it did in 1946. That is to say, there will of course be a lot of technological and sociological change between now and then, but by the end of 2100 there still won't have been unprecedented explosive economic growth (e.g., >30% GWP growth per year), no existential disaster, etc.
9% - the world is in a singleton state controlled by an unaligned rogue AI acting on its own initiative.
6% - the future is good for humans but our AI / post-AI society causes some other moral disaster (e.g., widespread abuse of digital minds, widespread factory farming)
5% - we get aligned AI, solve the time of perils, and have a really great future
4% - the world is in a singleton state controlled by an AI-enabled dictatorship that was initiated by some human actor misusing AI intentionally
1% - all humans are extinct due to an unaligned rogue AI acting on its own initiative
2% - all humans are extinct due to something else on this list (e.g., some other AI scenario, nukes, biorisk, unknown unknowns)
I think that, conditional on producing "minimal menace" AI by the end of 2070, there's a 28% chance that an existential catastrophe attributable to that AI system would follow within the next 100 years.
Though I don’t know how seriously you should take this, because forecasting >75 years into the future is very hard.
Also, my views on this are very incomplete and in flux, and I look forward to refining them and writing more about them publicly.
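For concreteness, here is a minimal sketch of the arithmetic implied by the scenario list above. The short labels and the exhaustiveness check are my own framing for illustration, not Peter's:

```python
# Peter's tentative forecast for the state of the world in 2100, as listed above (in percent).
# Short labels are mine; see the full descriptions in the list above.
scenarios = {
    "world broadly like today (no explosive growth, no existential disaster)": 73,
    "singleton controlled by an unaligned rogue AI": 9,
    "good for humans, but some other moral disaster": 6,
    "aligned AI, time of perils solved, great future": 5,
    "AI-enabled dictatorship initiated by deliberate human misuse": 4,
    "human extinction caused by an unaligned rogue AI": 1,
    "human extinction from something else on the list": 2,
}

# The listed outcomes are treated here as exhaustive: they sum to 100%.
assert sum(scenarios.values()) == 100

# The "8%" raised in the reply below reads the 1% extinction outcome as carved out of
# the broader rogue-AI-takeover possibility, leaving 8 points of worlds where a
# rogue-AI singleton exists but humans are still around.
survive_under_rogue_singleton = (
    scenarios["singleton controlled by an unaligned rogue AI"]
    - scenarios["human extinction caused by an unaligned rogue AI"]
)
print(survive_under_rogue_singleton)  # 8
```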
This is interesting and something I haven’t seen expressed much within EA. What is happening in the 8% where humans are still around and the unaligned rogue AI singleton is acting on its own initiative? Does it just take decades to wipe all the humans out? Are there digital uploads of (some) humans for the purposes of information saving?[1] Does the AI hit a ceiling on intelligence/capability that leaves humans with some economic niches? Is the misalignment only partial, so that the AI somehow shares enough of humanity’s values to keep us around?
Does this mean that you think we get alignment by default? Or that alignment is on track to be solved on this timeline? Or that we somehow survive misaligned AI (as per the above discrepancy between your estimates for a singleton unaligned rogue AI and for human extinction)? As per my previous comment, I think the default outcome of AGI is doom with high likelihood (and I haven’t received any satisfactory answers to the question “If your AGI x-risk estimates are low, what scenarios make up the bulk of your expectations for an OK outcome?”).
This still seems like pretty much an existential catastrophe in my book, even if it isn’t technically extinction.
Thanks for elaborating, Peter! Do you mind sharing how you obtained those probabilities? Are they your subjective guesses?
I have trouble understanding what “AGI” specifically refers to, and I don’t think it’s the best way to think about risks from AI. As you may know, in addition to being co-CEO at Rethink Priorities, I take forecasting seriously as a hobby, and people actually for some reason pay me to forecast, making me a professional forecaster. So I think a lot in terms of concrete resolution criteria for forecasting questions, and my thinking on these questions has actually been meaningfully bottlenecked by not knowing what those concrete resolution criteria are.
That being said, being a good thinker also involves figuring out how to operate in somewhat undefined grey space, so I should be comfortable enough with compute trends, algorithmic progress, etc. to give some sort of answer. For the type of AI that I struggle to define but am worried about – the kind that has the capability of autonomously causing existential risk, the kind that AI researcher Caroline Jeanmaire refers to as the “minimal menace” – I am willing to tentatively put the following distribution on it:
5% probability before 2035
20% probability before 2041
50% probability before 2054
80% probability before 2400
(Though of course that’s my own distribution; I’m not saying that Caroline or others would agree.)
To be clear, that’s an unconditional distribution, so it includes the possibility that we never produce “minimal menace” AI because we go extinct from something else first, the possibility of AI development being severely delayed by war or other disasters, the possibility of policy delaying AI development, etc.
I’m still actively working on refining this view so it may well change soon. But this is my current best guess.
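To make the quantiles Peter gives above concrete, here is a minimal sketch that stores them as points on a cumulative distribution and linearly interpolates between them. The interpolation scheme (and the flat extrapolation outside the stated range) is my own simplification for illustration, not anything Peter specified:

```python
# Peter's stated quantiles for the arrival of "minimal menace" AI
# (year, cumulative probability of it having arrived by that year).
QUANTILES = [(2035, 0.05), (2041, 0.20), (2054, 0.50), (2400, 0.80)]

def p_minimal_menace_by(year: float) -> float:
    """Cumulative probability of "minimal menace" AI arriving by `year`,
    linearly interpolated between the stated quantiles."""
    if year <= QUANTILES[0][0]:
        return QUANTILES[0][1]  # flat before the first stated point (a simplification)
    for (y0, p0), (y1, p1) in zip(QUANTILES, QUANTILES[1:]):
        if year <= y1:
            return p0 + (p1 - p0) * (year - y0) / (y1 - y0)
    return QUANTILES[-1][1]  # capped at 80% beyond 2400 (a simplification)

print(round(p_minimal_menace_by(2045), 2))  # ~0.29
print(round(p_minimal_menace_by(2070), 2))  # ~0.51
```

Linear interpolation is the crudest possible choice here; a forecaster would more likely fit something with a long right tail, given how far out the 80% point (2400) sits relative to the other three.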
Thanks for your detailed answers, Peter. Caroline Jeanmaire’s “minimal menace” is a good definition of AGI for our purposes (but then so are Holden Karnofsky’s PASTA, OpenPhil’s Transformative AI, and Matthew Barnett’s TAI).
I’m curious about your 5% by 2035 figure. Has this changed much as a result of GPT-4? And what is happening in the remaining 95%? How much of that is extra “secret sauce” remaining undiscovered? A big reason for my updating so heavily toward AGI being near (and, correspondingly, doom being high, given the woeful state of the art in alignment) is the realisation that there may well be no additional secret sauce necessary, and that all that is needed is more compute and data (read: money) being thrown at it (and a 2-OOM increase in training FLOP over GPT-4 is possible within 6-12 months).
the possibility of policy delaying AI development

How likely do you consider this to be, conditional on business as usual? I think things are moving in the right direction, but we can’t afford to be complacent. Indeed, we should be pushing maximally for it to happen (to the point where, to me, almost anything else looks like “rearranging deckchairs on the Titanic”).
Whilst I may not be a professional forecaster, I am a successful investor, and I think I have a reasonable track record of being early to a number of significant global trends: veganism (2005), cryptocurrency (several big wins from investing early in BTC, ETH, DOT, SOL, and KAS; maybe a similar number of misses, but overall up ~1000x), Covid (late Jan 2020), AI x-risk (2009), and an AGI moratorium (2023, a few days before the FLI letter went public).
I’m definitely interested in seeing these ideas explored, but I want to be careful before getting super into it. My guess is that a global moratorium would not be politically feasible. But pushing for one could still be worthwhile even if it is unlikely to happen, as it could be a good galvanizing ask that brings more general attention to AI safety issues and makes other policy asks seem more reasonable by comparison. I’d like to see more thinking about this.
On the merits of the actual policy, I am unsure whether a moratorium is a good idea. My concern is that it may just produce a larger compute overhang, which could increase the likelihood of future discontinuous and hard-to-control AI progress.
Some people in our community have been convinced that an immediate and lengthy AI moratorium is a necessary condition for human survival, but I don’t currently share that assessment.
Good to see that you think the ideas should be explored. I think a global moratorium is becoming more feasible, given the UN Security Council meeting on AI, the UK Summit, the Statement on AI Risk, public campaigns, etc.
Re compute overhang, I don’t think this is a defeater. We need the moratorium to be indefinite, and only lifted when there is a global consensus on an alignment solution (and perhaps even a global referendum on pressing go on more powerful foundation models).
This makes sense given your timelines and p(doom) outlined above. But I urge you (and others reading) to reconsider the level of danger we are now in[1].
Or, ahem, to rethink your priorities (sorry).