I want to register that my perspective on medium-term[1] AI existential risk (shortened to AIXR from now on) has changed quite a lot this year. Currently, I'd describe it as moving from "Deep Uncertainty" to "risk is low in absolute terms, but high enough to be concerned about". At the moment I'd say my estimates are moving closer toward the Superforecasters in the recent XPT report (though I'm still Deeply Uncertain on this issue, to the extent that I don't think the probability calculus is that meaningful to apply).
Some points around this change:
I'm not sure it's meaningful to cleanly distinguish AIXR from other anthropogenic x-risks, especially since negative consequences of AI may plausibly increase other x-risks (e.g. nuclear war, biosecurity, climate change, etc.).
I think in practice the most likely risks from AI would come from the deployment of powerful systems that have catastrophic consequences and are then rolled back. I'm thinking of Bing "Sydney" here as the canonical empirical case.[2] I just don't believe we're going to get no warning shots.
Similarly, most negative projections of AI don't take into account negative social reaction and systematic human response to these events. These projections either assume we'll get no warning shots and then get exterminated ("sharp left turn") or that humanity is doomed not to co-operate ("Moloch"). I think the evidence suggests instead that societies and governments will react against increasing AI capability if they view it negatively, rather than simply stand by and watch it happen, which is what many AIXR arguments seem to assume, or at least imply to me.
I think the AIXR community underestimated the importance of AI governance and of engaging with politics. Instead of politicians and the public "not getting it", they in fact seem to have "got it". The strategy of only letting people vetted as smart enough think about AIXR seems to have been flawed; it's the same kind of thinking that led to the belief that a "pivotal act" strategy was viable.
In fact, I've been pleasantly surprised by how welcoming both politicians and the public have been to the Overton Window being opened. It turns out the median voter in a liberal democracy doesn't like "let large corporations create powerful models, with little understanding of their consequences, without significant oversight". I think people's expectations of positive co-ordination on AI issues should have gone up this year, and correspondingly your AIXR estimate should go down, unless you think relative alignment progress has declined even more.
While there have been an awful lot of terrible arguments against AIXR raised this year, some have been new and valuable to me. Some examples:
As titotal has written, I expect "early AGI" to be a "buggy mess". More specifically, I doubt it could one-shot "take over the world" unless you grant it god-like capability by assumption, rather than "more capable than humans in some important ways, but not invincible".
Progress via "stack moar layers lol" will plausibly slow down, or at least run into some severe issues.[3] The current hype-wave seems to just be following this as far as it will go with compute and data, rather than exploring alternative architectures that could be much more efficient.
I'm still unimpressed by how frontier systems perform on tests such as the ARC Challenge, which test creative hypothesis generation and testing in a few-shot setting (and on hidden data) without "ground truth", as opposed to training on trillions-upon-trillions of training examples with masks and true labels. (I've sketched a toy example of this kind of task below this list.)
Related to the above, I view creativity and explanation as critical to science and the progress of science, so I'm not sure what "automation of science" would actually look like. It makes me very sceptical of claims like Davidson's, with large numbers like "1000x capability" (edit: it's actually 1000x in a year! Is that a one-time increase or perpetual explosive growth? Either way it seems way too strong a claim).[4]
Progress in AI Safety has empirically been correlated, at least weakly, with increases in capability. I didn't agree with everything in Nora's recent post, but I think she's right in her assessment of a fully theoretical approach to progress (such as MIRI's strategy, as far as I can tell).
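To make the ARC point above concrete, here's a made-up toy task in the spirit of the ARC format (a handful of demonstration input/output grids plus a held-out test input). It's purely illustrative and not an actual ARC puzzle: the whole "training set" for the rule is two examples, so the solver has to generate and check a hypothesis rather than fit to millions of labelled instances.

```python
# Toy illustration of an ARC-style few-shot task (not a real ARC puzzle).
# The solver sees two demonstration pairs and must induce the transformation,
# then apply it to a test input for which it never sees a label.

task = {
    "train": [
        {"input": [[1, 0], [0, 0]], "output": [[0, 1], [0, 0]]},
        {"input": [[0, 0], [2, 0]], "output": [[0, 0], [0, 2]]},
    ],
    "test": {"input": [[3, 0], [0, 0]]},  # intended answer: [[0, 3], [0, 0]]
}

def candidate_rule(grid):
    """One hypothesis: mirror each row left-to-right."""
    return [list(reversed(row)) for row in grid]

# Check the hypothesis against the few demonstrations available...
assert all(candidate_rule(pair["input"]) == pair["output"] for pair in task["train"])

# ...and only then commit to an answer on the held-out test input.
print(candidate_rule(task["test"]["input"]))  # [[0, 3], [0, 0]]
```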
In summary, AIXR doesn't seem to come out particularly strongly to me on the I-T-N framework relative to other existential and catastrophic risks. It may be "more important", but it seems similarly intractable in the way these global problems generally are, and it's definitely not as neglected any more (in terms of funding, where does it stand relative to nuclear or bio risk, for example? It would be interesting to know).
However, I still think x-risk and catastrophic risk from AI this century are unjustifiably high. I just don't think it's plausible to hold a pDoom of ~>90% on current evidence unless you have private insight.
I think the main existential/catastrophic issues around AI in the foreseeable future revolve around political institutions and great-power conflict, rather than humans being wiped out agentically (either deceptively by a malicious power-seeker or unintentionally by an idiot-savant).
Anyway, these thoughts aren't fully worked out; I'm still exploring what I think on this issue, but I wanted to register where my current thinking is at in case it helps others in the community.
Say on a ~50-year timescale, or out to the end of the century.
Clarification: I'm not saying "Sydney" had catastrophic consequences, but that a future system could be released in a similar way due to internal corporate pressures, and that system could then act negatively in the real world in a way its developers did not expect.
Btw, xuan's Twitter is one of the best accounts I know of for insights into the state of AI. Sometimes I agree, sometimes I don't, but her takes are always legit.
What does it even mean to say 1,000x capability, especially in terms of science? Is there a number here that Davidson is tracking? When would he view it as falsified?
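For what it's worth, here's my own back-of-the-envelope arithmetic (not something taken from Davidson's report) on what "1,000x in a year" would imply if it described a smooth growth rate rather than a one-off jump:

```python
# Rough arithmetic on the "1,000x in a year" figure, assuming (perhaps wrongly)
# that it means smooth exponential growth over the year rather than a single jump.
import math

factor_per_year = 1000
monthly_factor = factor_per_year ** (1 / 12)                          # ~1.78x per month
doubling_time_days = 365 * math.log(2) / math.log(factor_per_year)    # ~37 days
two_year_factor = factor_per_year ** 2                                # 1,000,000x if sustained

print(f"{monthly_factor:.2f}x per month")
print(f"doubling every {doubling_time_days:.0f} days")
print(f"{two_year_factor:,}x after two years if the rate persisted")
```

Which is why the one-time vs. perpetual distinction matters: sustained, the rate compounds to a million-fold increase within two years, whereas a one-off jump is a much weaker (though still very strong) claim.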
This seems like a very sensible and down-to-earth analysis to me, and I'm a bit sad I can't seem to bookmark it.
Thanks :) I might do an actual post at the end of the year. In the meantime I just wanted to get my ideas out there, as I find it incredibly difficult to actually finish any of the many Forum drafts I have.
Do the post :)
I agree this feels like plenty for a post to me, but we all have different thresholds, I guess!
"AI may plausibly increase other x-risks (e.g. nuclear war, biosecurity, climate change, etc.)"
I'm extremely surprised to see climate change listed here. Could you explain?
Honestly, I just wrote a list of potential x-risks to give a rough reference class. It wasn't meant to be a specific claim, just examples for the quick take!
I guess climate change might be less of an existential risk in and of itself (per Halstead), but there might be interplays between risks that increase their combined risk (I think Ord talks about this in The Precipice). I'm also sympathetic to Luke Kemp's view that we should really just care about overall x-risk, regardless of cause area, as extinction by any means would be as bad for humanity's potential.[1]
I think it's plausible to consider x-risk from AI higher than that from climate change over the rest of this century, but my position at the moment is that this looks more like 5% vs 1%, or 1% vs 0.01%, than 90% vs 0.001%. As I said, though, I'm not sure trying to put precise probability estimates on this is that useful.
I definitely accept the general point that it'd be good to be more specific with this language in a front-page post, though.
Though not necessarily as bad for those alive at the time; some extinctions may well be a lot worse than others there.
My point is that even though AI is responsible for some amount of carbon emissions, I'm struggling to find a scenario where it's a major driver of global warming, as AI can also help provide solutions here.
(Oh, my point wasn't that climate change couldn't be an x-risk, though that has been disputed, but more that I don't see the pathway for AI to exacerbate climate change.)
I would take the proposal to be AI -> growth -> climate change, or other negative side effects of growth.
"It makes me very sceptical of claims like Davidson's, with large numbers like '1000x capability' (edit: it's actually 1000x in a year! Is that a one-time increase or perpetual explosive growth? Either way it seems way too strong a claim)."
I was wondering why he said that, since I've read his report before and that didn't come up at all. I suppose a few scattered recollections I have are:
Tom would probably suggest you play around with the takeoffspeeds playground to gain a better intuition (I couldn't find anything 1,000x-in-a-year-related at all, though).
Capabilities takeoff speed ≠ impact takeoff speed (Tom: "overall I expect impact takeoff speed to be slower than capabilities takeoff, with the important exception that AI's impact might mostly happen pretty suddenly after we have superhuman AI").