Here’s the problem: The more people think seriously about this question, the more pessimistic they are.
Citation needed on this point. I think you’re underrepresenting the selection bias, for a start: it’s extremely hard to know how many people have engaged with and rejected the doomer ideas, since they have far less incentive to promote their views. And those who do often find sloppy arguments and gross misuses of the data in some of the prominent doomer arguments. (I didn’t have to look too deeply to realise that the orthogonality thesis was a substantial source of groupthink.)
Even among AI safety workers, it’s far from clear to me that the relationship you assert exists. My impression of the AI safety space is that there are many orgs working on practical problems that they take very seriously without putting much credence in the human-extinction scenarios (FAR.AI, Epoch, UK AISI off the top of my head).
One guy also looked at the explicit views of AI experts and found, if anything, an anticorrelation between their academic success and their extinction-related concern. That was looking back over a few years, and obviously a lot can change in that time, but the arguments for AI extinction had already been around for well over a decade at the time of that survey.
The “expert forecasters” you cite don’t have nearly the time-on-task of thinking about the AGI alignment problem.
This is true for forecasting in every domain. There are virtually always domain experts who have spent their careers thinking about any given question, and yet superforecasters seem to systematically outperform them. If this weren’t true, superforecasting wouldn’t be a field—we’d just go straight to the domain experts for our predictions.
Just want to quickly flag that you seem to have far more faith in superforecasters’ long-range predictions than do most people who have worked full-time in forecasting, such as myself.
@MichaelDickens’ ‘Is It So Much to Ask?’ is the best public writeup I’ve seen on this (specifically, on the problems with Metaculus’ and FRI XPT’s x-risk/extinction forecasts, which are cited in the main post above). I also very much agree with:
Excellent forecasters and Superforecasters™ have an imperfect fit for long-term questions
Here are some reasons why we might expect longer-term predictions to be more difficult:
No fast feedback loops for long-term questions. You can’t get that many predict/check/improve cycles, because questions many years into the future, tautologically, take many years to resolve. There are shortcuts, like this past-casting app, but they are imperfect.
It’s possible that short-term forecasters might acquire habits and intuitions that are good for forecasting short-term events, but bad for forecasting longer-term outcomes. For example, “things will change more slowly than you think” is a good heuristic to acquire for short-term predictions, but might be a bad heuristic for longer-term predictions, in the same sense that “people overestimate what they can do in a week, but underestimate what they can do in ten years”. This might be particularly insidious to the extent that forecasters acquire intuitions which they can see are useful, but can’t tell where they come from. In general, it seems unclear to what extent short-term forecasting skills would generalize to skill at longer-term predictions.
“Predict no change” in particular might do well, until it doesn’t. Consider a world which has a 2% probability each year of seeing a worldwide pandemic, or some other large catastrophe. Then on average it will take 50 years for one to occur. But at that point, those predicting a 2% will have a poorer track record compared to those who are predicting a ~0%.
In general, we have been in a period of comparative technological stagnation, and forecasters might be adapted to that, in the same way that e.g., startups adapted to low interest rates.
Sub-sampling artifacts within good short-term forecasters are tricky. For example, my forecasting group Samotsvety is relatively bullish on transformative technological change from AI, whereas the Forecasting Research Institute’s pick of forecasters for their existential risk survey was more bearish.
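As a minimal sketch of the ‘predict no change’ point above, assuming purely hypothetical numbers (a 2%-per-year catastrophe, a 40-year evaluation window, 0.1% as the ‘~0%’ forecast, and Brier scoring), the arithmetic looks like this:

```python
# Illustrative only: a calibrated 2%-per-year forecaster looks worse than a
# ~0% forecaster on cumulative Brier score right up until the rare event occurs.

def brier(forecast: float, outcome: int) -> float:
    """Brier score for one binary question (lower is better)."""
    return (forecast - outcome) ** 2

P_TRUE = 0.02             # hypothetical true annual probability of the catastrophe
YEARS_WITHOUT_EVENT = 40  # suppose we evaluate after 40 uneventful years

# Expected waiting time for a 2%-per-year event: 1 / 0.02 = 50 years.
print("expected years until the event:", 1 / P_TRUE)

calibrated = sum(brier(0.02, 0) for _ in range(YEARS_WITHOUT_EVENT))
no_change = sum(brier(0.001, 0) for _ in range(YEARS_WITHOUT_EVENT))
print("cumulative Brier, 2% forecaster: ", calibrated)  # 0.016
print("cumulative Brier, ~0% forecaster:", no_change)   # ~0.00004

# In the year the event finally occurs, the single-question losses are:
print("loss that year, 2% forecaster: ", brier(0.02, 1))   # ~0.96
print("loss that year, ~0% forecaster:", brier(0.001, 1))  # ~1.00
# The 'predict ~0%' habit wins on track record for decades, and only loses
# slightly on the one occasion that matters most.
```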
How much weight should we give to these aggregates?
My personal tier list for how much weight I give to AI x-risk forecasts to the extent I defer:
1. Individual forecasts from people who seem to generally have great judgment, and have spent a ton of time thinking about AI x-risk forecasting, e.g. Cotra, Carlsmith
2. Samotsvety aggregates presented here
3. A superforecaster aggregate (I’m biased re: quality of Samotsvety vs. superforecasters, but I’m pretty confident based on personal experience)
4. Individual forecasts from AI domain experts who seem to generally have great judgment, but haven’t spent a ton of time thinking about AI x-risk forecasting (this is the one I’m most uncertain about, could see anywhere from 2-4)
Everything else I can think of I would give little weight to.[1][2]
Separately, I think you’re wrong about UK AISI not putting much credence on extinction scenarios? I’ve seen job adverts from AISI that talk about loss of control risk (i.e., AI takeover), and I know people working at AISI who, last I spoke to them, put ≫10% on extinction.
Why do I give little weight to Metaculus’s views on AI? Primarily because of the incentives to make very shallow forecasts on a ton of questions (e.g. probably <20% of Metaculus AI forecasters have done the equivalent work of reading the Carlsmith report), and secondarily because forecasts aren’t aggregated from a select group of high performers but instead from anyone who wants to make an account and predict on that question.
Why do I give little weight to AI expert surveys such as When Will AI Exceed Human Performance? Evidence from AI Experts? I think most AI experts have incoherent and poor views on this because they don’t think of it as their job to spend time thinking and forecasting about what will happen with very powerful AI, and many don’t have great judgment.
You might be right re forecasting (though someone willing in general to frequently bet on 2% scenarios manifesting should fairly quickly outperform someone who frequently bets against them—if their credences are actually more accurate).
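A rough simulation of the parenthetical above, under assumed numbers (500 independent questions that all resolve, each with a true probability of 2%, log scoring, and 0.2% as the sceptic’s forecast), illustrates why this works for ordinary forecasting portfolios but not for one-shot, decades-long questions:

```python
# Illustrative simulation: across many independent ~2% questions that actually
# resolve, a forecaster who says 2% beats one who says ~0% on a proper scoring
# rule fairly quickly. All numbers are assumptions for illustration.
import math
import random

N_QUESTIONS = 500   # hypothetical portfolio of independent low-probability questions
P_TRUE = 0.02       # assumed true probability of each event

def log_score(p: float, outcome: int) -> float:
    """Log score for one binary question (closer to zero is better)."""
    return math.log(p) if outcome else math.log(1 - p)

outcomes = [1 if random.random() < P_TRUE else 0 for _ in range(N_QUESTIONS)]

calibrated = sum(log_score(0.02, o) for o in outcomes)   # says 2% every time
dismissive = sum(log_score(0.002, o) for o in outcomes)  # says 0.2% every time

print("events that occurred:", sum(outcomes), "of", N_QUESTIONS)  # ~10 in expectation
print("total log score, 2% forecaster:  ", round(calibrated, 1))
print("total log score, 0.2% forecaster:", round(dismissive, 1))
# In the typical run the calibrated forecaster scores clearly better -- but only
# because enough questions resolved to show it, which is exactly what a single
# extinction-level question over decades never gives you.
```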
I think you’re wrong about UK AISI not putting much credence on extinction scenarios? I’ve seen job adverts from AISI talking about loss of control risk (i.e., AI takeover), and how ‘the risks from AI are not sci-fi, they are urgent.’ And I know people working at AISI who, last I spoke to them, put ≫10% on extinction.
The two job adverts you mention only refer to ‘loss of control’ as one concern among many: ‘risks with security implications, including the potential of AI to assist with the development of chemical and biological weapons, how it can be used to carry out cyber-attacks, enable crimes such as fraud, and the possibility of loss of control.’
I’m not claiming that these orgs don’t or shouldn’t take the lesser risks and extreme tail risks seriously (I think they should and do); I’m denying the claim that people who ‘think seriously’ about AI risks necessarily lean towards high extinction probabilities.
There are virtually always domain experts who have spent their careers thinking about any given question, and yet superforecasters seem to systematically outperform them.
I don’t have a nice clean citation. I don’t think one exists. I’ve looked at an awful lot of individual opinions and different surveys. I guess the biggest reason I’m convinced this correlation exists is that arguments for low p(doom) very rarely actually engage arguments for risk at their strong points (when they do the discussions are inconclusive in both directions—I’m not arguing that alignment is hard, but that it’s very much unknown how hard it is).
There appears to be a very high correlation between misunderstanding the state of play and optimism. And because it’s a very complex state of arguments, the vast majority of the world misunderstands it pretty severely.
I very much wish it were otherwise; I am an optimist who has become steadily more pessimistic as I’ve made alignment my full-time focus, because the arguments against optimism are subtle (and often poorly communicated) but strong.
The arguments for the difficulty of alignment are far too strong to be rationally dismissed down to the 1.4% or whatever it was that the superforecasters arrived at. They have very clearly missed some important points of argument.
The anticorrelation with academic success seems quite right and utterly irrelevant. As a career academic, I have been noticing for decades that academic success has some quite perverse incentives.
I agree that there are bad arguments for pessimism as well as optimism. The use of bad logic in some prominent arguments says nothing about the strength of other arguments. Arguments on both sides are far from conclusive. So you can hope arguments for the fundamental difficulty of aligning network-based AGI are wrong, but assigning a high probability they’re wrong without understanding them in detail and constructing valid counterarguments is tempting but not rational.
If there’s a counterargument you find convincing, please point me to it! Because while I’m arguing from the outside view, my real argument is that this is an issue that is unique in intellectual history, so it can really only be evaluated from the inside view. So that’s where most of my thoughts on the matter go.
All of which isn’t to say the doomers are right and we’re doomed if we don’t stop building network-based AGI. I’m saying we don’t know. I’m arguing that, on our current limited knowledge, assigning a high probability to humanity accomplishing alignment is not rationally justified.
I think that fact is reflected in the correlation of p(doom) with time-on-task on alignment specifically. If that’s wrong I’d be shocked, because it looks very strong to me, and I do work hard to correct for my own biases. But it’s possible I’m wrong about this correlation. If so, it will make my day, and perhaps my month or year!
It is ultimately a question that needs to be resolved at the object level; we just need to take guesses about how to assign resources based on outside views.
because it’s a very complex state of arguments, the vast majority of the world misunderstands it pretty severely… They have very clearly missed some important points of argument.
This seems like an argument from your own authority. I’ve read a number of doomer arguments and personally found them unconvincing, but I’m not asking anyone to take my word for it. Of course you can always say ‘you’ve read the wrong arguments’, but in general, if your argument amounts to ‘you need to read this tens-of-thousands-of-words argument’, there’s no reason for an observer to believe that you understand it better than other intelligent individuals who’ve read it and rejected it.
Therefore this:
If there’s a counterargument you find convincing, please point me to it!
… sounds like special pleading. You’re trying to simultaneously claim a) that the arguments for doom are so complicated that no-one’s anti-doom views have any weight unless they’ve absorbed a nebulous gestalt of pro-doomer literature, and b) that the purported absence of a single gestalt-rebutting counterpoint justifies a doomer position.
And to be clear, I don’t think the epistemic burden should be equalised; I think it should be the other way around. Arguments for extinction by AI are necessarily built on a foundation of a priori and partially empirical premises, such that the dissolution of any one of them collapses the whole argument. To give a few examples, such arguments require one to believe:
No causal influence of intelligence on goals, or an influence too weak to outweigh other factors
AI to develop malevolent goals despite the process of developing it inherently involving incremental steps towards making it closer to doing what its developers want
AI to develop malevolent goals despite every developer working on it wanting it not to kill them
Instrumental convergence
Continued exponential progress
Without the (higher) exponential energy demands we’re currently seeing
An ability to adapt to rapidly changing circumstances that’s entirely absent from modern deep learning algorithms
That the ceiling of AI will be sufficiently higher than that of humans to manipulate us on an individual or societal level without anyone noticing and sounding the alarm (or so good at manipulating us that even with the alarm sounding we do its bidding)
That it gets to this level fast enough/stealthily enough that no-one shuts it down
And to specifically believe that AI extinction is the most important thing to work on requires further assumptions, like
nothing else (including AI) is likely to do serious civilisational damage before AI wipes us out
or that civilisational collapse is sufficiently easy to recover from that it’s basically irrelevant to long term expectation
it will be morally bad if AI replaces us
that the thesis of the OP is false, and that survival work is at least as high-EV as flourishing work
that there’s anything we can actually do to prevent our destruction, assuming all the above propositions are true
Personally I weakly believe most of these propositions, but even if I weakly believed all of them, the conjunction of that many weakly held premises would still leave me with extremely low total concern for Yudkowskian scenarios.
Obviously there are weaker versions of the AI thesis like ‘AI could cause immense harm, perhaps by accident and perhaps by human intent, and so is an important problem to work on’ which it’s a lot more reasonable to believe.
But when you assert that ‘The more people think seriously about this question, the more pessimistic they are’, it sounds like you mean they become something like Yudkowskyesque doomers—and I think that’s basically false outside certain epistemic bubbles.
Inasmuch as it’s true that people who go into the field both tend to be the most pessimistic and don’t tend to exit the field in large numbers after becoming insufficiently pessimistic, I’ll bet you that, for any specific version of that claim you want to make, something extremely similar is true of biorisk, climate change, s-risks in general, and long-term animal welfare in particular. I’d bet at slightly longer odds that the same is true of national defence, global health, pronatalism, antinatalism, nuclear disarmament, conservation, gerontology and many other high-stakes areas.
I think that fact is reflected in the correlation of p(doom) with time-on-task on alignment specifically
I think most people who’ve put much thought into it agree that the highest probability of human extinction by the end of the century comes from misaligned AI. But that’s not sufficient to justify a strong p(doom) position, let alone a ‘most important cause’ position. I also think the ‘most important cause’ position rests on a largely unargued-for (and IMO clearly false) assumption that we’d lose virtually no long-term expected value from civilisational collapse.
I don’t think this has been established. See here