I think the critical crux here is the assumption about human competence, individually and working in groups. And I’m afraid I agree; humans have an optimism bias by many measures. Our track record on doing even easy projects right on the first try (or even the first few tries) is not good.
I also think optimists are often asking the question could we solve alignment, while pessimists are asking will we solve alignment, which includes a lot more practical difficulties so more opportunities for failure.
Of course there are many other relevant cruxes, but I think those two are pretty common, and the first is the biggest contribution of this particular post.
Seth Herd
I don’t have a nice clean citation. I don’t think one exists. I’ve looked at an awful lot of individual opinions and different surveys. I guess the biggest reason I’m convinced this correlation exists is that arguments for low p(doom) very rarely actually engage arguments for risk at their strong points (when they do the discussions are inconclusive in both directions—I’m not arguing that alignment is hard, but that it’s very much unknown how hard it is).
There appears to be a very high correlation between misunderstanding the state of play, and optimism. And because it’s a very complex state of arguments, the vast majority of the world misunderstands it pretty severely.
I very much wish it was otherwise; I am an optimist who has become steadily more pessimistic as I’ve made alignment my full-time focus—because the arguments against are subtle (and often poorly communicated) but strong.
The arguments for the difficulty of alignment are far too strong to be rationally dismissed down to the 1.4% or whatever it was that the superforecasters arrived at. They have very clearly missed some important points of argument.
The anticorrelation with academic success seems quite right and utterly irrelevant. As a career academic, I have been noticing for decades that academic success has some quite perverse incentives.
I agree that there are bad arguments for pessimism as well as optimism. The use of bad logic in some prominent arguments says nothing about the strength of other arguments. Arguments on both sides are far from conclusive. So you can hope arguments for the fundamental difficulty of aligning network-based AGI are wrong, but assigning a high probability to their being wrong without understanding them in detail and constructing valid counterarguments is tempting but not rational. If there’s a counterargument you find convincing, please point me to it! Because while I’m arguing from the outside view, my real argument is that this is an issue that is unique in intellectual history, so it can really only be evaluated from the inside view. So that’s where most of my thoughts on the matter go.
All of which isn’t to say the doomers are right and we’re doomed if we don’t stop building network-based AGI. I’m saying we don’t know. I’m arguing that assigning a high probability right now based on limited knowledge to humanity accomplishing alignment is not rationally justified.
I think that fact is reflected in the correlation of p(doom) with time-on-task only on alignment specifically. If that’s wrong I’d be shocked, because it looks very strong to me, and I do work hard to correct for my own biases. But it’s possible I’m wrong about this correlation. If so it will make my day and perhaps my month or year!
It is ultimately a question that needs to be resolved at the object level; we just need to take guesses about how to assign resources based on outside views.
I see! Thanks for the clarification. It’s a fascinating argument if I’m understanding it correctly now: it could be worth substantially increasing our risk of extinction if we more substantially increased our odds of capturing more of the potential value in our light cone.
I’m not a dedicated utilitarian, so I typically tend to value futures with some human flourishing and little suffering vastly higher than futures with no sentient beings. But I am actually convinced that we should tilt a little toward futures with more flourishing.
Aligning AGI seems like the crux for both survival and flourishing (and aligning society, in the likely case that “aligned” AGI is intent-aligned to take orders from individuals). But there will be small changes in strategy that emphasize flourishing vs mere survival futures, and I’ll lean toward those based on this discussion, because outside of myself and my loved ones, my preferences become largely utilitarian.
It should also be borne in mind that creating misaligned AGI runs a pretty big risk of wiping out not just us but any other sentient species in the lightcone.
Agreed on all counts, except that a strong value on rationality seems very likely to be an advantage in reaching more-correct beliefs on average. Feeling good about changing one’s mind instead of bad is going to lead to more belief changes, and those tend to lead toward truth.
Good points on the rationalist community being a bit insular. I don’t think about that much myself because I’ve never been involved with the bay area rationalist community, just LessWrong.
Copied from my comment on LW, because it may actually be more relevant over here where not everyone is convinced about alignment being hard. It’s a really sketchy presentation of what I think are strong arguments for why the consensus on this is wrong on this.
I really wish I could agree. I think we should definitely think about flourishing when it’s a win/win with survival efforts. But saying we’re near the ceiling on survival looks wildly too optimistic to me. This is after very deeply considering our position and the best estimate of our odds, primarily surrounding the challenge of aligning superhuman AGI (including the surrounding societal complications). There are very reasonable arguments to be made about the best estimate of alignment/AGI risk. But disaster likelihoods below 10% really just aren’t viable when you look in detail. And it seems like that’s what you need to argue that we’re near the ceiling on survival.
The core claim here is “we’re going to make a new species which is far smarter than we are, and that will definitely be fine because we’ll be really careful how we make it” in some combination with “oh we’re definitely not making a new species any time soon, just more helpful tools”.
When examined in detail, assigning a high confidence to those statements is just as silly as it looks at a glance. That is obviously a very dangerous thing and one we’ll do pretty much as soon as we’re able.
90% plus on survival looks like a rational view from a distance, but there are very strong arguments that it’s not. This won’t be a full presentation of those arguments; I haven’t written it up satisfactorily yet, so here’s the barest sketch.
Here’s the problem: The more people think seriously about this question, the more pessimistic they are.
(edit—we asymptote at different points but almost universally far above 10% p(doom))
And those who’ve spent more time on this particular question should be weighted far higher. Time-on-task is the single most important factor for success in every endeavor. It’s not a guarantee, but it dwarfs raw intelligence as a predictor of success in every domain (although the two are multiplicative).
The “expert forecasters” you cite don’t have nearly the time-on-task of thinking about the AGI alignment problem. Those who actually work in that area are very systematically more pessimistic the longer and more deeply we’ve thought about it. There’s not a perfect correlation, but it’s quite large. This should be very concerning from an outside view.
This effect clearly goes both ways, but that only starts to explain the effect. Those who intuitively find AGI very dangerous are prone to go into the field. And they’ll be subject to confirmation bias. But if they were wrong, a substantial subset should be shifting away from that view after they’re exposed to every argument for optimism. This effect would be exaggerated by the correlation between rationalist culture and alignment thinking; valuing rationality provides resistance (but certainly not immunity!) to motivated reasoning/confirmation bias by aligning one’s motivations with updating based on arguments and evidence.
I am an optimistic person, and I deeply want AGI to be safe. I would be overjoyed for a year if I somehow updated to only 10% chance of AGI disaster. It is only my correcting for my biases that keeps me looking hard enough at pessimistic arguments to believe them based on their compelling logic.
And everyone is affected by motivated reasoning, particularly the optimists. This is complex, but after doing my level best to correct for motivations, it looks to me like the bias effects have far more leeway to work when there’s less to push against. The more evidence and arguments are considered, the less bias takes hold. This is from the literature on motivated reasoning and confirmation bias, which was my primary research focus for a few years and a primary consideration for the last ten.
That would’ve been better as a post or a short form, and more polished. But there it is FWIW, a dashed-off version of an argument I’ve been mulling over for the past couple of years.
I’ll still help you aim for flourishing, since having an optimistic target is a good way to motivate people to think about the future.
Edit: I realize this isn’t an airtight argument and apologize for the tone of confidence in the absence of presenting the whole thing carefully and with proper references.
It seems like having genuinely safety-minded people within orgs is invaluable. Do you think that having them refuse to join is going to meaningfully slow things down?
It just takes one brave or terrified person in the know to say “these guys are internally deploying WHAT? I’ve got to stop this!”
I worry very much that we won’t have one such person in the know in OpenAI. I’m very glad we have them in Anthropic.
Having said that, I agree that Anthropic should not be shielded from criticism.
Your assumption that influence flows one way in organizations seems based on fear, not psychology. If someone believes AGI is a real risk, they should be motivated enough to resist some pressure from superiors who merely argue that they’re doing good stuff. If you won’t actively resist changing your beliefs once you join a culture with importantly different beliefs, then don’t join an org.
While Anthropic’s plan is a terrible one, so is PauseAI’s. We have no good plans. And we mustn’t fight amongst ourselves.
This seems almost exactly like the repugnant conclusion. Taken to extremes, intuition disagrees with logic. When that happens, it’s usually the worse for intuition.
I’m not a utilitarian, but I find the repugnant conclusion impossible to reject if you are.
If you want to choose what is good for everyone, there’s little argument about what that is in those cases.
And if we’re talking about what’s good for everyone, that’s got to be a linear sum of what’s good for each someone. If the sum is nonlinear, who exactly is worth less than the others? This leads to the repugnant conclusion and your conclusion here.
Other definitions of “good for everyone” seem to always mean “what I idiosyncratically prefer for everyone else but me”.
We do not have adequate help with AGI x-risk, and the societal issues demand many skillsets that alignment workers typically lack. Surviving AGI and avoiding s-risk far outweigh all other concerns by any reasonable utilitarian logic.
You were getting disagree votes because it sounded like you were claiming certainty. I realize that you weren’t trying to do that, but that’s how people were taking it, and I find that quite understandable. Chicken as an analogy has certain death if neither player swerves, in the standard formulation. Qualifying your statement even a little would’ve gotten your point across better.
FWIW I agree with your statement as I interpret it. I do tend to think that an objective measure of misalignment risk (I place it around 50% largely based on model uncertainty on all sides) makes the question of which side is safer basically irrelevant.
Which highlights the problem with this type of miscommunication. You were probably making by far the most important point here. It didn’t play a prominent role because it wasn’t communicated in a way the audience would understand.
That’s very helpful! They don’t try to include the cost of fear; the old story I’d heard about cage-free environments was that there’s more fighting and cannibalism. But given the other large benefits, I’m convinced that cage-free is better.
Wait now—I thought cage-free chickens suffered as much or more than caged? I heard the claim a long time ago but never looked into it closely.
Copied from my LW comment, since this is probably more of an EAF discussion:
This is really important pushback. This is the discussion we need to be having. Most people who are trying to track this believe China has not been racing toward AGI up to this point. Whether they embark on that race is probably being determined now—and based in no small part on the US’s perceived attitude and intentions.
Any calls for racing toward AGI should be closely accompanied with “and of course we’d use it to benefit the entire world, sharing the rapidly growing pie”. If our intentions are hostile, foreign powers have little choice but to race us.
And we should not be so confident we will remain ahead if we do race. There are many routes to progress other than sheer scale of pretraining. The release of DeepSeek r1 today indicates that China is not so far behind. Let’s remember that while the US “won” the race for nukes, our primary rival had nukes very soon after—by stealing our advancements. A standoff between AGI-armed US and China could be disastrous—or navigated successfully if we take the right tone and prevent further proliferation (I shudder to think of Putin controlling an AGI, or many potentially unstable actors).
This discussion is important, so it needs to be better. This pushback is itself badly flawed. In calling out the report’s lack of references, it provides almost none itself. Citing a 2017 official statement from China seems utterly irrelevant to guessing their current, privately held position. Almost everyone has updated massively since 2017. (edit: It’s good that this piece does note that public statements are basically meaningless in such matters.) If China is “racing toward AGI” as an internal policy, they probably would’ve adopted that recently. (I doubt that they are racing yet, but it seems entirely possible they’ll start now in response to the US push to do so—and their perspective on the US as a dangerous aggressor on the world stage. But what do I know—we need real experts on China and international relations.)
Pointing out the technical errors in the report seems somewhere between irrelevant and harmful. You can understand very little of the details and still understand that AGI would be a big, big deal if true—and that the many experts predicting short timelines could be right. Nitpicking the technical expertise of people whose assessment is probably essentially correct just sets a bad tone of fighting/arguing instead of having a sensible discussion.
And we desperately need a sensible discussion on this topic.
I completely agree.
But others may not, because most humans aren’t longtermists nor utilitarians. So I’m afraid arguments like this won’t sway the public opinion much at all. People like progress because it will get them and their loved ones (children and grandchildren, whose future they can imagine) better lives. They just barely care at all whether humanity ends after their grandchildren’s lives (to the extent they can even think about it).
This is why I believe that most arguments against AGI x-risk are really based on differing timelines. People like to think that humans are so special that AI won’t surpass us for a long time. And they mostly care about the future for their loved ones.
I think the point is making this explicit and having a solid exposition to point to when saying “progress is no good if we all die sooner!”
I don’t think it’s worth the effort; I’d personally be just as pleased with one snapshot of the participants in conversation as I would be with a whole video. The point of podcasts for me is that I can do something else while still taking in something useful for my alignment work. But I am definitely a tone-of-voice attender over a facial-expression attender, so others will doubtless get more value out of it.
Oops, I meant to say I wrote a post on one aspect of this interview on LW: Fear of centralized power vs. fear of misaligned AGI: Vitalik Buterin on 80,000 Hours. It did produce some interesting discussion.
Yes, but pursuing excellence also costs time that could be spent elsewhere, and time/results tradeoffs are often highly nonlinear.
The perfect is the enemy of the good. It seems to me that the most common LW/EA personality already pursues excellence more than is optimal.
For more, see my LW comment:
Excellent work.
To summarize one central argument in briefest form:
Aschenbrenner’s conclusion in Situational Awareness is wrong in overstating the claim.
He claims that treating AGI as a national security issue is the obvious and inevitable conclusion for those who understand the enormous potential of AGI development in the next few years. But Aschenbrenner doesn’t adequately consider the possibility of treating AGI primarily as a threat to humanity instead of a threat to the nation or to a political ideal (the free world). If we considered it primarily a threat to humanity, we might be able to cooperate with China and other actors to safeguard humanity.
I think this argument is straightforwardly true. Aschenbrenner does not adequately consider alternative strategies, and thus his claim of the conclusion being the inevitable consensus is false.
But the opposite isn’t an inevitable conclusion, either.
I currently think Aschenbrenner is more likely correct about the best course of action. But I am highly uncertain. I have thought hard about this issue for many hours both before and after Aschenbrenner’s piece sparked some public discussion. But my analysis, and the public debate thus far, are very far from conclusive on this complex issue.
This question deserves much more thought. It has a strong claim to being the second most pressing issue in the world at this moment, just behind technical AGI alignment.
This post can be summarized as “Aschenbrenner’s narrative is highly questionable”. Of course it is. From my perspective, having thought deeply about each of the issues he’s addressing, his claims are also highly plausible. To “just discard” this argument because it’s “questionable” would be very foolish. It would be like driving with your eyes closed once the traffic gets confusing.
This is the harshest response I’ve ever written. To the author, I apologize. To the EA community: we will not help the world if we fall back on vibes-based thinking and calling things we don’t like “questionable” to dismiss them. We must engage at the object level. While the future is hard to predict, it is quite possible that it will be very unlike the past, but in understandable ways. We will have plenty of problems with the rest of the world doing its standard vibes-based thinking and policy-making. The EA community needs to do better.
There is much to question and debate in Aschenbrenner’s post, but it must be engaged with at the object level. I will do that, elsewhere.
On the vibes/ad-hominem level, note that Aschenbrenner also recently wrote that Nobody’s on the ball on AGI alignment. He appears to believe (there and elsewhere) that AGI is a deadly risk, and we might very well all die from it. He might be out to make a quick billion, but he’s also serious about the risks involved.
The author’s object-level claim is that they don’t think AGI is imminent. Why? How sure are you? How about we take some action or at least think about the possibility, just in case you might be wrong and the many people close to its development might be right?
Fantastic.