Yudkowsky’s message is “If anyone builds superintelligence, everyone dies.” Zvi’s version is “If anyone builds superintelligence under anything like current conditions, everyone probably dies.”
Yudkowsky contrasts those framings with common “EA framings” like “It seems hard to predict whether superintelligence will kill everyone or not, but there’s a worryingly high chance it will, and Earth isn’t prepared,” and seems to think the latter framing is substantially driven by concerns about what can be said “in polite company.”
Obviously I can’t speak for all of EA, or all of Open Phil, and this post is my personal view rather than an institutional one since no single institutional view exists, but for the record, my inside view since 2010 has been “If anyone builds superintelligence under anything close to current conditions, probably everyone dies (or is severely disempowered),” and I think the difference between me and Yudkowsky has less to do with social effects on our speech and more to do with differing epistemic practices, i.e. about how confident one can reasonably be about the effects of poorly understood future technologies emerging in future, poorly understood circumstances. (My all-things-considered view, which includes various reference classes and partial deference to many others who think about the topic, is more agnostic and hasn’t consistently been above the “probably” line.)
Moreover, I think those who believe some version of “If anyone builds superintelligence, everyone dies” should be encouraged to make their arguments loudly and repeatedly; the greatest barrier to actually-risk-mitigating action right now is the lack of political will.
That said, I think people should keep in mind that:
Public argumentation can only get us so far when the evidence for the risks and their mitigations is this unclear, when AI has automated so little of the economy, when AI failures have led to so few deaths, etc.
Most concrete progress on worst-case AI risks — e.g. arguably the AISIs network, the draft GPAI code of practice for the EU AI Act, company RSPs, the chip and SME export controls, or some lines of technical safety work — comes from dozens of people toiling away mostly behind-the-scenes for years, not from splashy public communications (though many of the people involved were influenced by AI risk writings years before). Public argumentation is a small portion of the needed work to make concrete progress. It may be necessary, but it’s far from sufficient.
My best guess (though very much not a confident guess) is that the aggregate of these efforts is net-negative, and I think that is correlated with the work having happened in backrooms, often in contexts where people were unable to talk about their honest motivations. It sure is hard to tell, but I really want people to consider the hypothesis that a bunch of these behind-the-scenes policy efforts have been backfiring, especially ex post under a more Republican administration.
The chip and SME export controls currently seem to be one of the drivers of the escalating U.S.–China arms race; the RSPs are, I think, largely ineffectual and have slowed the arrival of regulation that is not reliant on lab supervision; and the overall EU AI Act seems very bad, though I think the effect of the marginal help with drafting is of course much harder to estimate.
Missing from this list: the executive order, which I think has in retrospect revealed itself to be a major driver of the polarization of AI-risk concerns, by strongly conflating near-term risks with extinction risks. It also did a lot of great stuff, though my best guess is that we’ll overall regret it (but on this I feel the least confident).
I agree that a ton of concrete political implementation work needs to be done, but I think the people working in the space who have chosen to do that work in a way that doesn’t actually engage in public discourse have made mistakes, and this has had large negative externalities.
See also: https://www.commerce.senate.gov/services/files/55267EFF-11A8-4BD6-BE1E-61452A3C48E3
Again, really not confident here, and I agree that there is a lot of implementation work to be done that is not glorious and flashy, but I think the way a bunch of it has been done, in a kind of conspiratorial and secretive fashion, has been counterproductive[1].
Ultimately, as you say, the bottleneck for things happening is political will and buy-in that AI systems pose a serious existential risk, and I think that means a lot of implementation and backroom work is blocked and bottlenecked on that public argumentation happening. And when people try to push forward anyway, they often end up forced to conflate existential risk with highly politicized short-term issues that aren’t very correlated with the actual risks, which backfires when the political winds change and people update.
See also: https://www.lesswrong.com/posts/vFqa8DZCuhyrbSnyx/integrity-in-ai-governance-and-advocacy
“It seems hard to predict whether superintelligence will kill everyone or not, but there’s a worryingly high chance it will, and Earth isn’t prepared,” and seems to think the latter framing is substantially driven by concerns about what can be said “in polite company.”
Funnily enough, I think this is true in the opposite direction. There is massive social pressure in EA spaces to take AI x-risk and the doomer arguments seriously. I don’t think it’s uncommon for someone who secretly suspects it’s all a load of nonsense to diplomatically say a statement like the above, in “polite EA company”.
Like you: I urge people who think AI x-risk is overblown to make their arguments loudly and repeatedly.
It’s easy for both to be true at the same time, right? That is, skeptics tone it down within EA, and believers tone it down when dealing with people *outside* EA.
Also, at the risk of saying the obvious, people occupying the ends of a position (within a specific context) will frequently feel that their perspectives are unfairly maligned or censored.
If the consensus position is that minimum wage should be $15/hour, both people who believe that it should be $0 and people who believe it should be $40/hour may feel social pressure to moderate their views; it takes active effort to reduce pressures in that direction.
As someone who leans on the x-risk-skeptical side, especially regarding AI, I’ll offer my anecdote that I don’t think my views have been unfairly maligned or censored much.
I do think my arguments have largely been ignored, which is unfortunate. But I don’t personally feel the “massive social pressure” that titotal alluded to above, at least in a strong sense.
I think your “vibe” is skeptical and most of your writings are ones expressing skepticism, but I think your object-level x-risk probabilities are fairly close to the median? People like titotal and @Vasco Grilo🔸 have their probabilities closer to lifelong risk of death from a lightning strike than from heart disease.
Good point, but I still think that many of my beliefs and values differ pretty dramatically from the dominant perspectives often found in EA AI x-risk circles. I think these differences in my underlying worldview should carry just as much weight—if not more—than whether my bottom-line estimates of x-risk align with the median estimates in the community. To elaborate:
On the values side:
Willingness to accept certain tradeoffs that are ~taboo in EA: I am comfortable with many scenarios where AI risk increases by a non-negligible amount if this accelerates AI progress. In other words, I think the potential benefits of faster progress in AI development can often outweigh the costs of an increase in existential risk.
Relative indifference to human disempowerment: With some caveats, I am largely comfortable with human disempowerment, and I don’t think the goal of AI governance should be to keep humans in control. To me, the preference for prioritizing human empowerment over other outcomes feels like an arbitrary form of speciesism—favoring humans simply because we are human, rather than due to any solid moral reasoning.
On the epistemic side:
Skepticism of AI alignment’s central importance to AI x-risk: I am skeptical that AI alignment is very important for reducing x-risk from AI. My primary threat model for AI risk doesn’t center on the idea that an AI with a misaligned utility function would necessarily pose a danger. Instead, I think the key issue lies in whether agents with differing values—be they human or artificial—will have incentives to cooperate and compromise peacefully or whether their environment will push them toward conflict and violence.
Doubts about the treacherous turn threat model: I believe the “treacherous turn” threat model is significantly overrated. (For context, this model posits that an AI system could pretend to be aligned with human values until it becomes sufficiently capable to act against us without risk.) I’ll note that both Paul Christiano and Eliezer Yudkowsky have identified this as their main threat model, but it is not my primary threat model.
people like titotal and @Vasco Grilo🔸 have their probabilities closer to lifelong risk of death from a lightning strike than from heart disease.
Right. Thanks for clarifying, Linch. I guess the probability of human extinction over the next 10 years is 10^-6, which is roughly my probability of death from a lightning strike during the same period. “the odds of being struck by lightning in a given year are less than one in a million [I guess the odds are not much lower than this], and almost 90% of all lightning strike victims survive” (10^-6 = 10^-6*10*(1 − 0.9)).
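To spell out that arithmetic, a minimal sketch using only the figures quoted above:

```python
# Minimal sketch of the lightning comparison above, using the quoted figures:
# roughly 1-in-a-million odds of being struck per year, and ~90% of victims survive.
p_struck_per_year = 1e-6       # quoted annual odds of being struck
years = 10                     # horizon used in the comment
p_die_if_struck = 1 - 0.9      # quoted ~90% survival rate

p_death_by_lightning = p_struck_per_year * years * p_die_if_struck
print(f"{p_death_by_lightning:.1e}")  # 1.0e-06, matching the 10^-6 extinction guess
```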
Like you: I urge people who think AI x-risk is overblown to make their arguments loudly and repeatedly.
Agreed, titotal. In addition, I encourage people to propose public bets to whoever has extreme views (if you are confident they will pay you back), and ask them if they are trying to get loans in order to increase their donations to projects decreasing AI risk, which makes sense if they do not expect to pay the interest in full due to high risk of extinction.
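To make the arithmetic behind this suggestion concrete, here is a minimal sketch with made-up numbers (a hypothetical $10k loan at 8% over three years); it only illustrates why a high extinction probability lowers the expected repayment, not whether such borrowing is advisable.

```python
# Hypothetical illustration of the loan reasoning above: if extinction happens before
# a loan matures, nothing further is repaid, so the expected repayment falls as the
# borrower's p_doom rises. All figures are made up purely for illustration.
principal = 10_000        # hypothetical amount borrowed and donated now
annual_rate = 0.08        # hypothetical interest rate
term_years = 3            # hypothetical term, repaid in full at maturity

repayment_if_no_doom = principal * (1 + annual_rate) ** term_years

for p_doom in (0.0, 0.2, 0.5, 0.9):
    expected_repayment = (1 - p_doom) * repayment_if_no_doom
    print(f"p_doom={p_doom:.1f}: expected repayment = {expected_repayment:,.0f}")
```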
and ask them if they are trying to get loans in order to increase their donations to projects decreasing AI risk, which makes sense if they do not expect to pay the interest in full due to high risk of extinction.
Fraud is bad.
In any case, people already don’t have enough worthwhile targets for donating money to, even under short timelines, so it’s not clear what good taking out loans would do. If it’s a question of putting one’s money where one’s mouth is, I personally took a six-figure pay cut in 2022 to work on reducing AI x-risk, and also increased my consumption/spending.
That’s not fraud, without more—Vasco didn’t suggest that anyone obtain loans that they did not intend to repay, or could not repay, in a no-doom world.
Every contract has an implied term that future obligations are void in the event of human extinction. There’s no shame in not paying one’s debts because extinction happened.
You cannot spend the money you obtain from a loan without losing the means to pay it back. You can borrow a tiny bit against your future labor income, but since the normal move when one cannot repay is to declare personal bankruptcy, lenders have little assurance of repayment and won’t extend much on that basis.
(This has been discussed many dozens of times on both the EA Forum and LessWrong. There exist no loan structures as far as I know that allow you to substantially benefit from predicting doom.)
Hello Habryka. Could you link to a good overview of why taking loans does not make sense even if one thinks there is a high risk of human extinction soon? Daniel Kokotajlo said:
Idk about others. I haven’t investigated serious ways to do this [taking loans],* but I’ve taken the low-hanging fruit—it’s why my family hasn’t paid off our student loan debt for example, and it’s why I went for financing on my car (with as long a payoff time as possible) instead of just buying it with cash.
*Basically I’d need to push through my ugh field and go do research on how to make this happen. If someone offered me a $10k low-interest loan on a silver platter I’d take it.
I should also clarify that I am open to bets about less extreme events. For example, global unemployment rate doubling or population dropping below 7 billion in the next few years.
I do actually have trouble finding a good place to link to. I’ll try to dig one up in the next few days.
Thanks for clarifying, Jason.
I think people like me proposing public bets to whoever has extreme views or asking them whether they have considered loans should be transparent about their views. In contrast, fraud is “the crime of obtaining money or property by deceiving people”.
I read Vasco as suggesting exactly that—what is your understanding of what he meant, if not that?
Hi Rebecca,
I did not have anything in particular in mind about what the people asking for loans would do without human extinction soon. In general, I think it makes sense for people to pay their loans. However, since I strongly endorse expected total hedonistic utilitarianism, I do not put an astronomical weight on respecting contracts. So I believe not paying a loan is fine if the benefits are sufficiently large.
I think the difference between me and Yudkowsky has less to do with social effects on our speech and more to do with differing epistemic practices, i.e. about how confident one can reasonably be about the effects of poorly understood future technologies emerging in future, poorly understood circumstances.
This isn’t expressing disagreement, but I think it’s also important to consider the social effects of our speaking in line with different epistemic practices, e.g.:
When someone says “AI will kill us all,” do people understand us as expressing 100% confidence in extinction, or do they interpret it as mere hyperbole and rhetoric, and infer that what we actually mean is that AI will potentially kill us all or have other drastic effects?
When someone says “There’s a high risk AI kills us all or disempowers us,” do people understand this as us expressing very high confidence that it kills us all, or as saying it almost certainly won’t kill us all?