The Unilateralist’s Curse, An Explanation
Motivation: This post seeks to explain the Unilateralist’s Curse more thoroughly than is done on the tag and more simply than is done in the original paper. This post might be helpful to anybody trying to do good, and the examples used apply most to those doing AI Safety or community building work.
Epistemic Effort: ~10 hours of reading + some hours writing/revising
In short: The Unilateralist’s Curse arises when multiple people could each take an action with far-reaching consequences, they could do so without the permission of others, and they are not all in agreement about whether taking the action is good. In this situation, the action is more likely to be taken than would be ideal. Understanding how to identify and respond to Unilateralist’s Curse situations is important for those trying to do good effectively because it will help them avoid causing accidental harm.
The Unilateralist’s Curse occurs when
There are multiple actors (people, organizations, countries, etc.) who could take an action unilaterally, or without the agreement of others, and
The action has large consequences, or consequences that fall on everybody, and
The actors do not know for certain whether the consequences of the action will be positive or negative (because of uncertainty about the probability of various outcomes, how bad those outcomes would be, different moral judgements about goodness and badness, different risk tolerances, etc.), and
This problem still exists even if all the actors are well-intentioned or trying to do as much good as possible.
The curse: given the above situation, the action is more likely to be taken than is ideal, and the chances of it being taken increase as the number of actors capable of taking the action increases.
Examples:
Say a group of 10 biologists has just sequenced the genome for smallpox and are deciding whether or not to publish the genome. On the one hand, publishing will allow for better treatments and vaccines. On the other hand, the genome would make it easier for a terrorist to create weaponized smallpox. If 9 of the biologists think it’s a bad idea to publish but one thinks publishing is best, that one person might unilaterally publish the genome. All it takes is one person to mess things up* by taking the action. *assuming the true consequences are net negative
Almost anybody could light themselves on fire in front of the White House with a sign warning about AI risk. This could very possibly lead to net harm by making AI Safety seem extreme and fanatical, but there’s also a slim chance it could spur useful action, which is why somebody might do it. This is a unilateral action because you don’t need anybody’s permission to do it, and the consequences would be far-reaching.
Dealing with the Unilateralist’s Curse, practical advice:
Learn more about what’s going on. If you find yourself holding the minority opinion in a unilateralist scenario as described above, you should seek to find out why the people who aren’t doing the action aren’t doing it. Do they disagree about the empirical likelihood of various outcomes? Do they disagree about how good or bad various outcomes would be? Maybe they haven’t thought about it very much – if so, why not? Maybe they think it’s a good idea but that their comparative advantage lies elsewhere? Probably they have some information that you don’t have which is altering their decision – knowing this information would be useful! Maybe there are information hazards associated with the action and they can’t tell you about said hazards – if so, you should probably just trust their judgment (and read below).
Take actions with the agreement of others. The heuristic of “comply in action, defy in thought,” or “think independently and act collectively” applies here. In unilateral situations, you can often coordinate with others; you should probably do this. There are some cases where this does not apply, but if you are going to do a thing unilaterally, you should definitely know why others haven’t done it yet.
Relatedly, use group decision-making processes. For example, make an agreement ahead of time that you and the other actors in the unilateralist situation will deliberate on the action, take a vote, and all abide by the majority’s decision to act or not to act. You could also create a single decision maker responsible for centrally planning everything – for instance, consolidating all the grant-making to one person – but this has significant tradeoffs.
Simply defer to what the group is doing. If nobody has published the genome for SuperVirus35, don’t publish the genome for SuperVirus35. Again, it’s better if you can understand why others haven’t taken the action, but it’s often the safer option to defer to others.
Recognize that you’re in a Unilateralist’s Curse scenario and respond accordingly. This means reflecting on the fact that your views differ from others’ views and are subject to error. You should accordingly decrease the probability with which you take the action.
When you can’t talk about it: Sometimes you can’t reveal that you are in this position or can’t get information about others who are (maybe you have the blueprint for an AGI and you’re considering deploying it; you probably wouldn’t want to tell many people, because it could lead to people trying to steal it, or accelerate timelines as everybody else realizes AGI is easily within reach, along with other adversarial dynamics). In that case, ask yourself how likely it is that others are in a similar position. If others have the blueprint for AGI and haven’t deployed it, there’s probably a good reason. Generally, the more people in the unilateralist position, the higher the chance that somebody does the action. Therefore, if lots of people are plausibly in the position but nobody has done it, you should update away from doing the action yourself, because others have deemed it not worth doing (a toy version of this update is sketched below).
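Here is that toy update, with purely illustrative numbers (the 50% prior and the 40%/5% action probabilities are my own assumptions, not anything from the original paper): suppose your prior that the action is good is 50%, and each of the N other capable actors would independently take the action with probability 40% if it were good and 5% if it were bad. Observing that none of them has acted gives

$$P(\text{good} \mid \text{nobody acted}) = \frac{0.5 \cdot (1-0.4)^N}{0.5 \cdot (1-0.4)^N + 0.5 \cdot (1-0.05)^N},$$

which is roughly 39% for N = 1, 20% for N = 3, and 9% for N = 5 – the more capable actors who have held off, the stronger the evidence that holding off is correct.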
A little bit of nuance:
If the actual value of a proposed action is positive, following the cautious principles previously explained wouldn’t be good. In fact, historically deferring to the consensus view would yield many bad outcomes. Galileo’s theories were quite unpopular at the time, but that didn’t make them wrong. If people always deferred to the consensus, startups would never be founded and innovation would likely be very slow. There’s also tons of ambiguous cases (as in people currently disagree about whether unilateral action was right) – should Daniel Ellsberg have leaked the Pentagon Papers or should Edward Snowden have leaked classified NSA data unilaterally?
The Unilateralist’s Curse encourages us not to act when we might do lots of harm. However, there is also the risk of spoiling an initiative. Maybe there’s some awesome plan, but multiple actors have veto power; this is also a Unilateralist’s Curse scenario, just mirrored – any one actor can unilaterally block the plan. For instance, a funding proposal might need 100% agreement among the fund-managers (I don’t think this is how it actually works in many places); if 9 fund-managers think it’s a good idea while one disagrees, the project doesn’t get funded. Again, in this scenario it would probably be good if the fund-managers communicated about why they disagree and adopted some sort of “majority rules” policy once they had all shared why they think what they do.
In many contexts there is already a group-level bias against taking unilateral actions. For instance there are social pressures encouraging people to do things widely recognized as acceptable. Despite these pressures pushing for conformity, it is still useful to consider whether you are in a Unilateralist’s Curse scenario and what you should do about it.
Slightly technical: why this happens. This phenomenon occurs because the actors do not have perfect knowledge of the true value of the action; instead, each has some error in their estimate of that value. Because the actors are not coordinating, we should assume their errors are not identical, meaning they disagree in their estimates of the action’s value. You can picture each actor’s estimate as a draw from a normal distribution centered on the true value. The more actors there are (the more estimates you draw from that distribution), the more likely you are to get an outlier who thinks the value is positive when it is actually negative.
The probability of the action being taken is higher if the true value of the action is higher. For actions with really negative true value, it is unlikely that anybody will think the action is worth doing. However, if the true value of the action is only a little net negative, it is relatively more likely that somebody expects the value to be positive and thus takes the unilateral action.
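A minimal simulation of this model (purely a sketch: it assumes each actor’s estimate is the true value plus independent normal noise, and that an actor acts whenever their own estimate is positive; all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def p_action_taken(true_value, n_actors, noise_sd=1.0, n_trials=100_000):
    """Probability that at least one of n_actors takes the action,
    where each actor acts iff their noisy estimate of its value is > 0."""
    # Each row is one trial; each column is one actor's independent estimate.
    estimates = true_value + rng.normal(0.0, noise_sd, size=(n_trials, n_actors))
    return (estimates > 0).any(axis=1).mean()

# More actors -> a slightly net-negative action is more likely to be taken.
for n in [1, 2, 5, 10, 50]:
    print(f"actors={n:2d}  P(taken) ~ {p_action_taken(-0.5, n):.3f}")

# The more clearly negative the true value, the less likely anyone acts.
for v in [-3.0, -1.0, -0.5, -0.1]:
    print(f"true value={v:+.1f}  P(taken) ~ {p_action_taken(v, 10):.3f}")
```

With these illustrative numbers, a single actor takes the mildly harmful action about 31% of the time, while with 50 actors it is nearly certain that somebody does.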
More examples from AI safety:
Purely hypothetical: OpenAI, DeepMind, Anthropic, and MIRI might all, at some point, be capable of deploying an AGI. They disagree about how likely it is for the AGI to kill everybody. The MIRI folks are pretty worried – they place the probability of existential catastrophe, given that they deploy their AGI, at 90%, P(Doom | Deploy) ~ 90% – and decide not to deploy. Anthropic isn’t really sure, but there are enough safety researchers who are worried about existential risk, P(Doom | Deploy) ~ 10%, so they don’t deploy. OpenAI is a bit less worried, placing the chance of existential catastrophe at P(Doom | Deploy) ~ 1%, but they decide this risk is too high. DeepMind, meanwhile, doesn’t have many existential-risk-focused people because they all left after a scandal in 2023, and the organization places P(Doom | Deploy) ~ 0.1%; DeepMind decides to deploy their AGI because this feels sufficiently low. The world ends. DeepMind wasn’t trying to kill everybody. They were well-intentioned, but they misjudged the actual risk (or we got really unlucky with a 1-in-1,000 dice roll). With different empirical beliefs, they come to a different conclusion about what to do.
Alternatively, say all four of these groups agree that the chance of AGI killing us all is 0.1%, and they also think that in the 99.9% of worlds where we survive, life gets way better. Longtermist folks at MIRI, Anthropic, and OpenAI manage to convince others in their organizations that extinction is really, really bad and we shouldn’t accept a 0.1% chance of it. Maybe DeepMind doesn’t have anybody advocating longtermist arguments, or holds person-affecting views under which everybody dying is roughly 7 billion times as bad as one person dying. They estimate that in the 99.9% of worlds where we don’t all die, human lives get sufficiently better that the calculation comes out in favor of deploying. With different moral beliefs about who matters, they come to a different conclusion about what to do – even if they agree about the empirical chances of various outcomes.
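To make that calculation concrete (this is my own rough sketch of the hypothetical, with $B$ and $L$ as illustrative placeholders rather than numbers from the scenario): under the person-affecting accounting, the expected value of deploying is roughly

$$\mathbb{E}[\text{deploy}] \approx 0.999 \cdot B \;-\; 0.001 \cdot (7 \times 10^9) \cdot L,$$

where $B$ is the total benefit in the surviving worlds and $L$ is the disvalue of one death. Deploying looks positive whenever $B$ exceeds roughly 7 million life-equivalents, a bar that “life gets way better for everyone” easily clears. A view on which extinction is far worse than 7 billion individual deaths (e.g. because it forecloses all future generations) makes the second term dominate and flips the conclusion.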
Yelling “AI is going to kill us all” from the mountaintops. Pretty much anybody can do this. There are good reasons why the EA community is not currently doing it – for instance, it makes people think we’re weird, and it doesn’t clearly lead to progress on solving AI Alignment. I’m not saying we should never do this, but everybody else has also thought of this idea and largely decided against doing it.
Paying top math and ML researchers to do alignment work. If you have the money to do this, you are in a unilateralist situation. Tons of people have also thought of this. Why haven’t they done it? Heck, people might have tried this or are currently trying this, and you might not know. There are reputational reasons to avoid making such a project public.
Two grant makers could be independently deciding whether to fund a project, and it’s unclear if the project is net-positive. In this situation, it is more likely that the project gets funded than if there was only one grant maker. Similarly, if there are 10 grant makers, it is even more likely that the project gets funded. In reality, grant makers are hopefully aware of the Unilateralist’s Curse and are responding to it.
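As a toy model (assuming, purely for illustration, that each grant maker would independently fund the ambiguous project with probability $p$), the chance it gets funded by at least one of $n$ grant makers is

$$P(\text{funded}) = 1 - (1 - p)^n.$$

With $p = 0.2$, one grant maker funds it 20% of the time, two fund it 36% of the time, and ten fund it about 89% of the time – the same project, just more independent draws.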
The EA community sometimes promotes a culture of “if you see a promising thing that could be done, go do it.” But this trades off against the Unilateralist’s Curse. Probably somebody else has already thought of the thing – why didn’t they do it? If you find yourself in a unilateralist situation, first think, see if anybody else has done the thing, then consider talking to others about it (do so unless there are huge info hazards), and try to learn why nobody else has done it. Please don’t kill us all.
Thank you for reading! Please feel free to send me any feedback, positive or negative. Thank you to Lizka for feedback on an earlier draft.
Additionally, the people willing to act unilaterally are more likely to have positively biased estimates of the value of the action:
P(estimate > truth | unilateralist) > P(estimate > truth)

As you noted, some curse arguments are symmetric, in the sense that they also provide reason to expect unilateralists to do more good. Notably, the bias above is asymmetric; it provides a reason to expect unilateralists to do less good, with no corresponding reason to expect unilateralists to do more good.
@JakubK I think that your interpretation of OP’s quote is somewhat less useful, in the sense that it only retroactively explains the behavior of a “unilateralist” – i.e. an actor that has already made a decision. For a generic actor, I find it less useful to ask “Am I (acting as) a unilateralist?” than to ask “How many other actors are capable of acting, yet not acting?”
Besides the abstractness of it, I see that there is actually quite some overlap between what you would call “unilateralists” and simply “courageous actors”. This is because there is usually what I would refer to as an “activation energy” for actions to be taken – basically a bias where the majority of actors are more likely not to act when the true value of an action is net neutral or slightly positive. And precisely in the scenario where the true value of an action is only slightly (but significantly) positive, you would need a courageous actor to “overcome the activation energy” and do the right thing.
I’m not sure “activation energy” is the right term, but it is an observed phenomenon in ethics; see the comparison of the Fat Man variant with the classical Trolley Switch scenario. To sum up, I think that looking at the Unilateralist’s Curse simply as a statistical phenomenon is fundamentally wrong, and one must include moral dimensions in deciding whether to take the action no one has taken despite having the option to. That said, the general advice in the post, like information sharing etc., is indeed helpful and applicable.