Does this solve Pascal’s Muggings?

  1. In his recent 80k podcast, Will argued that we don’t have a decision theory that solves the Pascal’s Mugging problem

  2. In a recent post from Greg Lewis, I added a comment on how I think about Pascal’s Muggings

  3. Can someone explain why my comment doesn’t count as a decision theory that solves the Pascal’s Mugging problem?

I haven’t spent a ton of time thinking about this, so I expect someone will easily be able to point out something I’m missing.

1. Will believes we don’t have a decision theory to resolve the Pascal’s Mugging issue

Here’s the quote from the 80k podcast transcript:

Will MacAskill: Where I think, very intuitively, I can produce some guarantee of good — like saving a life, or a one in a trillion trillion trillion trillion trillion trillion trillion chance of producing a sufficiently large amount of value. That expected value — that is, the amount of value multiplied by the probability of achieving it — is even greater than the expected value of saving one life.

Will MacAskill: So what do we do? I’m like, “Eh, it really doesn’t seem you should do the incredibly-low-probability thing.” That seems very intuitive. And then the question is, can you design a decision theory that avoids that implication without having some other incredibly counterintuitive implications? It turns out the answer is no, actually.

Rob Wiblin: Right.

Will MacAskill: Now, this isn’t to say that you should do the thing that involves going after tiny probabilities of enormous amounts of value. It’s just within the state where we can formally prove that there’s a paradox here — that, I’m sorry, the philosophers have failed you: we have no good answer about what to do here.

My resolution is that I think you should do the extremely-low-probability thing, as long as it really is true that it has the one in a trillion … trillion chance of producing the huge amount of value.

The problem is that, in practice, you would almost never be confident that this is the case, as opposed to, e.g., the proposed action not really making the huge-value outcome any more likely at all, or the remaining outcomes (the one minus one in a trillion … trillion chance) actually dominating.

2. I recently set out a heuristic that makes sense to me

Here’s a link to the comment, and I copy and paste almost all of it below.

What I wrote below appears to provide a useful framework for thinking about Pascal’s Mugging situations.

The Reversal/Inconsistency Test:

If the logic seems to lead to taking action X, and seems to equally validly lead to taking an action inconsistent with X, then I treat it as a Pascal’s Mugging.
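
To make this a little more concrete, here is a minimal sketch of the test as a decision rule in Python. The function name and the idea of scoring “how strongly the logic supports X” against “how strongly the same logic supports the reversed action” are just my way of illustrating the heuristic above; it is not meant to be a formal decision theory.

```python
def reversal_inconsistency_test(support_for_x: float, support_for_reversal: float) -> str:
    """Heuristic sketch of the Reversal/Inconsistency Test.

    Both arguments are rough, subjective scores on the same scale for how
    strongly the proposed logic supports doing X versus doing something
    inconsistent with X. If the reversal is supported at least as well,
    treat the whole thing as a Pascal's Mugging.
    """
    if support_for_reversal >= support_for_x:
        return "Fails the test: treat as a Pascal's Mugging and ignore."
    return "Passes the test: evaluate on ordinary expected-value grounds."


# Original Pascal's Mugging: the mugger's story supports paying no better than
# an invisible "anti-mugger" story supports refusing, so the test fails.
print(reversal_inconsistency_test(support_for_x=1.0, support_for_reversal=1.0))
```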

Examples:

  • Original Pascal’s Mugging:

    • The original Pascal’s Mugging suggests you should give the mugger your 10 livres in the hope that you get the promised 10 quadrillion Utils.

    • The test: It seems equally valid that there’s an “anti-mugger” out there who is thinking “if Pascal refuses to give the mugger the 10 livres, then I will grant him 100 quadrillion Utils”. There is no reason to privilege the mugger who is talking to you, and ignore the anti-mugger whom you can’t see.

    • Conclusion: fails the Reversal/Inconsistency Test, so treat as a Pascal’s Mugging and ignore.

  • Extremely unlikely s-risk example:

    • I claim that the fart goblins of Smiggledorf will appear at the winter solstice of the year 2027, and magically keep everyone alive for 1 googolplex years, but subject them to constant suffering by having to smell the worst farts you’ve ever imagined. The smells are so bad that the suffering each person experiences in one minute is equivalent to 1 million lifetimes of suffering.

    • The only way to avoid this horrific outcome is to earn as much money as you can, and donate 90% of your income to a very nice guy with the EA Forum username “sanjay”.

    • The test: Is there any reason to believe that donating all this money will make the fart goblins less likely to appear, as opposed to more?

    • Conclusion: fails the Reversal/Inconsistency Test, so treat as a Pascal’s Mugging and ignore.

  • Extremely likely x-risk example:

    • In the distant land of Utopi-doogle, everyone has a wonderful, beautiful life, except for one lady called Cassie who runs around anxiously making predictions. Her first prediction is incredibly specific and falsifiable, and turns out to be correct. The same goes for her second, and her third; and after 100 highly specific, falsifiable and incredibly varied predictions, with a 100% success rate, she predicts that Utopi-doogle will likely explode, killing everyone.

    • The only way to save Utopi-doogle is for every able-bodied adult to stamp their foot while saying Abracadabra. Unfortunately, you have to get the correct foot—if some people are stamping their right foot and some are stamping their left foot, it won’t work. If everyone is stamping their left foot, this will either mean that Utopi-doogle is saved, or that Utopi-doogle will be instantly destroyed.

    • A politician sets up a Left Foot movement arguing that we should try to save Utopi-doogle by arranging a simultaneous left foot stamp.

    • The test: The simultaneous left foot stamp is as likely to cause doom as to save civilisation.

    • Conclusion: fails the Reversal/Inconsistency Test, so treat the politician’s suggestion as a Pascal’s Mugging and ignore.

    • Note, interestingly, that other actions—such as further research—are not necessarily a Pascal’s Mugging. (Could we ask Cassie about simultaneous stamping of the right foot?)

  • How some people perceive AI safety risk:

    • Let’s assume that, despite recent impressive successes by AI capabilities researchers, human-level AGI has a low (10^-12) chance of happening in the next 200 years.

    • Let’s also concede that, if such AGI arose, humanity would have a <50% chance of survival unless we had solved alignment.

    • Let’s continue being charitable to the importance of AI safety and assume that, in just over 200 years, humanity will reach a state of utopia that lasts for millennia, as long as we haven’t wiped ourselves out before then. This means that extinction in the next 200 years would mean 10^20 lives lost.

    • The raw maths seems to suggest that work on AI safety is high impact (I sketch the arithmetic just after this list).

    • The test: If we really are that far from AGI, can any work we do now really help? Are we sure that any AI safety research we do now will actually make safe AI more likely and not less likely? There are myriad ways we could make things worse: we could inadvertently further capabilities research; the research field could be path-dependent, so that our early mistakes damage the field more than just leaving it alone until we understand it better; we might realise that we need to include some ethical thinking, but incorporate the ethics of 2022 and later realise that the ethics of 2022 was flawed; and so on.

    • Conclusion: fails the Reversal/Inconsistency Test, so treat as a Pascal’s Mugging and ignore.

    • Note that in this example the AGI scenario is indeed highly unlikely, but the important thing is not that it’s unlikely; it’s that it’s unactionable.
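
To make the “raw maths” in the AI safety example concrete, and to show why the test deflates it, here is a rough sketch using only the numbers assumed in those bullets. The 50/50 split between “our work helps” and “our work hurts” at the end is purely illustrative; it just stands in for the idea that we have no particular reason to think early work helps rather than hurts.

```python
# Rough expected-value arithmetic for the AI safety example above.
# All numbers are the illustrative assumptions from the bullets, not real estimates.

p_agi = 1e-12            # assumed chance of human-level AGI in the next 200 years
p_doom_given_agi = 0.5   # assumed chance humanity does not survive AGI without alignment
lives_at_stake = 1e20    # assumed lives lost if we go extinct before the long utopia

# Naive expected value of "solving alignment" if our work reliably helps:
naive_ev = p_agi * p_doom_given_agi * lives_at_stake
print(f"Naive expected lives saved: {naive_ev:.0e}")   # ~5e+07

# The Reversal/Inconsistency Test asks: is our work actually more likely to help
# than to hurt? If (illustratively) it is just as likely to make things worse,
# the two branches cancel and the expected value collapses to roughly zero.
p_work_helps = 0.5       # purely illustrative symmetric split
p_work_hurts = 0.5
net_ev = (p_work_helps - p_work_hurts) * naive_ev
print(f"Expected lives saved once the reversal is taken seriously: {net_ev:.0e}")
```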

3. Can someone explain why my comment doesn’t count as a decision theory that solves the Pascal’s Mugging problem?

I’m open to the possibility that there are subtleties about exactly what is meant by a decision theory that I’m not aware of. Or perhaps there are reasons to believe that the heuristic I set out above is flawed (I haven’t actually used it in anger much).