My preferred resolution of Pascal’s Mugging is not to set an arbitrary threshold, such as 1-in-1,000,000 (although I agree that probabilities above 1-in-1,000,000 likely don’t count as Pascal’s Muggings).
I prefer the Reversal/Inconsistency Test:
If the logic seems to lead to taking action X, and seems to equally validly lead to taking an action inconsistent with X, then I treat it as a Pascal’s Mugging.
Examples:
Original Pascal’s Mugging:
The original Pascal’s Mugging suggests you should give the mugger your 10 livres in the hope that you get the promised 10 quadrillion Utils.
The test: It seems equally valid that there’s an “anti-mugger” out there who is thinking “if Pascal refuses to give the mugger the 10 livres, then I will grant him 100 quadrillion Utils”. There is no reason to privilege the mugger who is talking to you, and ignore the anti-mugger whom you can’t see.
Conclusion: fails the Reversal/Inconsistency Test, so treat as a Pascal’s Mugging and ignore.
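(To make the symmetry concrete, here is a toy expected-value sketch in Python. All of the probabilities and payoffs are made-up numbers for illustration, since the thought experiment doesn't supply any.)

```python
# Toy EV comparison for the mugger vs. the hypothetical "anti-mugger".
# All numbers are illustrative assumptions, not claims about real probabilities.

p_mugger_honest = 1e-14       # assumed chance the mugger can deliver the Utils
p_anti_mugger = 1e-14         # assumed chance an unseen anti-mugger rewards refusal
mugger_payoff = 1e16          # the promised 10 quadrillion Utils for paying
anti_mugger_payoff = 1e17     # the promised 100 quadrillion Utils for refusing
cost_of_paying = 10           # the 10 livres, treated as 10 Utils for simplicity

ev_pay = p_mugger_honest * mugger_payoff - cost_of_paying      # ~90
ev_refuse = p_anti_mugger * anti_mugger_payoff                 # ~1000

print(ev_pay, ev_refuse)
# With symmetric (and equally unjustified) assumptions, "refuse" dominates just as
# easily as "pay" did: the logic points both ways, which is exactly the failure
# the Reversal/Inconsistency Test is looking for.
```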
Extremely unlikely s-risk example:
I claim that the fart goblins of Smiggledorf will appear on the winter solstice of the year 2027, and will magically keep everyone alive for 1 googolplex years, but subject them to constant suffering by forcing them to smell the worst farts you’ve ever imagined. The smells are so bad that the suffering each person experiences in one minute is equivalent to 1 million lifetimes of suffering.
The only way to avoid this horrific outcome is to earn as much money as you can, and donate 90% of your income to a very nice guy with the EA Forum username “sanjay”.
The test: Is there any reason to believe that donating all this money will make the fart goblins less likely to appear, as opposed to more?
Conclusion: fails the Reversal/Inconsistency Test, so treat as a Pascal’s Mugging and ignore.
Extremely likely x-risk example:
In the distant land of Utopi-doogle, everyone has a wonderful, beautiful life, except for one lady called Cassie who runs around anxiously making predictions. Her first prediction is incredibly specific and falsifiable, and turns out to be correct. The same is true of her second, and her third. After 100 highly specific, falsifiable and incredibly varied predictions with a 100% success rate, she predicts that Utopi-doogle will likely explode, killing everyone.
The only way to save Utopi-doogle is for every able-bodied adult to stamp their foot while saying Abracadabra. Unfortunately, you have to get the correct foot—if some people are stamping their right foot and some are stamping their left foot, it won’t work. If everyone is stamping their left foot, this will either mean that Utopi-doogle is saved, or that Utopi-doogle will be instantly destroyed.
A politician sets up a Left Foot movement arguing that we should try to save Utopi-doogle by arranging a simultaneous left foot stamp.
The test: The simultaneous left foot stamp is as likely to cause doom as it is to save civilisation.
Conclusion: fails the Reversal/Inconsistency Test, so treat the politician’s suggestion as a Pascal’s Mugging and ignore.
Note, interestingly, that other actions—such as further research—are not necessarily a Pascal’s Mugging. (Could we ask Cassie about simultaneous stamping of the right foot?)
How some people perceive AI safety risk:
Let’s assume that, despite recent impressive successes by AI capabilities researchers, human-level AGI has a low (10^-12) chance of happening in the next 200 years.
Let’s also concede that, if such AGI arose, humanity would have a <50% chance of survival unless we had solved alignment.
Let’s continue being charitable to the importance of AI safety and assume that in just over 200 years, humanity will reach a state of utopia which lasts for millennia, as long as we haven’t wiped ourselves out before then. This means that extinction in the next 200 years would mean 10^20 lives lost.
The raw maths seems to suggest that work on AI safety is high impact.
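(For what it’s worth, here is what that raw maths looks like as a back-of-the-envelope sketch in Python, using the assumed numbers above; the only point is that a tiny probability times an enormous payoff still yields a large expected value.)

```python
# Back-of-the-envelope EV for the caricatured AI-safety case above.
p_agi_in_200y = 1e-12        # assumed chance of human-level AGI within 200 years
p_doom_given_agi = 0.5       # "<50% chance of survival" taken at its upper bound
lives_at_stake = 1e20        # future lives lost if we go extinct before utopia

expected_lives_lost = p_agi_in_200y * p_doom_given_agi * lives_at_stake
print(f"{expected_lives_lost:.0e}")  # 5e+07: tens of millions of expected lives
# Which is why, on paper, even this wildly pessimistic-about-timelines scenario
# still looks "high impact", and why the test below matters.
```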
The test: If we really are that far from AGI, can any work we do now really help? Are we sure that any AI safety research we do now will actually make safe AI more likely, and not less likely? There are myriad ways we could make things worse. For example, we could inadvertently further capabilities research; the research field could be path-dependent, and our early mistakes could damage it more than simply leaving it alone until we understand it better; or we might realise we need to include some ethical thinking, incorporate the ethics of 2022, and only later realise that the ethics of 2022 was flawed.
Conclusion: fails the Reversal/Inconsistency Test, so treat as a Pascal’s Mugging and ignore.
Note that in this example the AGI scenario is indeed highly unlikely, but the important thing is not that it’s unlikely; it’s that it’s unactionable.
Glad to see someone already wrote out some of my thoughts. To just tag on, some of my key bullet points for understanding Pascalian wager problems are:
• You can have offsetting uncertainties and consequences (as you mention), and thus you should fight expected-value (EV) fire with EV fire.
• Anti-Pascalian heuristics are not meant to directly maximize the accuracy of your beliefs, but rather to improve the effectiveness of your overall decision-making in light of constraints on your time/cognitive resources. If we had infinite time to evaluate everything—even possibilities that seem like red herrings—it would probably usually be optimal to do so, but we don’t have infinite time so we have to make decisions as to what to spend our time analyzing and what to accept as “best-guesstimates” for particularly fuzzy questions. Thus, you can “fight EV fire with EV fire” at the level of “should I even continue entertaining this idea?”
• Very low probabilities (risk estimates) tend to be associated with greater uncertainty, especially when the estimates aren’t based on clear empirical data. As a result, really low probability estimates like “1/100,000,000” tend to be more fragile to further analysis, which crucially plays into the next bullet point.
• Sometimes the problem with Pascalian situations (especially in some high school policy debate rounds I’ve seen) is that someone fails to update based on the velocity/acceleration of their past updates. Suppose one person presents an argument saying “this very high-impact outcome is 1% likely.” The other person spends a minute arguing that it’s not 1% likely, and it actually only seems to be 0.1% likely. They spend another minute disputing it, and it then seems to be only 0.01% likely. They then say “I have 5 other similar-quality arguments I could give, but I don’t have time.” The person who originally presented the argument could then say “Ha! I can’t dispute their arguments, but even if it’s 0.01% likely, the expected value of this outcome is still large” … the other person gives a random one of their 5 arguments and drops the likelihood by another order of magnitude, and so on. The point is that, given the constraints on information flow, processing speed and available time in discourse, one should occasionally take into account how fast one is updating and infer the “actual probability estimate I would probably settle on if we had a substantially greater amount of time to explore this.” (Then fight EV fire with EV fire.)
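(Here is a toy sketch of that last point in Python: extrapolating where your estimate would settle from the trajectory of your recent updates. The numbers are just the ones from the hypothetical debate above, plus the assumption that the remaining arguments are of similar quality.)

```python
import math

# Estimates after successive minutes of counterargument (from the example above).
estimates = [0.01, 0.001, 0.0001]      # 1% -> 0.1% -> 0.01%
remaining_arguments = 5                # similar-quality arguments not yet heard

# Average size of the recent updates, in orders of magnitude per argument.
log_steps = [math.log10(a / b) for a, b in zip(estimates, estimates[1:])]
avg_step = sum(log_steps) / len(log_steps)

# Naive extrapolation of where the estimate would settle with more time.
settled_estimate = estimates[-1] / 10 ** (avg_step * remaining_arguments)
print(settled_estimate)  # ~1e-09, far below the last stated figure of 0.01%
```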
Thank you for writing this.
This says “200 hundred”. Do you mean 200 or 20,000?
Thanks, I’ve edited it to be 200 rather than 200 hundred