Thanks for the thoughtful response.
On (1) I’m not really sure the uncertainty and the trust in the estimate are separable. A probability estimate of a nonrecurring event[1] is fundamentally a label someone[2] applies to how confident they are that something will happen. A corollary of this is that, when deciding how to act, you should probably take into account how the probability estimate could actually have been reached, your trust in that reasoning, and the likelihood of bias.[3]
On (2) I agree with your comments about the OP’s point: if the probabilities are ±1 percentage point with the error symmetrically distributed, they’re still on average 1.5%[4], though in some circumstances introducing error bars might affect how you handle risk. But as I’ve said, I don’t think the distribution of errors looks like this when it comes to assessing whether long shots are worth pursuing (not even under the assumption of good faith), so I discount accordingly. I’d be pretty worried if hits-based grant-makers didn’t, frankly, and this question puts me in their shoes.
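To make the contrast concrete, here is a minimal sketch (the 100,000-DALY payoff is the figure from the survey question; both error distributions are invented purely for illustration, not claims about any real grant):

```python
import random

random.seed(0)
PAYOFF = 100_000  # DALYs averted if the long shot works (figure from the survey question)

def mean_ev(true_probabilities):
    """Average expected value across possible 'true' success probabilities."""
    return sum(p * PAYOFF for p in true_probabilities) / len(true_probabilities)

# Symmetric error: the stated 1.5% could be off by up to 1 percentage point either way.
symmetric = [random.uniform(0.005, 0.025) for _ in range(100_000)]

# Optimism-skewed error: for long shots the stated figure is more often an overestimate,
# so the true probability sits mostly below 1.5% (the shape here is invented).
skewed = [random.triangular(0.0, 0.025, 0.005) for _ in range(100_000)]

print(round(mean_ev(symmetric)))  # ~1,500 DALYs: same as taking 1.5% at face value
print(round(mean_ev(skewed)))     # ~1,000 DALYs: a third lower, from the same point estimate
```

The particular shapes don’t matter; the point is that once the errors stop being symmetric, multiplying the face-value 1.5% through quietly overstates the long shot.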
Your point about analytic philosophy often expecting literal answers to slightly weird hypotheticals is a good one. But EA isn’t just analytic philosophy and St Petersburg Paradoxes; it’s also people literally coming up with best guesses of the probabilities of things they think might work and multiplying them (and a whole subculture based on that, and on guesstimating just how impactful the “crazy train” long-shot ideas they’re curious about might be). So I think it’s pretty reasonable to treat it not as a slightly daft hypothetical where a 1.5% probability is an empirical reality,[5] but as a real-world grant-award decision where the “1.5% probability” is a suspiciously precise credence, and you’ve got to decide whether to trust it enough to fund it over something that definitely works. In that situation, I think I’m discounting the estimated chance of success of the long shot by more than 50%.
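As a rough sketch of what that discount does to the decision (the 1,000-DALY figure for the certain option is my assumption, inferred from the “extra 5 thousandths” margin mentioned in the footnote about the example feeling fishy):

```python
PAYOFF_IF_IT_WORKS = 100_000  # DALYs, from the survey question
STATED_CREDENCE = 0.015       # the suspiciously precise "1.5%"
CERTAIN_OPTION = 1_000        # DALYs; assumed here, implied by the break-even margin below

def expected_dalys(credence, discount=0.0):
    """EV of the long shot after discounting the stated credence by `discount`."""
    return credence * (1 - discount) * PAYOFF_IF_IT_WORKS

break_even = CERTAIN_OPTION / PAYOFF_IF_IT_WORKS  # 0.01, i.e. 1%
margin = STATED_CREDENCE - break_even             # 0.005: the "extra 5 thousandths"

print(expected_dalys(STATED_CREDENCE))                 # 1,500 > 1,000: fund the long shot
print(expected_dalys(STATED_CREDENCE, discount=0.55))  # 675 < 1,000: fund the certain thing
```

On those (assumed) numbers, any discount of the stated credence beyond about a third already flips the decision; mine is well past that.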
FWIW I don’t take the question as evidence that the survey designers are biased in any way.
“this will either avert 100,000 DALYs or have no effect” doesn’t feel like a proposition based on well-evidenced statistical regularities...
not me. Or at least, a “1.5%” chance of working for thousands of people (and implicitly a 98.5% chance of having no effect on anyone) certainly doesn’t feel like the sort of degree of precision I’d estimate to...
Whilst it’s an unintended consequence of how the question was framed, this example feels particularly fishy. We’re asked to contemplate trading off something that will certainly work against something potentially higher-yielding that is highly unlikely to work, and yet the thing that is highly unlikely to work turns out to have the higher EV because someone has speculated on its likelihood to a very high degree of precision, and those extra 5 thousandths made all the difference. What’s the chance that estimate is completely bogus, or finessed to favour the long shot? I’d say in real-world scenarios (and certainly not just EA scenarios) it’s quite a bit more than 5 in 1000....
that one’s a math test too ;-)
maybe a universe where physics is a god with an RNG...
Thanks for the reply, and sorry for the wall of text I’m posting now (no need to reply further, this is probably too much text for this sort of discussion)...
I agree that uncertainty is in someone’s mind rather than out there in the world. Still, granting the accuracy of probability estimates feels no different from granting the accuracy of factual assumptions. Say I was interested in eliciting people’s welfare tradeoffs between chicken sentience and cow sentience in the context of eating meat (how that translates into suffering caused per calorie of meat). Even if we lived in a world where false labelling of meat was super common (such that, say, when you buy things labelled as ‘cow’, you might half the time get tuna, and when you buy chicken, you might half the time get ostrich), if I’m asking specifically for people’s estimates of the moral disvalue from chicken calories vs cow calories, it would be strange if survey respondents factored in information about tunas and ostriches. Surely, if I was also interested in how people thought about calories from tunas and ostriches, I’d be asking about those animals too!
Also, circumstances about the labelling of meat products can change over time, so that previously elicited estimates on “chicken/cow-labelled things” would now be off. Survey results will be more timeless if we don’t contaminate straightforward thought experiments with confounding empirical considerations that weren’t part of the question.
A respondent might mention Kant and how all our knowledge about the world is indirect, and how there’s trust involved in taking assumptions for granted. That’s accurate, but let’s just take the assumptions for granted anyway and move on?
On whether “1.5%” is too precise an estimate for contexts where we don’t have extensive data: if we grant that thought experiments can be arbitrarily outlandish, then it doesn’t really matter.
Still, I could imagine that you’d change your mind about never using these estimates if you thought more about situations where they might become relevant. For instance, I used estimates in that range (roughly a 1.5% chance of something happening) several times within the last two years:
My wife developed lupus a few years ago. It’s the illness that often makes it onto the whiteboard in the show Dr House because it can throw up symptoms that mimic tons of other diseases, sometimes serious ones. We had a bunch of health scares where we were thinking “this is most likely just some weird lupus-related symptom that isn’t actually dangerous, but it also resembles that other thing (which is also a common secondary complication from lupus or its medications), which would be a true emergency.” In these situations, should we go to the ER for a check-up or not? With a 4-5h average A&E waiting time and the chance of catching viral illnesses while there (which are extra bad when you already have lupus), it probably doesn’t make sense to go in if we think the chance of a true emergency is only <0.5%. However, at 2% or higher, we’d for sure want to go in. (In between those two, we’d probably continue to feel stressed and undecided, and maybe go in primarily for peace of mind, lol.)

Narrowing things down from “most likely it’s nothing, but there’s some small chance that it’s bad!” to either “I’m confident this is <0.5%” or “I’m confident this is at least 2%” is not easy, but it worked in some instances. That suggests there is some usefulness (as a matter of the practical necessity of making medical decisions in a context of long A&E waiting times) to making decisions based on a fairly narrowed-down low-probability estimate. Sure, the process I described is still a bit fuzzier than just pulling a 1.5% point estimate from somewhere, but I feel like it approaches a similar level of precision, and I think many other people would have similar decision thresholds in a situation like ours.
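For what it’s worth, one way to see where thresholds like ours come from is as an expected-cost comparison. The sketch below uses made-up cost numbers purely to show the shape of the reasoning, not the things we actually weighed up explicitly:

```python
# Made-up illustrative costs in arbitrary "badness units"; only their ratio matters.
COST_OF_GOING_IN = 1.0            # 4-5h in A&E plus the infection risk
COST_OF_MISSED_EMERGENCY = 100.0  # harm from staying home during a true emergency

def should_go_to_er(p_emergency):
    """Go in when the expected harm of staying home exceeds the cost of going in."""
    return p_emergency * COST_OF_MISSED_EMERGENCY > COST_OF_GOING_IN

print(should_go_to_er(0.004))  # False: below the break-even, staying home wins on expectation
print(should_go_to_er(0.02))   # True: at 2%, going in clearly wins
```

With that invented 100:1 ratio the break-even sits at 1%, right inside the zone where we’d stay stressed and undecided, which roughly matches the thresholds above.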
Admittedly, medical contexts are better studied than charity contexts, and especially than influencing-the-distant-future charity contexts. So it makes sense if you’re especially skeptical of that level of precision in charitable contexts. (And I indeed agree with this; I’m not defending that level of precision in practice for EA charities!) Still, like habryka pointed out in another comment, I don’t think there’s a red line where fundamental changes happen as probabilities get lower and lower. The world isn’t inherently frequentist, but we can often find plausibly-relevant base rates. Admittedly, there’s always some subjectivity, some art, in choosing relevant base rates, assessing additional risk factors, and making judgment calls about “how much is this symptom a match?” But if you find the right context for it (meaning: a context where you’re justifiably anchoring to some very low-probability base rate), you can get well below the 0.5% level for practically-relevant decisions (and maybe make proportional upwards or downwards adjustments from there). For these reasons, it doesn’t strike me as totally outlandish that some group will at some point come up with a ranged very-low-probability estimate of averting some risk (like asteroid risk or whatever) while being well-calibrated. I’m not saying I have a concrete example in mind, but I wouldn’t rule it out.
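A stylized example of what I mean by anchoring to a low base rate and then adjusting (every number below is invented; it’s the structure of the estimate that matters, not the values):

```python
# Invented numbers: anchor to a reference-class base rate, then make judgment-call adjustments.
base_rate = 0.002        # rough rate of the bad outcome in a plausibly-relevant reference class
risk_multiplier = 1.5    # judgment call: known risk factors make us somewhat more worried
symptom_match = 0.8      # judgment call: the presentation is only a partial match

estimate = base_rate * risk_multiplier * symptom_match
print(f"{estimate:.4f}")  # 0.0024, i.e. ~0.24%: well below 0.5%, yet not pulled from thin air
```

That’s the sense in which I think well-calibrated very-low-probability estimates aren’t out of reach when a justified anchor exists.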
OP here :) Thanks for the interesting discussion that the two of you have had!
Lukas_Gloor, I think we agree on most points. Your example of estimating a low probability of medical emergency is great! And I reckon that you are communicating appropriately about it. You’re probably telling your doctor something like “we came because we couldn’t rule out complication X” and not “we came because X has a probability of 2%” ;-)
You also seem to be well aware of the uncertainty. Your situation does not feel like one where you went to the ER 50 times, were sent home 49 times, and have from this developed a good calibration. It looks more like a situation where you know about danger signs which could be caused by emergencies, and have some rules like “if we see A and B and not C, we need to go to the ER”.[1]
Your situation and my post both involve low probabilities in high-stakes situations. That said, the goal of my post is to remind people that this type of probability is often uncertain, and that they should communicate this with the appropriate humility.
That’s how I would think about it, at least… it might well be that you’re more rational than I am and use probabilities more explicitly. ↩︎