When I answered this question, I answered it with an implied premise that an EA org is making these claims about the possibilities, and went for number 1, because I don’t trust EA orgs to be accurate in their “1.5%” probability estimates, and I expect these to be more likely overestimates than underestimates.
As a datapoint: despite (already) agreeing to a large extent with this post,[1] IIRC I answered the question assuming that I do trust the premise.
Despite my agreement, I do think there are certain kinds of situations in which we can reasonably use small probabilities. (Related post: Most* small probabilities aren’t pascalian, and maybe also related.)
More generally: I remember appreciating some discussion on the kinds of thought experiments that are useful, when, etc. I can’t find it quickly, but possible starting points could be this LW post, Least Convenient Possible World, maybe this post from Richard, and stuff about fictional evidence.
Writing quickly based on a skim, sorry for lack of clarity/misinterpretations!
My view is roughly something like:
at least in the most obviously analogous situations, it’s very rare that we can properly tell the difference between 1.5% and 0.15% (and so the premise is somewhat absurd)
My intuitive reaction to this is “Way to screw up a survey.”
Considering that three people agree-voted your post, I realize I should probably come away from this with a very different takeaway, more like “oops, survey designers need to put in extra effort if they want to get accurate results, and I would’ve totally fallen for this pitfall myself.”
Still, I struggle with understanding your and the OP’s point of view. My reaction to the original post was something like:
Why would this matter? If the estimate could be off by 1 percentage point, it could be down to 0.5% or up to 2.5%, which is still 1.5% in expectation. Also, if this question’s intention were about the likelihood of EA orgs being biased, surely they would’ve asked much more directly about how much respondees trust an estimate of some example EA org.
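A minimal sketch of the arithmetic behind “still 1.5% in expectation,” assuming (purely for illustration) a uniform ±1 percentage-point error around the 1.5% estimate and the 100,000-DALY payoff from the survey question:

```python
# Minimal check of the "still 1.5% in expectation" arithmetic: a symmetric
# +/-1 percentage-point error around a 1.5% estimate leaves the expected
# probability, and hence the expected DALYs averted, essentially unchanged.
import random

random.seed(0)
draws = [0.015 + random.uniform(-0.01, 0.01) for _ in range(100_000)]
mean_p = sum(draws) / len(draws)

print(f"mean probability across draws: {mean_p:.4f}")               # ~0.0150
print(f"implied expected DALYs averted: {mean_p * 100_000:,.0f}")   # ~1,500
```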
We seem to disagree on the use of thought experiments. The OP writes:
When designing thought experiments, keep them as realistic as possible, so that they elicit better answers. This reduces misunderstandings, pitfalls, and potentially compounding errors. It produces better communication overall.
I don’t think this is necessary and I could even see it backfiring. If someone goes out of their way to make a thought experiment particularly realistic, respondees might get the impression that it is asking about a real-world situation where they are invited to bring in all kinds of potentially confounding considerations. But that would defeat the point of the thought experiment (e.g., people might answer based on how much they trust the modesty of EA orgs, as opposed to giving you their personal tolerance for the risk of feeling, in hindsight, that they had no effect / wasted money). The way I see it, the whole point of thought experiments is to get ourselves to think very carefully and cleanly about the principles we find most important. We do this by getting rid of all the potentially confounding variables. See here for a longer explanation of this view.
Maybe future surveys should have a test to figure out how people understand the use of thought experiments. Then, we could split responses between people who were trying to play the thought experiment game the intended way, and people who were refusing to play (i.e., questioning premises and adding further assumptions).
*On some occasions, it makes sense to question the applicability of a thought experiment. For instance, in the classic “what if you’re a doctor who has the opportunity to kill a healthy patient during a routine check-up so that you could save the lives of 4 people needing urgent organ transplants,” it makes little sense to just go “all else is equal! Let’s abstract away all other societal considerations or the effect on the doctor’s moral character.”
So, if I were to write a post on thought experiments today, I would add something about the importance of re-contextualizing lessons learned within a thought experiment to the nuances of real-world situations. In short, I think my formula would be something like, “decouple within thought experiments, but make sure to add an extra thinking step from ‘answers inside a thought experiment’ to ‘what can we draw from this in terms of real-life applications.’” (Credit to Kaj Sotala, who once articulated a similar point in probably a better way.)
I agree that our different reactions come partly from having different intuitions about the boundaries of a thought experiment. Which factors should one include vs exclude when evaluating answers?
For me, I assumed that the question can’t be just about expected values. This seemed too trivial. For simple questions like that, it would be clearer to ask the question directly (e.g., “Are you in favor of high-risk interventions with large expected rewards?”) than to use a thought experiment. So I concluded that the thought experiment probably goes a bit further.
If it goes further, there are many factors that might come into play:
How certain are we of the numbers?
Are there any negative effects if the intervention fails? These could be direct negative outcomes, but also indirect ones like difficulty raising funds in the future, reputation loss...
Are we allocating a small part of a budget, or our total money? Is this a repeated decision or a one-off?
I had no good answers, and no good guesses about the question’s intent. Maybe this is clearer for you, given that you mention “the way EA culture has handled thought experiments thus far” in a comment below. I, for one, decided to skip the question :/
Feels like taking into account the likelihood that the “1.5% probability of 100,000 DALYs averted” estimate is a credence based on some marginally-relevant base rate[1] (one that might have been chosen with a significant bias towards optimism) is very much in keeping with the spirit of the question (which presumably is about gauging attitudes towards uncertainty, not testing basic EV calculation skills)[2].
A very low percentage chance of averting a lot of DALYs feels a lot more like “1.5% of clinical trials of therapies for X succeeded; this untested idea might also have a 1.5% chance” optimism attached to a proposal offering little reason to believe it’s above average, rather than an estimate based on somewhat robust statistics (“we inferred that 1.5% of people who receive this drug will be cured from the 1.5% of people who had that outcome in trials”). So it seems quite reasonable to assume that the estimated 1.5% chance of a positive binary outcome might be biased upwards. Even more so in the context of “we acknowledge this is a long shot and high-certainty solutions to other pressing problems exist, but if the chance of this making an impact was as high as 0.0x%...”-style fundraising appeals to EAs’ determination to avoid scope insensitivity.
either that, or someone’s been remarkably precise in their subjective estimates or collected some unusual type of empirical data. I certainly can’t imagine reaching the conclusion that an option has exactly a 1.5% chance of averting 100k DALYs myself
if you want to show off you understand EV and risk estimation you’d answer (C) “here’s how I’d construct my portfolio” anyway :-)
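A toy simulation of the worry described in the comment above, with entirely made-up numbers: if true long-shot probabilities mostly sit well below 1.5% and advertised estimates multiply in noisy optimism, then the proposals whose estimate happens to come out at roughly 1.5% will, on average, have a true chance below what’s advertised – a selection effect in the spirit of the winner’s curse. None of the parameter choices come from the survey or the comments; they only illustrate the direction of the effect.

```python
# Toy winner's-curse-style illustration with entirely made-up numbers:
# true long-shot probabilities mostly sit well below 1.5%, advertised
# estimates multiply in noisy optimism, and we then look at the true
# probability of the proposals that happen to be advertised at ~1.5%.
import random
import statistics

random.seed(0)
selected_true = []
for _ in range(200_000):
    true_p = random.lognormvariate(-6.0, 1.0)             # median ~0.25% (assumed prior)
    estimate = true_p * random.lognormvariate(0.5, 0.7)   # noisy, optimistic multiplier (assumed)
    if 0.013 <= estimate <= 0.017:                         # "advertised at roughly 1.5%"
        selected_true.append(true_p)

print(f"proposals advertised at ~1.5%: {len(selected_true)}")
print(f"their mean true probability:   {statistics.mean(selected_true):.4f}")
# With these assumptions the mean true probability comes out well below 0.015.
```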
If we’re considering realistic scenarios instead of staying with the spirit of the thought experiment (which I think we should not, partly precisely because it introduces lots of possible ambiguities in how people interpret the question, and partly because this probably isn’t what the surveyors intended, given the way EA culture has handled thought experiments thus far – see for instance the links in Lizka’s answer, or the way EA draws heavily from analytic philosophy, where straightforwardly engaging with unrealistic thought experiments is a standard component of the toolkit), then I agree that an advertised 1.5% chance of having a huge impact could be more likely upwards-biased than the other way around. (But it depends on who’s doing the estimate – some people are actually well-calibrated or prone to be extra modest.)
(1) What you described seems to me best characterized as being about trust: trust in others’ risk estimates. That would be separate from attitudes about uncertainty (and if that’s what the surveyors wanted to elicit, they’d probably have asked the question very differently).
(Or maybe what you’re thinking about could be someone having radical doubts about the entire epistemology behind “low probabilities”? I’m picturing a position that goes something like, “it’s philosophically impossible to reason sanely about low probabilities; besides, when we make mistakes, we’ll almost always overestimate rather than underestimate our ability to have effects on the world.” Maybe that’s what you think people are thinking – but as an absolute, this would seem weirdly detailed and radical to me, and I feel like there’s a prudential wager against believing that our reasoning is doomed from the start in a way that would prohibit everyone from pursuing ambitious plans.)
(2) What I meant wasn’t about basic EV calculation skills (obviously) – I didn’t mean to suggest that just because the EV of the low-probability intervention is greater than the EV of the certain intervention, it’s a no-brainer that it should be taken. I was just saying that the OP’s point about probabilities maybe being off by one percentage point, by itself, without some allegation of systematic bias in the measurement, doesn’t change the nature of the question. There’s still the further question of whether we want to bring in other considerations besides EV. (I think “attitudes towards uncertainty” fits well here as a title, but again, I would reserve it for the thing I’m describing, which is clearly different from “do you think other people/orgs within EA are going to be optimistically biased?”)
(Note that it’s one question whether people would go by EV for cases that are well within the bounds of numbers of people that exist currently on earth. I think it becomes a separate question when you go further to extremes, like whether people would continue gambling in the St Petersburg paradox or how they relate to claims about vastly larger realms than anything we understand to be in current physics, the way Pascal’s mugging postulates.)
Finally, I realize that maybe the other people here in the thread have so little trust in the survey designers that they’re worried that, if they answer with the low-probability, higher-EV option, the survey designers will write takeaways like “more EAs are in favor of donating to speculative AI risk interventions.” I agree that, if you think survey designers will make too strong of an update from your answers to a thought experiment, you should point out all the ways that you’re not automatically endorsing their preferred option. But I feel like the EA survey already has lots of practical questions along the lines of “Where do you actually donate to?” So, it feels unlikely that this question is trying to trick respondees or that the survey designers will just generally draw takeaways from this that aren’t warranted?
I’m one of the people who agreed with @titotal’s comment, and it was because of something like this.
It’s not that I’m worried per se that the survey designers will write a takeaway that puts a spin on this question (last time they just reported it neutrally). It’s more that I expect this question[1] to be taken by other orgs/people as a proxy metric for the EA community’s support for hits-based interventions. And because of the practicalities of how information is acted on, the subtlety of the wording of the question might be lost in the process (e.g. in an organisation someone might raise the issue at some point, but it would eventually end up as a number in a spreadsheet or BOTEC, and there is no principled way to adjust for the issue that titotal describes).
And one other about supporting low-probability/high-impact interventions
That makes sense; I understand that concern.
I wonder if, next time, the survey makers could write something to reassure us that they’re not going to be using any results out of context or with an unwarranted spin (esp. in cases like the one here, where the question is related to a big ‘divide’ within EA, but worded as an abstract thought experiment).
Thanks for the thoughtful response.
On (1) I’m not really sure the uncertainty and the trust in the estimate are separable. A probability estimate of a nonrecurring event[1] fundamentally is a label someone[2] applies to how confident they are that something will happen. A corollary of this is that you should probably take into account how probability estimates could have actually been reached, your trust in that reasoning, and the likelihood of bias when deciding how to act.[3]
On (2) I agree with your comments about the OP’s point; if the probabilities are +/-1 percentage point with error symmetrically distributed, they’re still on average 1.5%,[4] though in some circumstances introducing error bars might affect how you handle risk. But as I’ve said, I don’t think the distribution of errors looks like this when it comes to assessing whether long shots are worth pursuing or not (not even under the assumption of good faith). I’d be pretty worried if hits-based grant-makers didn’t take the likelihood of bias into account, frankly, and this question puts me in their shoes.
Your point about analytic philosophy often expecting literal answers to slightly weird hypotheticals is a good one. But EA isn’t just analytic philosophy and St Petersburg Paradoxes; it’s also people literally coming up with best guesses of probabilities of things they think might work and multiplying them (and a whole subculture based on that, and on guesstimating just how impactful “crazy train” long-shot ideas they’re curious about might be). So I think it’s pretty reasonable to treat it not as a slightly daft hypothetical where a 1.5% probability is an empirical reality,[5] but as a real-world grant-award decision scenario where the “1.5% probability” is a suspiciously precise credence, and you’ve got to decide whether to trust it enough to fund it over something that definitely works. In that situation, I think I’m discounting the estimated chance of success of the long shot by more than 50%. (A rough numerical sketch of this trade-off follows the footnotes below.)
FWIW I don’t take the question as evidence the survey designers are biased in any way
“this will either avert 100,000 DALYs or have no effect” doesn’t feel like a proposition based on well-evidenced statistical regularities...
not me. Or at least, a “1.5%” chance of working for thousands of people (and implicitly a 98.5% chance of having no effect on anyone) certainly doesn’t feel like the sort of degree of precision I’d estimate to...
Whilst it’s an unintended consequence of how the question was framed, this example feels particularly fishy. We’re asked to contemplate trading off something that certainly will work against something potentially higher-yielding that is highly unlikely to work, and yet the thing that is highly unlikely to work turns out to have the higher EV because someone has speculated on its likelihood to a very high degree of precision, and those extra 5 thousandths made all the difference. What’s the chance the latter estimate is completely bogus or finessed to favour the latter option? I’d say in real-world scenarios (and certainly not just EA scenarios) it’s quite a bit more than 5 in 1000....
that one’s a math test too ;-)
maybe a universe where physics is a god with an RNG...
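The rough numerical sketch promised above, under the assumption (implied by the “extra 5 thousandths” remark) that the certain option in the survey question averts 1,000 DALYs, so that the advertised 1.5% long shot beats it only at face value:

```python
# Sketch of the comparison under the assumptions in the lead-in: a certain
# option averting 1,000 DALYs vs. an advertised 1.5% chance of averting
# 100,000 DALYs, with the advertised probability discounted for suspected
# optimism.
certain_dalys = 1_000        # assumed value of the "definitely works" option
advertised_p = 0.015
long_shot_dalys = 100_000

for discount in (0.0, 0.25, 0.5, 0.6):
    p = advertised_p * (1 - discount)
    ev = p * long_shot_dalys
    better = "long shot" if ev > certain_dalys else "certain option"
    print(f"discount {discount:>4.0%}: EV = {ev:>6.0f} DALYs -> prefer {better}")

# At face value the long shot's EV is 1,500 DALYs; discounting the advertised
# probability by more than ~1/3 (and certainly by >50%) flips the choice.
```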
Thanks for the reply, and sorry for the wall of text I’m posting now (no need to reply further, this is probably too much text for this sort of discussion)...
I agree that uncertainty is in someone’s mind rather than out there in the world. Still, granting the accuracy of probability estimates feels no different from granting the accuracy of factual assumptions. Say I was interested in eliciting people’s welfare tradeoffs between chicken sentience and cow sentience in the context of eating meat (how that translates into suffering caused per calorie of meat). Even if we lived in a world where false-labelling of meat was super common (such that, say, when you buy things labelled as ‘cow’, you might half the time get tuna, and when you buy chicken, you might half the time get ostrich), if I’m asking specifically for people’s estimates on the moral disvalue from chicken calories vs cow calories, it would be strange if survey respondees factored in information about tunas and ostriches. Surely, if I was also interested in how people thought about calories from tunas and ostriches, I’d be asking about those animals too!
Also, circumstances about the labelling of meat products can change over time, so that previously elicited estimates on “chicken/cow-labelled things” would now be off. Survey results will be more timeless if we don’t contaminate straightforward thought experiments with confounding empirical considerations that weren’t part of the question.
A respondee might mention Kant and how all our knowledge about the world is indirect, how there’s trust involved in taking assumptions for granted. That’s accurate, but let’s just take them for granted anyway and move on?
On whether “1.5%” is too precise of an estimate for contexts where we don’t have extensive data: If we grant that thought experiments can be arbitrarily outlandish, then it doesn’t really matter.
Still, I could imagine that you’d change your mind about never using these estimates if you thought more about situations where they might become relevant. For instance, I used estimates in that area (roughly around 1.5% chance of something happening) several times within the last two years:
My wife developed lupus a few years ago, which is the illness that often makes it onto the whiteboard in the show Dr House because it can throw up symptoms that mimic tons of other diseases, sometimes serious ones. We had a bunch of health scares where we were thinking “this is most likely just some weird lupus-related symptom that isn’t actually dangerous, but it also resembles that other thing (which is also a common secondary complication from lupus or its medications), which would be a true emergency.” In these situations, should we go to the ER for a check-up or not? With a 4-5h average A&E waiting time and the chance to catch viral illnesses while there (which are extra bad when you already have lupus), it probably doesn’t make sense to go in if we think the chance of a true emergency is only <0.5%. However, at 2% or higher, we’d for sure want to go in. (In between those two, we’d probably continue to feel stressed and undecided, and maybe go in primarily for peace of mind, lol). Narrowing things down from “most likely it’s nothing, but some small chance that it’s bad!” to either “I’m confident this is <0.5%” or “I’m confident this is at least 2%” is not easy, but it worked in some instances. This suggests there is some usefulness (as a matter of practical necessity when making medical decisions in a context of long A&E waiting times) to making decisions based on a fairly narrowed-down low-probability estimate. Sure, the process I described is still a bit more fuzzy than just pulling a 1.5% point estimate from somewhere, but I feel like it approaches the level of precision needed to narrow things down that much, and I think many other people would have similar decision thresholds in a situation like ours.
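A toy expected-cost framing of the ER decision described above – the costs and the harm ratio are numbers I’m making up purely to show how a “below 0.5% stay home / above 2% go in / undecided in between” pattern can fall out of such a comparison:

```python
# Toy expected-cost framing with made-up numbers: compare the fixed cost of
# an ER visit against the probability-weighted cost of staying home during a
# true emergency. Giving the harm ratio as a range (rather than a point value)
# reproduces the "below 0.5% stay home / above 2% go in / undecided in
# between" pattern described above.
visit_cost = 1.0                   # long A&E wait, infection exposure (arbitrary unit)
harm_ratio_range = (50.0, 200.0)   # assumed: missing a true emergency is 50-200x worse

def decision(p_emergency: float) -> str:
    best_case = p_emergency * harm_ratio_range[0] * visit_cost
    worst_case = p_emergency * harm_ratio_range[1] * visit_cost
    if best_case > visit_cost:      # even the optimistic harm estimate says go
        return "go in"
    if worst_case < visit_cost:     # even the pessimistic harm estimate says stay
        return "stay home"
    return "undecided (maybe go in for peace of mind)"

for p in (0.002, 0.01, 0.03):
    print(f"p(emergency) = {p:.1%}: {decision(p)}")
```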
Admittedly, medical contexts are better studied than charity contexts, and especially influencing-the-distant-future charity contexts. So, it makes sense if you’re especially skeptical of that level of precision in charitable contexts. (And I indeed agree with this; I’m not defending that level of precision in practice for EA charities!) Still, like habryka pointed out in another comment, I don’t think there’s a red line where fundamental changes happen as probabilities get lower and lower. The world isn’t inherently frequentist, but we can often find plausibly-relevant base rates. Admittedly, there’s always some subjectivity, some art, in choosing relevant base rates, assessing additional risk factors, making judgment calls about “how much is this symptom a match?” But if you find the right context for it (meaning: a context where you’re justifiably anchoring to some very low-probability base rate), you can get well below the 0.5% level for practically-relevant decisions (and maybe make proportional upwards or downwards adjustments from there). For these reasons, it doesn’t strike me as totally outlandish that some group will at some point come up with a ranged very-low-probability estimate of averting some risk (like asteroid risk or whatever), while being well-calibrated. I’m not saying I have a concrete example in mind, but I wouldn’t rule it out.
OP here :) Thanks for the interesting discussion that the two of you have had!
Lukas_Gloor, I think we agree on most points. Your example of estimating a low probability of medical emergency is great! And I reckon that you are communicating appropriately about it. You’re probably telling your doctor something like “we came because we couldn’t rule out complication X” and not “we came because X has a probability of 2%” ;-)
You also seem to be well aware of the uncertainty. Your situation does not feel like one where you went to the ER 50 times, were sent home 49 times, and have from this developed a good calibration. It looks more like a situation where you know about danger signs which could be caused by emergencies, and have some rules like “if we see A and B and not C, we need to go to the ER”.[1]
Your situation and my post both involve low probabilities in high-stakes situations. That said, the goal of my post is to remind people that this type of probability is often uncertain, and that they should communicate this with the appropriate humility.
That’s how I would think about it, at least… it might well be that you’re more rational than I, and use probabilities more explicitly. ↩︎