Potential downsides of using explicit probabilities
In various communities (including the EA and rationalist communities), it’s common to make use of explicit, numerical probabilities.[1]
At an extreme end, this may be done in an attempt to calculate and then do whatever seems to maximise expected utility.
It could also involve attempts to create explicit, probabilistic models (EPMs), perhaps involving expected value calculations, and use this as an input into decision-making. (So the EPM may not necessarily be the only input, or necessarily be intended to include everything that’s important.) Examples of this include the cost-effectiveness analyses created by GiveWell or ALLFED.
Most simply, a person may generate just a single explicit probability (EP; e.g., “I have a 20% chance of getting this job”), and then use that as an input into decision-making.
(For simplicity, in this post I’ll often say “using EPs” as a catchall term for using a single EP, using EPMs, or maximising expected utility. I’ll also often say “alternative approaches” to refer to more qualitative or intuitive methods, ranging from simply “trusting your gut” to extensive deliberations where you don’t explicitly quantify probabilities.)
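To make the simplest case concrete, here is a minimal sketch, with an entirely made-up decision and made-up numbers (none of this comes from the post’s examples), of how a single EP might feed into a rough expected-value comparison:

```python
# Hypothetical example: a single explicit probability (EP) used as one input
# into a decision, via a rough expected-value comparison. All numbers are made up.

p_get_job = 0.20        # the EP: "I have a 20% chance of getting this job"
value_if_hired = 100.0  # assumed utility of getting the job (arbitrary units)
cost_of_applying = 5.0  # assumed utility cost of the time spent applying

ev_apply = p_get_job * value_if_hired - cost_of_applying  # 0.20 * 100 - 5 = 15
ev_skip = 0.0

print(f"EV(apply) = {ev_apply}, EV(don't apply) = {ev_skip}")
# The EP is just one input; other considerations could still outweigh this comparison.
```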
Many arguments for the value of using EPs have been covered elsewhere (and won’t be covered here). I find many of these quite compelling, and believe that one of the major things the EA and rationalist communities get right is relying on EPs more than the general public does.
But use of EPs is also often criticised. And I (along with most EAs and rationalists, I suspect) don’t use EPs for most everyday decisions, at least, and I think that that’s probably often a good thing.
So the first aim of this post is to explore some potential downsides of using EPs (compared to alternative approaches) that people have proposed. I’ll focus not on the case of ideal rational agents, but of actual humans, in practice, with our biases and limited computational abilities. Specifically, I discuss the following (non-exhaustive) list of potential downsides:
Time and effort costs
Excluding some of one’s knowledge (which could’ve been leveraged by alternative approaches)
Causing overconfidence
Underestimating the value of information
The optimizer’s curse
Anchoring (to the EP, or to the EPM’s output)
Causing reputational issues
As I’ll discuss, these downsides will not always apply when using EPs, and many will also sometimes apply when using alternative approaches. And when these downsides do apply to uses of EPs, they may often be outweighed by the benefits of using EPs. So this post is not meant to definitively determine the sorts of situations one should vs shouldn’t use EPs in. But I do think these downsides are often at least important factors to consider.
Sometimes people go further, linking discussion of these potential downsides of using EPs as humans, in practice, to claims that there’s an absolute, binary distinction between “risk” and “(Knightian) uncertainty”, or between situations in which we “have” vs “don’t have” probabilities, or something like that. Here’s one statement of this sort of view (from Dominic Roser, who disagrees with it):
According to [one] view, certainty has two opposites: risk and uncertainty. In the case of risk, we lack certainty but we have probabilities. In the case of uncertainty, we do not even have probabilities. [...] According to a popular view, then, how we ought to make policy decisions depends crucially on whether we have probabilities.
I’ve previously argued that there’s no absolute, binary risk-uncertainty distinction, and that believing that there is such a distinction can lead to using bad decision-making procedures. I’ve also argued that we can always assign probabilities (or at least use something like an uninformative prior). But I didn’t address the idea that, in practice, it might be valuable for humans to sometimes act as if there’s a binary risk-uncertainty distinction, or as if it’s impossible to assign probabilities.
Thus, the second aim of this post is to explore whether that’s a good idea. I argue that it is not, with a potential caveat related to reputational issues.
So each section will:
outline a potential downside of using EPs
discuss whether that downside really applies more to using EPs than to alternative approaches
explain why I believe this downside doesn’t suggest one should even act as if there’s a binary risk-uncertainty distinction
Epistemic status: This is basically meant as a collection and analysis of existing ideas, not as anything brand new. I’m not an expert on the topics covered.
Time and effort costs
The most obvious downside of using EPs (or at least EPMs) is that it may often take a lot of time and energy to use them well enough to get better results than one would get from alternative approaches (e.g., trusting your gut).
For example, GiveWell’s researchers collectively spend “hundreds of hours [...] per year on cost-effectiveness analysis”. I’d argue that that’s worthwhile when the stakes are as high as they are in GiveWell’s case (i.e., determining which charities receive tens of millions of dollars each year).
But what if I’m just deciding what headphones to buy? Is it worth it for me to spend a few hours constructing a detailed model of all the factors relevant to the question, and then finding (or estimating) values for each of those factors, for each of a broad range of different headphones?
Here, the stakes involved are quite low, and it’s also fairly unlikely that I’ll use the EPM again. (In contrast, GiveWell continues to use its models, with modifications, year after year, making the initial investment in constructing the models more worthwhile.) It seems the expected value of me bothering to do this EPM is lower than the expected value of me just reading a few reviews and then “going with my gut” (and thus saving time for other things).[2][3]
Does this mean that we must be dealing with “Knightian uncertainty” in this case, or must be utterly unable to “know” the relevant probabilities?
Not at all. In fact, I’d argue that the headphones example is actually one where, if I did spend a few hours doing research, I could come up with probabilities that are much more “trustworthy” than many of the probabilities involved in situations like GiveWell’s (when it is useful for people to construct EPMs). So I think the issue of time and effort costs may be quite separate even from the question of how trustworthy our probabilities are, let alone the idea that there might be a binary risk-uncertainty distinction.
Excluding some of one’s knowledge
Let’s say that I’m an experienced firefighter in a burning building (untrue on both counts, but go with me on this). I want to know the odds that the floor I’m on will collapse. I could (quite arbitrarily) construct the following EPM:
Probability of collapse = How hot the building is (on a scale from 0-1) * How non-sturdily the building seems to have been built (on a scale from 0-1)
I could also (quite arbitrarily) decide on values of 0.6 and 0.5, respectively. My model would then tell me that the probability of the floor collapsing is 0.3.
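Written out as (trivial) code, that arbitrary EPM is just:

```python
# The deliberately arbitrary two-factor EPM from the example above.
heat = 0.6            # "how hot the building is", on a 0-1 scale
non_sturdiness = 0.5  # "how non-sturdily it seems to have been built", on a 0-1 scale

p_collapse = heat * non_sturdiness
print(p_collapse)  # 0.3
```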
It seems like that could be done quite quickly, and while doing other things. So it seems that the time and effort costs involved in using this EPM are probably very similar to the costs involved in using an alternative approach (e.g., trusting my gut). Does this mean constructing an EPM here is a wise choice?
Intuitive expertise
There’s empirical evidence that the answer is “No” for examples like this; i.e., examples which meet the “conditions for intuitive expertise”:
an environment in which there’s a stable relationship between identifiable cues and later events or outcomes of actions
“adequate opportunities for learning the environment (prolonged practice and feedback that is both rapid and unequivocal)” (Kahneman & Klein)
In such situations, our intuitions may quite reliably predict later events. Furthermore, we may not consciously, explicitly know the factors that informed these intuitions. As Kahneman & Klein write: “Skilled judges are often unaware of the cues that guide them”.
Klein describes the true story that inspired my example, in which a team of firefighters were dealing with what they thought was a typical kitchen fire, when the lieutenant:
became tremendously uneasy — so uneasy that he ordered his entire crew to vacate the building. Just as they were leaving, the living room floor collapsed. If they had stood there another minute, they would have dropped into the fire below. Unbeknownst to the firefighters, the house had a basement and that’s where the fire was burning, right under the living room.
I had a chance to interview the lieutenant about this incident, and asked him why he gave the order to evacuate. The only reason he could think of was that he had extrasensory perception. He firmly believed he had ESP.
During the interview I asked him what he was aware of. He mentioned that it was very hot in the living room, much hotter than he expected given that he thought the fire was in the kitchen next door. I pressed him further and he recalled that, not only was it hotter than he expected, it was also quieter than he expected. Fires are usually noisy but this fire wasn’t. By the end of the interview he understood why it was so quiet: because the fire was in the basement, and the floor was muffling the sounds.
It seems that the lieutenant wasn’t consciously aware of the importance of the quietness of the fire. As such, if he’d constructed and relied on an EPM, he wouldn’t have included the quietness as a factor, and thus may not have pulled his crew out in time. But through a great deal of expertise, with reliable feedback from the environment, he was intuitively aware of the importance of that factor.
So when the conditions for intuitive expertise are met, methods other than EPM may reliably outperform EPM, even ignoring costs in time and energy, because they allow us to more fully leverage our knowledge.[4]
But, again, does this mean that we must be dealing with “Knightian uncertainty” in this case, or must be utterly unable to “know” the relevant probabilities? Again, not at all. In fact, the conditions for intuitive expertise would actually be met precisely when we could have relatively trustworthy probabilities—there have to be fairly stable patterns in the environment, and opportunities to learn these patterns. The issue is simply that, in practice, we often haven’t learned these probabilities on a conscious, explicit level.
On the flipside, using EPMs may often beat alternative methods when the conditions for intuitive expertise aren’t met, and this may be particularly likely when we face relatively “untrustworthy” probabilities.
Relatedly, it’s worth noting that feeling more confident in our intuitive assessment than in an EPM, in some situation, doesn’t necessarily mean our intuitive assessment is actually more reliable in that situation. As Kahneman & Klein note:
True experts, it is said, know when they don’t know. However, nonexperts (whether or not they think they are) certainly do not know when they don’t know. Subjective confidence is therefore an unreliable indication of the validity of intuitive judgments and decisions.
[...] Although true skill cannot develop in irregular or unpredictable environments, individuals will some times make judgments and decisions that are successful by chance. These “lucky” individuals will be susceptible to an illusion of skill and to overconfidence (Arkes, 2001). The financial industry is a rich source of examples.
Less measurable or legible things
An additional argument is that using EPs may make it harder to leverage knowledge about things that are less measurable and/or legible (with legibility seeming to approximately mean susceptibility to being predicted, understood, and monitored).
For example, let’s say Alice is deciding whether to donate to the Centre for Pesticide Suicide Prevention (CPSP), which focuses on advocating for policy changes, or to GiveDirectly, which simply gives unconditional cash transfers to people living in extreme poverty. She may decide CPSP’s impacts are “too hard to measure”, and “just can’t be estimated quantitatively”. Thus, if she uses EPs, she might neglect to even seriously consider CPSP. But if she considered in-depth, qualitative arguments, she might decide that CPSP seems a better bet.
I think it’s very plausible that this is a sort of situation where, in order to leverage as much of one’s knowledge as possible, it’s wise to use qualitative approaches. But we can still use EPs in these cases—we can just give our best guesses about the value of variables we can’t measure, and about what variables to consider and how to structure our model. (And in fact, GiveWell did construct a quantitative cost-effectiveness model for CPSP.) And it’s not obvious to me which of these approaches would typically make it easier for us to leverage our knowledge in these less measurable and legible cases.
Finally, what implications might this issue have for the idea of a binary risk-uncertainty distinction? I disagree with Alice’s view that CPSP’s impacts “just can’t be estimated quantitatively”. The reality is simply that CPSP’s impacts are very hard to estimate, and that the probabilities we’d arrive at if we estimated them would be relatively untrustworthy. In contrast, our estimates of GiveDirectly’s impact would be more trustworthy. That’s all we need to say to make sense of the idea that this is (perhaps) a situation in which we should use approaches other than EPs; I don’t think we need to even act as if there’s a binary risk-uncertainty distinction.
Causing overconfidence; underestimating the value of information
Two common critiques of using EPs are that:
Using EPs tends to make one overconfident about their estimates (and their models’ outputs); that is, it makes them underestimate how uncertain these estimates or outputs are.[5]
Therefore, using EPs tends to make one underestimate the value of (additional) information (VoI; here “information” can be seen as including just doing more thinking, without actually gathering more empirical data)
These critiques are closely related, so I’ll discuss both in this section.
An example of the first of those critiques comes from Chris Smith. Smith discusses one particular method for dealing with “poorly understood uncertainty”, and then writes:
Calling [that method] “making a Bayesian adjustment” suggests that we have something like a general, mathematical method for critical thinking. We don’t.
Similarly, taking our hunches about the plausibility of scenarios we have a very limited understanding of and treating those hunches like well-grounded probabilities can lead us to believe we have a well-understood method for making good decisions related to those scenarios. We don’t.
Many people have unwarranted confidence in approaches that appear math-heavy or scientific. In my experience, effective altruists are not immune to that bias.
An example of (I think) both of those critiques together comes from Daniela Waldhorn:
The existing gaps in this field of research entail that we face significant constraints when assessing the probability that an invertebrate taxon is conscious. In my opinion, the current state of knowledge is not mature enough for any informative numerical estimation of consciousness among invertebrates. Furthermore, there is a risk that such estimates lead to an oversimplification of the problem and an underestimation of the need for further investigation.
I’m somewhat sympathetic to these arguments. But I think it’s very unclear whether arguments about overconfidence and VoI should push us away from rather than towards using EPs; it really seems to me like it could go either way. This is for two reasons.
Firstly, we can clearly represent low confidence in our EPs (a rough code sketch of some of these options follows the list), by:
using a probability distribution, rather than just a point estimate
giving that distribution (arbitrarily) wide confidence intervals
choosing the shape of that distribution to further represent the magnitude (and nature) of our uncertainty. (See this comment for diagrams.)
conducting sensitivity analyses, which show the extent to which plausible variations in our model’s inputs can affect our model’s outputs
visually representing these probability distributions and sensitivity analyses (which may make our uncertainty more striking and harder to ignore)
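Here is a minimal sketch of the first few of those options, assuming NumPy and a made-up two-input model (the variable names, distributions, and parameter values are placeholders I’ve invented, not recommendations):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Instead of point estimates, draw each input from an (arbitrarily chosen) wide
# distribution whose shape and spread express how uncertain we are about it.
cost_per_unit = rng.lognormal(mean=np.log(10), sigma=1.0, size=n)  # heavy right tail
effect_size = rng.normal(loc=0.3, scale=0.2, size=n)               # could be near zero

cost_effectiveness = effect_size / cost_per_unit

# Report a distribution, not a single number.
low, mid, high = np.percentile(cost_effectiveness, [5, 50, 95])
print(f"90% interval: {low:.4f} to {high:.4f} (median {mid:.4f})")

# Crude one-at-a-time sensitivity check: hold one input at its median and move the
# other between its 5th and 95th percentiles, to see how much the output swings.
med_cost, med_effect = np.median(cost_per_unit), np.median(effect_size)
for name, samples, output in [
    ("cost_per_unit", cost_per_unit, lambda x: med_effect / x),
    ("effect_size", effect_size, lambda x: x / med_cost),
]:
    outs = [output(x) for x in np.percentile(samples, [5, 95])]
    print(f"{name}: output spans {min(outs):.4f} to {max(outs):.4f}")
```

Plotting the resulting samples (e.g., as a histogram) would then cover the last option in the list.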
Secondly, if we do use EPs (and appropriately wide confidence intervals), this unlocks ways of moving beyond just the general idea that further information would be valuable; it lets us also:
explicitly calculate how valuable more info seems likely to be
identify which uncertainties it’d be most valuable to gather more info on
In fact, there’s an entire body of work on VoI analysis, and a necessary prerequisite for conducting such an analysis is having an EPM.
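For instance, here is a minimal sketch of one standard VoI quantity, the expected value of perfect information (EVPI), for a made-up choice between two options whose values are modelled with assumed, arbitrary distributions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# A toy EPM: simulated values of two options we're choosing between (arbitrary numbers).
value_a = rng.normal(loc=10, scale=8, size=n)  # promising but highly uncertain option
value_b = rng.normal(loc=9, scale=1, size=n)   # worse-looking but well-understood option

# Without further information, we'd commit now to the option with the higher expected value.
ev_without_info = max(value_a.mean(), value_b.mean())

# With perfect information, we'd pick whichever option is actually better in each simulated world.
ev_with_perfect_info = np.maximum(value_a, value_b).mean()

evpi = ev_with_perfect_info - ev_without_info
print(f"EVPI = {evpi:.2f}")  # an upper bound on how much more research could be worth here
# Partial-information analogues (e.g., EVPPI computed per parameter) can then point at
# which specific uncertainties would be most valuable to investigate further.
```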
It does seem plausible to me that, even if we do all of those things, we or others will primarily focus on our (perhaps implicit) point estimate, and overestimate its trustworthiness, just due to human psychology (or EA/rationalist psychology). But that doesn’t seem obvious. Nor does it seem obvious that the overconfidence that may result from using EPs will tend to be greater than the overconfidence that may result from other approaches (like relying on all-things-considered intuitions; recall Kahneman & Klein’s comments from earlier).
And in any case, this whole discussion was easy to have just in terms of very untrustworthy or low-confidence probabilities—there was no need to invoke the idea of a binary risk-uncertainty distinction, or the idea that there are some matters about which we simply can’t possibly estimate any probabilities.[6]
The optimizer’s curse
Smith gives a “rough sketch” of the optimizer’s curse:
Optimizers start by calculating the expected value of different activities.
Estimates of expected value involve uncertainty.
Sometimes expected value is overestimated, sometimes expected value is underestimated.
Optimizers aim to engage in activities with the highest expected values.
Result: Optimizers tend to select activities with overestimated expected value.
[...] The optimizer’s curse occurs even in scenarios where estimates of expected value are unbiased (roughly, where any given estimate is as likely to be too optimistic as it is to be too pessimistic).
[...] As uncertainty increases, the degree to which the cost-effectiveness of the optimal-looking program is overstated grows wildly.
The implications of, and potential solutions to, the optimizer’s curse seem to be complicated and debatable. For more detail, see this post, Smith’s post, comments on Smith’s post, and discussion of the related problem of Goodhart’s law.
As best I can tell:
The optimizer’s curse is likely to be a pervasive problem and is worth taking seriously.
In many situations, the curse will just indicate that we’re probably overestimating how much better the option we estimate to be best is than the alternatives; it won’t indicate that we should actually change which option we pick.
But the curse can indicate that we should pick an option other than the one we estimate is best, if (a) we have reason to believe that our estimate of the value of the best option is more uncertain than our estimates of the value of the other options, and (b) we don’t model that information. (The toy simulation below illustrates the basic effect.)
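To make the mechanism in that sketch concrete, here is a toy simulation (with arbitrary, assumed numbers, not taken from Smith’s post): every option has the same true value and every estimate is unbiased, yet the option we select looks better than it really is, and increasingly so as the estimates get noisier.

```python
import numpy as np

rng = np.random.default_rng(2)
n_options, n_trials = 20, 10_000

# Every option has the same true value (0), so any apparent "best option" is pure noise.
true_values = np.zeros(n_options)

for noise_sd in [0.5, 1.0, 2.0]:
    # Unbiased but noisy estimates of each option's value, in each simulated decision.
    estimates = true_values + rng.normal(0, noise_sd, size=(n_trials, n_options))
    chosen = estimates.argmax(axis=1)         # the option an optimizer would pick
    estimated_value = estimates.max(axis=1)   # what the optimizer thinks it's worth
    true_value = true_values[chosen]          # what it's actually worth (always 0 here)
    avg_overestimate = (estimated_value - true_value).mean()
    print(f"noise sd {noise_sd}: chosen option overestimated by {avg_overestimate:.2f} on average")
# Each individual estimate is unbiased, but the *selected* option's estimate is not.
```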
I’ve deliberately kept the above points brief (again, see the above links for further explanations and justifications). This is because those points, while clearly relevant to how to use EPs, are only relevant to when to use EPs (vs alternative approaches) if the optimizer’s curse is a larger problem when using EPs than when using alternative approaches. And I don’t think it necessarily is. For example, Smith notes:
The optimizer’s curse can show up even in situations where effective altruists’ prioritization decisions don’t involve formal models or explicit estimates of expected value. Someone informally assessing philanthropic opportunities in a linear manner might have a thought like:
“Thing X seems like an awfully big issue. Funding Group A would probably cost only a little bit of money and have a small chance leading to a solution for Thing X. Accordingly, I feel decent about the expected cost-effectiveness of funding Group A.
Let me compare that to how I feel about some other funding opportunities…”
Although the thinking is informal, there’s uncertainty, potential for bias, and an optimization-like process. [quote marks added, because I couldn’t double-indent]
This makes a lot of sense to me. But Smith also adds:
Informal thinking isn’t always this linear. If the informal thinking considers an opportunity from multiple perspectives, draws on intuitions, etc., the risk of [overestimating the cost-effectiveness of the optimal-looking program] may be reduced.
I’m less sure what he means by this. I’m guessing he simply means that using multiple, different perspectives means that the various errors and uncertainties are likely to “cancel out” to some extent, reducing the effective uncertainty, and thus reducing the amount by which one is likely to overestimate the value of the best-seeming thing. But if so, it seems that this partial protection could also be achieved by using multiple, different EPMs, making different assumptions in them, getting multiple people to estimate values for inputs, etc.
So ultimately, I think that the problem Smith raises is significant, but I’m quite unsure if it’s a downside of using EPs instead of alternative approaches.
I also don’t think that the optimizer’s curse suggests it’d be valuable to act as if there’s a binary risk-uncertainty distinction. It is clear that the curse gets worse as uncertainty increases (i.e., when one’s probabilities are less trustworthy), but it does so in a gradual, continuous manner. So it seems to me that, again, we’re best off speaking just in terms of more and less trustworthy probabilities, rather than imagining that totally different behaviours are warranted if we’re facing “risk” rather than “Knightian uncertainty”.[7]
Anchoring
Anchoring or focalism is a cognitive bias where an individual depends too heavily on an initial piece of information offered (considered to be the “anchor”) when making decisions. (Wikipedia)
One critique of using EPs, or at least making them public, seems to effectively be that people may become anchored on the EPs given. For example, Jason Schukraft writes:
I contend that publishing specific estimates of invertebrate sentience (e.g., assigning each taxon a ‘sentience score’) would be, at this stage of investigation, at best unhelpful and probably actively counterproductive. [...]
Of course, having studied the topic for some time now, I expect that my estimates would be better than the estimates of the average member of the EA community. If that’s true, then it’s tempting to conclude that making my estimates public would improve the community’s overall position on this topic. However, I think there are at least three reasons to be skeptical of this view.
[One reason is that] It’s difficult to present explicit estimates of invertebrate sentience in a way in which those estimates don’t steal the show. It’s hard to imagine a third party summarizing our work (either to herself or to others) without mentioning lines like ‘Rethink Priorities think there is an X% chance ants have the capacity for valenced experience.’ There are very few serious estimates of invertebrate sentience available, so members of the community might really fasten onto ours.
I think that this critique has substantial merit, but that this is most clear in relation to making EPs public, rather than just in relation to using EPs oneself. As Schukraft writes:
To be clear: I don’t believe it’s a bad idea to think about probabilities of sentience. In fact, anyone directly working on invertebrate sentience ought to be periodically recording their own estimates for various groups of animals so that they can see how their credences change over time.[8]
I expect that one can somewhat mitigate this issue by providing various strong caveats when EPs are quite untrustworthy. And I think somewhat similar issues can also occur when not using EPs (e.g., if just saying something is “very likely”, or giving a general impression of disapproval of what a certain organisation is doing).
But I doubt that caveats would entirely remove the issue.[9] And I’d guess that the anchoring would be worse if using EPs than if not.
Finally, anchoring does seem a more important downside when one’s probabilities are less trustworthy, because then the odds people will be anchored to a bad estimate are higher. But again, it seems easy, and best, to think about this in terms of more and less trustworthy probabilities, rather than in terms of a binary risk-uncertainty distinction.
Reputational issues
Finally, in the same post, Schukraft notes another issue with using EPs:
Sentience scores might reduce our credibility with potential collaborators
[....] science, especially peer-reviewed science, is an inherently conservative enterprise. Scientists simply don’t publish things like probabilities of sentience. For a long time, even the topic of nonhuman sentience was taboo because it was seen as unverifiable. Without a clear, empirically-validated methodology behind them, such estimates would probably not make it into a reputable journal. Intuitions, even intuitions conditioned by careful reflection, are rarely admitted in the court of scientific opinion.
Rethink Priorities is a new, non-academic organization, and it is part of a movement that is—frankly—sort of weird. To collaborate with scientists, we first need to convince them that we are a legitimate research outfit. I don’t want to make that task more challenging by publishing estimates that introduce the perception that our research isn’t rigorous. And I don’t think that perception would be entirely unwarranted. Whenever I read a post and encounter an overly precise prediction for a complex event (e.g., ‘there is a 16% chance Latin America will dominate the plant-based seafood market by 2025’), I come away with the impression that the author doesn’t sufficiently appreciate the complexity of the forces at play. There may be no single subject more complicated than consciousness. I don’t want to reduce that complexity to a number.
Some of my thoughts on this potential downside mirror those I made with regards to anchoring:
This does seem like it would often be a real downside, and worth taking seriously.
This seems most clearly a downside of making EPs public, rather than of using EPs in one’s own thinking (or within a specific organisation or community).
This downside does seem more prominent the less trustworthy one’s probabilities would be.
But unlike all the other downsides I’ve covered, this one does seem like it might warrant acting (in public) as if there is a binary risk-uncertainty distinction. This is because the people one wants to maintain a good reputation with may think there is such a distinction (or effectively think as if that’s true). But it should be noted that this only requires publicly acting as if there’s such a distinction; you don’t have to think as if there’s such a distinction.
One last thing to note is that it also seems possible that similar reputational issues could result from not using EPs. For example, if one relies on qualitative or intuitive approaches, one’s thinking may be seen as “hand-wavey”, “soft”, and/or imprecise by people from a more “hard science” background.
Conclusions
There are some real downsides that can occur in practice when actual humans use EPs (or EPMs, or maximising expected utility)
But some downsides that have been suggested (particularly causing overconfidence and underestimating the VoI) might actually be more pronounced for approaches other than using EPs
Some downsides (particularly relating to the optimizer’s curse, anchoring, and reputational issues) may be more pronounced when the probabilities one has (or could have) are less trustworthy
Other downsides (particularly excluding one’s intuitive knowledge) may be more pronounced when the probabilities one has (or could have) are more trustworthy
Only one downside (reputational issues) seems to provide any argument for even acting as if there’s a binary risk-uncertainty distinction
And even in that case the argument is quite unclear, and wouldn’t suggest we should use the idea of such a distinction in our own thinking
The above point, combined with arguments I made in an earlier post, makes me believe that we should abandon the concept of the risk-uncertainty distinction in our own thinking (and in at least most communication), and that we should think instead in terms of:
a continuum of more to less trustworthy probabilities
the practical upsides and downsides of using EPs, for actual humans.
I’d be interested in people’s thoughts on all of the above; one motivation for writing this post was to see if someone could poke holes in, and thus improve, my thinking.
I should note that this post basically takes as a starting assumption the Bayesian interpretation of probability, “in which, instead of frequency or propensity of some phenomenon, probability is interpreted as reasonable expectation representing a state of knowledge or as quantification of a personal belief” (Wikipedia). But I think at least a decent amount of what I say would hold for other interpretations of probability (e.g., frequentism).
Of course, I could quickly and easily make an extremely simplistic EPM, or use just a single EP. But then it’s unclear if that’d do better than similarly quick and easy alternative approaches, for the reasons discussed in the following sections. For a potentially contrasting perspective, see Using a Spreadsheet to Make Good Decisions: Five Examples.
This seems analogous to the idea that utilitarianism itself may often recommend against the action of trying to explicitly calculate which action would be recommended by utilitarianism (given that that’s likely to slow one down massively). Amanda Askell has written a post on that topic, in which she says: “As many utilitarians have pointed out, the act utilitarian claim that you should ‘act such that you maximize the aggregate wellbeing’ is best thought of as a criterion of rightness and not as a decision procedure. In fact, trying to use this criterion as a decision procedure will often fail to maximize the aggregate wellbeing. In such cases, utilitarianism will actually say that agents are forbidden to use the utilitarian criterion when they make decisions.”
Along similar lines, Holden Karnofsky (of GiveWell, at the time) writes: “It’s my view that my brain instinctively processes huge amounts of information, coming from many different reference classes, and arrives at a prior; if I attempt to formalize my prior, counting only what I can name and justify, I can worsen the accuracy a lot relative to going with my gut.”
This is different to the idea that people may tend to overestimate EPs, or overestimate cost-effectiveness, or things like that. That claim is also often made, and is probably worth discussing, but I leave it out of this post. Here I’m focusing instead on the separate possibility of people being overconfident about the accuracy of whatever estimate they’ve arrived at, whether it’s high or low.
Here’s Nate Soares making similar points: “In other words, even if my current credence is 50% I can still expect that in 35 years (after encountering a black swan or two) my credence will be very different. This has the effect of making me act uncertain about my current credence, allowing me to say “my credence for this is 50%” without much confidence. So long as I can’t predict the direction of the update, this is consistent Bayesian reasoning.
As a bounded Bayesian, I have all the behaviors recommended by those advocating Knightian uncertainty. I put high value on increasing my hypothesis space, and I often expect that a hypothesis will come out of left field and throw off my predictions. I’m happy to increase my error bars, and I often expect my credences to vary wildly over time. But I do all of this within a Bayesian framework, with no need for exotic “immeasurable” uncertainty.”
Smith’s own views on this point seem a bit confusing. At one point, he writes: “we don’t need to assume a strict dichotomy separates quantifiable risks from unquantifiable risks. Instead, real-world uncertainty falls on something like a spectrum.” But at various other points, he writes things like “The idea that all uncertainty must be explainable in terms of probability is a wrong-way reduction [i.e., a bad idea; see his post for details]”, and “I don’t think ignorance must cash out as a probability distribution”.
While I think this is a good point, I also think it may sometimes be worth considering the risk that one might anchor oneself to one’s own estimate. This could therefore be a downside of even just generating an EP oneself, not just of making EPs public.
I briefly discuss empirical findings that are somewhat relevant to these points here.