Thanks for the detailed comment! I expect we’ll remain in disagreement, but I’ll clarify where I stand on a couple of points you raised:
“Optimizer’s curse only matters when comparing better-understood projects to worse-understood projects, but you are talking about ‘prioritizing among funding opportunities that involve substantial, poorly understood uncertainty.’”
Certainly, the optimizer’s curse may be a big deal when well-understood projects are compared with poorly-understood projects. However, I don’t think it’s the case that all projects involving “substantial, poorly understood uncertainty” are on the same footing. Rather, each project is on its own footing, and we’re somewhat ignorant about how firm that footing is.
“We can use prior distributions.”
Yes, absolutely. What I worry about is how reliable those priors will be. I maintain that, in many situations, it’s very hard to defend any particular prior.
“And there is no reason to assume that probabilistic decision makers will overestimate as opposed to underestimate.”
This gets at what I’m really worried about! Let’s assume decisionmakers coming up with probabilistic estimates to assess potential activities don’t have a tendency to overestimate or underestimate. However, once a decisionmaker has made many estimates, there is reason to believe the activities that look most promising likely involve overestimates (because of the optimizer’s curse).
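To make the mechanism concrete, here is a minimal simulation sketch (illustrative numbers only, assuming unbiased normal estimation error; not anyone’s actual estimation process): every individual estimate is unbiased, yet the estimate attached to whichever project looks best is systematically too high.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_projects = 10_000, 20

true_value = rng.normal(0.0, 1.0, size=(n_trials, n_projects))  # true expected utilities
noise = rng.normal(0.0, 1.0, size=(n_trials, n_projects))       # unbiased estimation error
estimate = true_value + noise                                   # each estimate is unbiased on its own

best = estimate.argmax(axis=1)                                  # fund the most promising-looking project
rows = np.arange(n_trials)
selection_bias = (estimate[rows, best] - true_value[rows, best]).mean()

print(f"average error of any single estimate:         {(estimate - true_value).mean():+.3f}")  # ~0
print(f"average error of the selected 'best' project: {selection_bias:+.3f}")                  # clearly > 0
```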
“Here’s a question: how are you going to adjust for the optimizer’s curse if you don’t use probability (implicitly or explicitly)?”
This is a great question!
Rather than saying, “This is a hard problem, and I have an awesome solution no one else has proposed,” I’m trying to say something more like, “This is a problem we should acknowledge! Let’s also acknowledge that it’s a damn hard problem and may not have an easy solution!”
That said, I think there are approaches that have promise (but are not complete solutions):
-Favoring opportunities that look promising under multiple models.
-Being skeptical of opportunities that look promising under only a single model.
-Learning more (if that can cause probability estimates to become less uncertain & hazy).
-Doing more things to put society in a good position to handle problems when they arise (or become apparent) instead of trying to predict problems before they arise (or become apparent).
“Here’s what it means, formally: given that I have an equal desire to be right about the existence of God and the nonexistence of God, and given some basic assumptions about my money and my desire for money, I would make a bet with at most 50:1 odds that all-powerful-God exists.”
This is how a lot of people think about statements of probability, and I think that’s usually reasonable. I’m concerned that people are sometimes accidentally equivocating between “I would bet on this with at most 50:1 odds” and “this is as likely to occur as a perfectly fair 50-sided die being rolled and coming up ‘17’.”
“But in Bayesian decision theory, they aren’t on the same footing. They have very different levels of robustness. They are not well-grounded and this matters for how readily we update away from them. Is the notion of robustness inadequate for solving some problem here?”
The notion of robustness points in the right direction, but I think it’s difficult (perhaps impossible) to reliably and explicitly quantify robustness in the situations we’re concerned about.
Certainly, the optimizer’s curse may be a big deal when well-understood projects are compared with poorly-understood projects. However, I don’t think it’s the case that all projects involving “substantial, poorly understood uncertainty” are on the same footing. Rather, each project is on its own footing, and we’re somewhat ignorant about how firm that footing is.
“Footing” here is about the robustness of our credences, so I’m not sure that we can really be ignorant of them. Yes, different projects in a poorly understood domain will have different levels of poorly understood uncertainty, but it’s not clear that this is more important than the different levels of uncertainty in better-understood domains (e.g. comparisons across GiveWell charities).
What I worry about is how reliable those priors will be.
What do you mean by reliable?
I maintain that, in many situations, it’s very hard to defend any particular prior.
Yes, but it’s very hard to attack any particular prior as well.
Let’s assume decisionmakers coming up with probabilistic estimates to assess potential activities don’t have a tendency to overestimate or underestimate. However, once a decisionmaker has made many estimates, there is reason to believe the activities that look most promising likely involve overestimates (because of the optimizer’s curse).
Yes, I know, but again, it’s the ordering that matters. And we can correct for the optimizer’s curse, and we don’t know whether these corrections will overcorrect or undercorrect.
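For readers following along, the simplest version of that correction (a normal prior with mean $m$ and variance $\tau^2$, and unbiased normal estimation noise with variance $\sigma^2$; a textbook sketch, not necessarily the exact model anyone here has in mind) shrinks each raw estimate $\hat{\mu}$ toward the prior mean before options are compared:

$$
\mathbb{E}[\mu \mid \hat{\mu}] \;=\; m + \frac{\tau^2}{\tau^2 + \sigma^2}\,(\hat{\mu} - m).
$$

Noisier estimates (larger $\sigma^2$) get shrunk harder, which is what damps the curse when the prior can be trusted.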
“This is a problem we should acknowledge! Let’s also acknowledge that it’s a damn hard problem and may not have an easy solution!”
“The problem” should be precisely defined. Identifying the correct intervention is hard because the optimizer’s curse complicates comparisons between better- and worse-substantiated projects? Yes, we acknowledge that. And you are not just saying that there’s a problem; you are saying that there is a problem with a particular methodology, Bayesian probability. That is very unclear.
-Favoring opportunities that look promising under multiple models.
-Being skeptical of opportunities that look promising under only a single model.
-Learning more (if that can cause probability estimates to become less uncertain & hazy).
-Doing more things to put society in a good position to handle problems when they arise (or become apparent) instead of trying to predict problems before they arise (or become apparent).
This is just a generic bucket of “stuff that makes estimates more accurate, sometimes” without any more connection to the optimizer’s curse than to any other facets of uncertainty.
Let’s imagine I make a new group whose job is to randomly select projects and then estimate each project’s expected utility as accurately and precisely as possible. In this case the optimizer’s curse will not apply to me. But I’ll still want to evaluate things with multiple models, learn more and use proxies such as social capacity.
What is some advice that my group should not follow, that GiveWell or Open Philanthropy should follow? Aside from the existing advice for how to make adjustments for the optimizer’s curse.
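For what it’s worth, the sense in which the curse drops out for the random-selection group can be stated directly (assuming unbiased errors, i.e. $\hat{V}_i = V_i + \epsilon_i$ with $\mathbb{E}[\epsilon_i] = 0$):

$$
\mathbb{E}\big[\hat{V}_J - V_J\big] = 0 \quad \text{when } J \text{ is chosen independently of the estimates (random selection)},
$$
$$
\mathbb{E}\big[\hat{V}_{I^\ast} - V_{I^\ast}\big] > 0 \quad \text{where } I^\ast = \arg\max_i \hat{V}_i \text{ (optimizing selection, under mild conditions)}.
$$

The advice listed above (multiple models, learning more, building capacity) improves the accuracy of every $\hat{V}_i$, which is why it helps both groups rather than being a specifically anti-curse measure.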
The notion of robustness points in the right direction, but I think it’s difficult (perhaps impossible) to reliably and explicitly quantify robustness in the situations we’re concerned about.
If you want, you can define some set of future updates (e.g. researching something for 1 week) and specify a probability distribution for your belief state after that process. I don’t think that level of explicit detail is typically necessary though. You can just give a rough idea of your confidence level alongside likelihood estimates.
Yes, but it’s very hard to attack any particular prior as well.
I don’t think this leaves you in a good position if your estimates and rankings are very sensitive to the choice of “reasonable” priors. Chris illustrated this in his post at the end of part 2 (with the atheist example), and in part 3.
You could try to choose some compromise between these priors, but there are multiple “reasonable” ways to compromise. You could introduce a prior on these priors, but you could run into the same problem with multiple “reasonable” choices for this new prior.
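To make the “prior on these priors” move concrete (a sketch with a finite set of candidate priors $p_k$ and illustrative weights $w_k$; in practice the regress shows up again in choosing those weights): the aggregate prior is a weighted mixture, and the data updates the weights as well as each component.

$$
p(\theta) = \sum_k w_k\, p_k(\theta), \qquad
p(\theta \mid x) = \sum_k w_k'\, p_k(\theta \mid x), \qquad
w_k' \propto w_k \int p_k(\theta)\, p(x \mid \theta)\, d\theta .
$$

The data does shift weight toward candidate priors that explain it better, but the starting weights are exactly the kind of “reasonable choice among reasonable choices” being pointed at here.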
I don’t think this leaves you in a good position if your estimates and rankings are very sensitive to the choice of “reasonable” priors.
What do you mean by “a good position”?
You could try to choose some compromise between these priors, but there are multiple “reasonable” ways to compromise. You could introduce a prior on these priors, but you could run into the same problem with multiple “reasonable” choices for this new prior.
Ah, I guess we’ll have to switch to a system of epistemology which doesn’t bottom out in unproven assumptions. Hey hold on a minute, there is none.
I’m getting a little confused about what sorts of concrete conclusions we are supposed to take away from here.
I’m getting a little confused about what sorts of concrete conclusions we are supposed to take away from here.
I’m not saying we shouldn’t use priors or that they’ll never help. What I am saying is that they don’t address the optimizer’s curse just by including them, and I suspect they won’t help at all on their own in some cases.
Maybe checking sensitivity to priors and further promoting interventions whose value depends less on them (among some set of “reasonable” priors) would help. You could see this as a special case of Chris’s suggestion to “Entertain multiple models”.
Perhaps you could even use an explicit model to combine the estimates or posteriors from multiple models into a single one in a way that either penalizes sensitivity to priors or gives less weight to more extreme estimates, but a simpler decision rule might be more transparent or otherwise preferable. From my understanding, GiveWell already uses medians of its analysts’ estimates this way.
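A minimal sketch of what that could look like (the estimate, noise level, candidate priors, and the normal-normal shrinkage are all illustrative assumptions, not GiveWell’s actual procedure):

```python
import numpy as np

# One intervention's noisy cost-effectiveness estimate, plus several "reasonable" priors on its true value.
estimate, noise_sd = 12.0, 6.0                           # hypothetical numbers
candidate_priors = [(0.0, 2.0), (1.0, 4.0), (3.0, 3.0)]  # (prior mean, prior sd) for each candidate

def posterior_mean(prior_mean, prior_sd, estimate, noise_sd):
    """Normal-normal shrinkage: how much of the raw estimate survives depends on its noisiness."""
    w = prior_sd**2 / (prior_sd**2 + noise_sd**2)
    return prior_mean + w * (estimate - prior_mean)

posteriors = [posterior_mean(m, s, estimate, noise_sd) for m, s in candidate_priors]
print("posterior EV under each candidate prior:", np.round(posteriors, 2))
print("sensitivity to the choice of prior:     ", round(max(posteriors) - min(posteriors), 2))
print("median across priors:                   ", round(float(np.median(posteriors)), 2))
# A large spread flags an option whose apparent value depends heavily on which "reasonable" prior was picked.
```

The median line is only loosely analogous to GiveWell’s median-of-analyst-estimates (those aggregate people rather than priors); the point is just that a ranking which survives across all the candidate priors is the robustness being suggested.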
Ah, I guess we’ll have to switch to a system of epistemology which doesn’t bottom out in unproven assumptions. Hey hold on a minute, there is none.
I get your point, but the snark isn’t helpful.
What I am saying is that they don’t address the optimizer’s curse just by including them, and I suspect they won’t help at all on their own in some cases.
You seem to be using “people all agree” as a stand-in for “the optimizer’s curse has been addressed”. I don’t get this. Addressing the optimizer’s curse has been mathematically demonstrated. Different people can disagree about the specific inputs, so people will disagree, but that doesn’t mean they haven’t addressed the optimizer’s curse.
Maybe checking sensitivity to priors and further promoting interventions whose value depends less on them (among some set of “reasonable” priors) would help. You could see this as a special case of Chris’s suggestion to “Entertain multiple models”.
Perhaps you could even use an explicit model to combine the estimates or posteriors from multiple models into a single one in a way that either penalizes sensitivity to priors or gives less weight to more extreme estimates, but a simpler decision rule might be more transparent or otherwise preferable.
I think combining into a single model is generally appropriate. And the sub-models need not be fully, explicitly laid out.
Suppose I’m demonstrating that poverty charity > animal charity. I don’t have to build one model assuming “1 human = 50 chickens”, another model assuming “1 human = 100 chickens”, and so on.
Instead I just set a general standard for how robust my claims are going to be, and I feel sufficiently confident saying “1 human = at least 60 chickens”, so I use that rather than my mean expectation (e.g. 90).
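Spelling out the arithmetic with hypothetical numbers: if the poverty charity produces $H$ units of human welfare per dollar and the animal charity averts $C$ chicken-equivalents of suffering per dollar, then establishing the claim at the robust lower bound establishes it for every exchange rate at or above it:

$$
60\,H > C \;\Longrightarrow\; X\,H > C \quad \text{for all } X \ge 60 \text{ (including the mean expectation of } 90\text{)},
$$

which is why one conservative standard can stand in for the whole family of “1 human = $X$ chickens” sub-models.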
You seem to be using “people all agree” as a stand-in for “the optimizer’s curse has been addressed”. I don’t get this. Addressing the optimizer’s curse has been mathematically demonstrated. Different people can disagree about the specific inputs, so people will disagree, but that doesn’t mean they haven’t addressed the optimizer’s curse.
Maybe we’re thinking about the optimizer’s curse in different ways.
The proposed solution of using priors just pushes the problem to selecting good priors. It’s also only a solution in the sense that it reduces the likelihood of mistakes happening (discovered in hindsight, and under the assumption of good priors), but not provably to its minimum, since it does not eliminate the impacts of noise. (I don’t think there’s any complete solution to the optimizer’s curse, since, as long as estimates are at least somewhat sensitive to noise, “lucky” estimates will tend to be favoured, and you can’t tell in principle between “lucky” and “better” interventions.)
If you’re presented with multiple priors, and they all seem similarly reasonable to you, but depending on which ones you choose, different actions will be favoured, how would you choose how to act? It’s not just a matter of different people disagreeing on priors, it’s also a matter of committing to particular priors in the first place.
If one action is preferred with almost all of the priors (perhaps rare in practice), isn’t that a reason (perhaps insufficient) to prefer it? To me, using this could be an improvement over just using priors, because I suspect it will further reduce the impacts of noise, and if it is an improvement, then just using priors never fully solved the problem in practice in the first place.
I agree with the rest of your comment. I think something like that would be useful.
The proposed solution of using priors just pushes the problem to selecting good priors.
The problem of the optimizer’s curse is that the EV estimates of high-EV options are predictably over-optimistic in proportion to how unreliable the estimates are. That problem doesn’t exist anymore.
The fact that you don’t have guaranteed accurate information doesn’t mean the optimizer’s curse still exists.
I don’t think there’s any complete solution to the optimizer’s curse
Well, there is: just spend too much time worrying about model uncertainty and other people’s priors and too little time worrying about expected value estimation. Then you’re solving the optimizer’s curse too much, so that your charity selections will be less accurate and predictably biased in favor of low-EV, high-reliability options. So it’s a bad idea, but you’ve solved the optimizer’s curse.
If you’re presented with multiple priors, and they all seem similarly reasonable to you, but depending on which ones you choose, different actions will be favoured, how would you choose how to act?
Maximize the expected outcome over the distribution of possibilities.
If one action is preferred with almost all of the priors (perhaps rare in practice), isn’t that a reason (perhaps insufficient) to prefer it?
What do you mean by “the priors”? Other people’s priors? Well if they’re other people’s priors and I don’t have reason to update my beliefs based on their priors, then it’s trivially true that this doesn’t give me a reason to prefer the action. But you seem to think that other people’s priors will be “reasonable”, so obviously I should update based on their priors, in which case of course this is true—but only in a banal, trivial sense that has nothing to do with the optimizer’s curse.
To me, using this could be an improvement over just using priors
Hm? You’re just suggesting updating one’s prior by looking at other people’s priors. Assuming that other people’s priors might be rational, this is banal—of course we should be reasonable, epistemically modest, etc. But this has nothing to do with the optimizer’s curse in particular, it’s equally true either way.
I ask the same question I asked of OP: give me some guidance that applies for estimating the impact of maximizing actions that doesn’t apply for estimating the impact of randomly selected actions. So far it still seems like there is none—aside from the basic idea given by Muehlhauser.
just using priors never fully solved the problem in practice in the first place
Is the problem the lack of guaranteed knowledge about charity impacts, or is the problem the optimizer’s curse? You seem to (incorrectly) think that chipping away at the former necessarily means chipping away at the latter.
It’s always worth entertaining multiple models if you can do that at no cost. However, doing that often comes at some cost (money, time, etc). In situations with lots of uncertainty (where the optimizer’s curse is liable to cause significant problems), it’s worth paying much higher costs to entertain multiple models (or do other things I suggested) than it is in cases where the optimizer’s curse is unlikely to cause serious problems.
In situations with lots of uncertainty (where the optimizer’s curse is liable to cause significant problems), it’s worth paying much higher costs to entertain multiple models (or do other things I suggested) than it is in cases where the optimizer’s curse is unlikely to cause serious problems.
I don’t agree. Why is the uncertainty that comes from model uncertainty—as opposed to any other kind of uncertainty—uniquely important for the optimizer’s curse? The optimizer’s curse does not discriminate between estimates that are too high for modeling reasons, versus estimates that are too high for any other reason.
The mere fact that there’s more uncertainty is not relevant, because we are talking about how much time we should spend worrying about one kind of uncertainty versus another. “Do more to reduce uncertainty” is just a platitude, we always want to reduce uncertainty.
I made a long top-level comment that I hope will clarify some problems with the solution proposed in the original paper.
I ask the same question I asked of OP: give me some guidance that applies for estimating the impact of maximizing actions that doesn’t apply for estimating the impact of randomly selected actions.
This is a good point. Somehow, I think you’d want to adjust your posterior downward based on the set or the number of options under consideration and how unlikely the data that makes the intervention look good is. This is not really useful, since I don’t know how much you should adjust these. Maybe there’s a way to model this explicitly, but it seems like you’d be trying to model your selection process itself before you’ve defined it, and then you look for a selection process which satisfies some properties.
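One part of this can be made explicit under a simplifying assumption (roughly independent, normal estimation errors with a common scale $\sigma$): the expected largest error among $n$ options grows with $n$,

$$
\mathbb{E}\Big[\max_{1 \le i \le n} \epsilon_i\Big] \;\approx\; \sigma \sqrt{2 \ln n},
$$

and when the options’ true values are roughly comparable, that is about how large the selection bias gets. So the downward adjustment should grow (slowly) with the number of options under consideration, and a correction tuned for a short list will under-correct for a long one.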
You might also want to spend more effort looking for arguments and evidence against each option the more options you’re considering.
When considering a larger number of options, you could use some randomness in your selection process or spread funding further (although the latter will be vulnerable to the satisficer’s curse if you’re using cutoffs).
What do you mean by “the priors”?
If I haven’t decided on a prior, and multiple different priors (even an infinite set of them) seem equally reasonable to me.
Somehow, I think you’d want to adjust your posterior downward based on the set or the number of options under consideration and how unlikely the data that makes the intervention look good is.
That’s the basic idea given by Muehlhauser. Corrected posterior EV estimates.
You might also want to spend more effort looking for arguments and evidence against each option the more options you’re considering.
As opposed to equal effort for and against? OK, I’m satisfied. However, if I’ve done the corrected posterior EV estimation, and then my specific search for arguments-against turns up short, then I should increase my EV estimates back towards the original naive estimate.
As I recall, that post found that randomized funding doesn’t make sense. Which 100% matches my presumptions; I do not see how it could improve funding outcomes.
or spread funding further
I don’t see how that would improve funding outcomes.
If I haven’t decided on a prior, and multiple different priors (even an infinite set of them) seem equally reasonable to me.
In Bayesian rationality, you always have a prior. You seem to be considering or defining things differently.
Here we would probably say that your actual prior exists and is simply some kind of aggregate of these possible priors, therefore it’s not the case that we should leap outside our own priors in some sort of violation of standard Bayesian rationality.
The proposed solution of using priors just pushes the problem to selecting good priors.
+1
In conversations I’ve had about this stuff, it seems like the crux is often the question of how easy it is to choose good priors, and whether a “good” prior is even an intelligible concept.
Compare Chris’s piece (“selecting good priors is really hard!”) with this piece by Luke Muehlhauser (“the optimizer’s curse is trivial, just choose an appropriate prior!”)
it seems like the crux is often the question of how easy it is to choose good priors
Before anything like a crux can be identified, complainants need to identify what a “good prior” even means, or what strategies are better than others. Until then, they’re not even wrong—it’s not even possible to say what disagreement exists. To airily talk about “good priors” or “bad priors”, being “easy” or “hard” to identify, is just empty phrasing and suggests confusion about rationality and probability.
Hey Kyle, I’d stopped responding since I felt like we were well beyond the point where we were likely to convince one another or say things that those reading the comments would find insightful.
I understand why you think “good prior” needs to be defined better.
As I try to communicate (but may not quite say explicitly) in my post, I think that in situations where uncertainty is poorly understood, it’s hard to come up with priors that are good enough that choosing actions based on explicit Bayesian calculations will lead to better outcomes than choosing actions based on a combination of careful skepticism, information gathering, hunches, and critical thinking.
As a real world example:
Venture capitalists frequently fund things that they’re extremely uncertain about. It’s my impression that Bayesian calculations rarely play into these situations. Instead, smart VCs think hard and critically and come to conclusions based on processes that they probably don’t fully understand themselves.
It could be that VCs have just failed to realize the amazingness of Bayesianism. However, given that they’re smart & there’s a ton of money on the table, I think the much more plausible explanation is that hardcore Bayesianism wouldn’t lead to better results than whatever it is that successful VCs actually do.
Again, none of this is to say that Bayesianism is fundamentally broken or that high-level Bayesian-ish things like “I have a very skeptical prior so I should not take this estimate of impact at face value” are crazy.
Venture capitalists frequently fund things that they’re extremely uncertain about. It’s my impression that Bayesian calculations rarely play into these situations. Instead, smart VCs think hard and critically and come to conclusions based on processes that they probably don’t fully understand themselves.
I interned for a VC, albeit a small and unknown one. Sure, they don’t do Bayesian calculations, if you want to be really precise. But they make extensive use of quantitative estimates all the same. If anything, they are cruder than what EAs do. As far as I know, they don’t bother correcting for the optimizer’s curse! I never heard it mentioned. VCs don’t primarily rely on the quantitative models, but other areas of finance do. If what they do is OK, then what EAs do is better. This is consistent with what finance professionals told me about the financial modeling that I did.
Plus, this is not about the optimizer’s curse. Imagine that you told those VCs that they were no longer choosing which startups are best; instead, they now have to select which ones are better-than-average and which ones are worse-than-average. The optimizer’s curse will no longer interfere. Yet they’re not going to start relying more on explicit Bayesian calculations. They’re going to use the same way of thinking as always.
And explicit Bayesian calculation is rarely used by anyone anywhere. Humans encounter many problems which are not about optimizing, and they still don’t use explicit Bayesian calculation. So clearly the optimizer’s curse is not the issue. Instead, it’s a matter of which kinds of cognition and calculation people are more or less comfortable with.
it’s hard to come up with priors that are good enough that choosing actions based on explicit Bayesian calculations will lead to better outcomes than choosing actions based on a combination of careful skepticism, information gathering, hunches, and critical thinking.
Explicit Bayesian calculation is a way of choosing actions based on a combination of careful skepticism, information gathering, hunches, and critical thinking. (With math too.)
I’m guessing you mean we should use intuition for the final selection, instead of quantitative estimates. OK, but I don’t see how the original post is supposed to back it up; I don’t see what the optimizer’s curse has to do with it.
I’m struggling to understand how your proposed new group avoids the optimizer’s curse, and I’m worried we’re already talking past each other. To be clear, I don’t believe there’s something wrong with Bayesian methods in the abstract. Those methods are correct in a technical sense. They clearly work in situations where everything that matters can be completely quantified.
The position I’m taking is that the scope of real-world problems that those methods are useful for is limited because our ability to precisely quantify things is severely limited in many real-world scenarios. In my post, I try to build the case for why attempting Bayesian approaches in scenarios where things are really hard to quantify might be misguided.
I’m struggling to understand how your proposed new group avoids the optimizer’s curse,
Because I’m not optimizing!
Of course it is still the case that the highest-scoring estimates will probably be overestimates in my new group. The difference is, I don’t care about getting the right scores on the highest-scoring estimates. Now I care about getting the best scores on all my estimates.
Or to phrase it another way, suppose that the intervention will be randomly selected rather than picked from the top.
The position I’m taking is that the scope of real-world problems that those methods are useful for is limited because our ability to precisely quantify things is severely limited in many real-world scenarios. In my post, I try to build the case for why attempting Bayesian approaches in scenarios where things are really hard to quantify might be misguided.
Well yes, but I think the methods work better than anything else for all these scenarios.