I’m getting a little confused about what sorts of concrete conclusions we are supposed to take away from here.
I’m not saying we shouldn’t use priors or that they’ll never help. What I am saying is that they don’t address the optimizer’s curse just by including them, and I suspect they won’t help at all on their own in some cases.
Maybe checking sensitivity to priors and further promoting interventions whose value depends less on them (among some set of “reasonable” priors) would help. You could see this as a special case of Chris’s suggestion to “Entertain multiple models”.
Perhaps you could even use an explicit model to combine the estimates or posteriors from multiple models into a single one in a way that either penalizes sensitivity to priors or gives less weight to more extreme estimates, but a simpler decision rule might be more transparent or otherwise preferable. From my understanding, GiveWell already uses medians of its analysts’ estimates this way.
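A minimal sketch of what such an aggregation rule could look like, with invented numbers; the penalty weight is an arbitrary choice, not something GiveWell or anyone else has endorsed:

```python
import numpy as np

# Hypothetical cost-effectiveness estimates for one intervention, each
# produced under a different "reasonable" prior / model (units arbitrary).
estimates = np.array([4.0, 5.5, 6.0, 14.0])

mean_est = estimates.mean()        # pulled upward by the one extreme model
median_est = np.median(estimates)  # robust to a single extreme model

# One crude way to penalize sensitivity to the choice of model:
# discount the mean by some multiple of the spread across models.
penalty_weight = 0.5  # arbitrary; would itself need to be argued for
penalized = mean_est - penalty_weight * estimates.std()

print(f"mean: {mean_est:.2f}, median: {median_est:.2f}, penalized mean: {penalized:.2f}")
```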
Ah, I guess we’ll have to switch to a system of epistemology which doesn’t bottom out in unproven assumptions. Hey hold on a minute, there is none.

I get your point, but the snark isn’t helpful.
What I am saying is that they don’t address the optimizer’s curse just by including them, and I suspect they won’t help at all on their own in some cases.
You seem to be using “people all agree” as a stand-in for “the optimizer’s curse has been addressed”. I don’t get this. Addressing the optimizer’s curse has been mathematically demonstrated. Different people can disagree about the specific inputs, so people will disagree, but that doesn’t mean they haven’t addressed the optimizer’s curse.
Maybe checking sensitivity to priors and further promoting interventions whose value depends less on them (among some set of “reasonable” priors) would help. You could see this as a special case of Chris’s suggestion to “Entertain multiple models”.
Perhaps you could even use an explicit model to combine the estimates or posteriors from multiple models into a single one in a way that either penalizes sensitivity to priors or gives less weight to more extreme estimates, but a simpler decision rule might be more transparent or otherwise preferable.
I think combining into a single model is generally appropriate, and the sub-models need not be fully and explicitly laid out.
Suppose I’m demonstrating that poverty charity > animal charity. I don’t have to build one model assuming “1 human = 50 chickens”, another model assuming “1 human = 100 chickens”, and so on.
Instead I just set a general standard for how robust my claims are going to be, and I feel sufficiently confident saying “1 human = at least 60 chickens”, so I use that rather than my mean expectation (e.g. 90).
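A toy version of that comparison, with invented per-dollar figures rather than real charity estimates:

```python
# Invented per-dollar impact figures, for illustration only.
human_units_per_dollar = 0.010    # hypothetical poverty charity
chicken_units_per_dollar = 0.50   # hypothetical animal charity

for label, chickens_per_human in [("mean (1 human = 90 chickens)", 90),
                                  ("robust (1 human = at least 60 chickens)", 60)]:
    animal_in_human_units = chicken_units_per_dollar / chickens_per_human
    winner = "poverty" if human_units_per_dollar > animal_in_human_units else "animal"
    print(f"{label}: animal charity = {animal_in_human_units:.4f} human-units per $, "
          f"so the {winner} charity comes out ahead")
```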
You seem to be using “people all agree” as a stand-in for “the optimizer’s curse has been addressed”. I don’t get this. Addressing the optimizer’s curse has been mathematically demonstrated. Different people can disagree about the specific inputs, so people will disagree, but that doesn’t mean they haven’t addressed the optimizer’s curse.
Maybe we’re thinking about the optimizer’s curse in different ways.
The proposed solution of using priors just pushes the problem to selecting good priors. It’s also only a solution in the sense that it reduces the likelihood of mistakes (as judged in hindsight, and assuming good priors); it doesn’t provably minimize them, since it doesn’t eliminate the impact of noise. (I don’t think there’s any complete solution to the optimizer’s curse: as long as estimates are at least somewhat sensitive to noise, “lucky” estimates will tend to be favoured, and you can’t tell in principle whether an intervention looks good because it’s “lucky” or because it’s genuinely better.)
If you’re presented with multiple priors, and they all seem similarly reasonable to you, but depending on which ones you choose, different actions will be favoured, how would you choose how to act? It’s not just a matter of different people disagreeing on priors, it’s also a matter of committing to particular priors in the first place.
If one action is preferred with almost all of the priors (perhaps rare in practice), isn’t that a reason (perhaps insufficient) to prefer it? To me, using this could be an improvement over just using priors, because I suspect it will further reduce the impacts of noise, and if it is an improvement, then just using priors never fully solved the problem in practice in the first place.
I agree with the rest of your comment. I think something like that would be useful.
The proposed solution of using priors just pushes the problem to selecting good priors.
The problem of the optimizer’s curse is that the EV estimates of high-EV options are predictably over-optimistic in proportion to how unreliable the estimates are. That problem doesn’t exist anymore.
The fact that you don’t have guaranteed accurate information doesn’t mean the optimizer’s curse still exists.
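A quick simulation of the standard normal-normal setup (arbitrary numbers, a sketch rather than anything from the paper) illustrates the point: naive estimates of the selected option systematically overshoot the truth, posterior estimates under the correct prior don’t, and posterior estimates under a mis-specified (too wide) prior still overshoot somewhat:

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_options = 20_000, 10
prior_mean, prior_sd, noise_sd = 0.0, 1.0, 2.0

# True values are drawn from the prior; estimates are true values plus noise.
true_vals = rng.normal(prior_mean, prior_sd, (n_trials, n_options))
estimates = true_vals + rng.normal(0.0, noise_sd, (n_trials, n_options))

def shrink(est, mu, tau, sigma):
    """Posterior mean under a N(mu, tau^2) prior with known noise sd sigma."""
    w = tau**2 / (tau**2 + sigma**2)
    return mu + w * (est - mu)

def avg_overshoot(scores):
    """Average (score - true value) for the option each trial ranks best."""
    rows = np.arange(n_trials)
    best = scores.argmax(axis=1)
    return (scores[rows, best] - true_vals[rows, best]).mean()

print("naive estimates:", round(avg_overshoot(estimates), 2))  # clearly > 0
print("correct prior:  ",
      round(avg_overshoot(shrink(estimates, prior_mean, prior_sd, noise_sd)), 2))  # roughly 0
print("too-wide prior: ",
      round(avg_overshoot(shrink(estimates, prior_mean, 3.0, noise_sd)), 2))  # still > 0
```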
I don’t think there’s any complete solution to the optimizer’s curse
Well, there is: just spend too much time worrying about model uncertainty and other people’s priors and too little time worrying about expected value estimation. Then you’re over-correcting for the optimizer’s curse, so your charity selections will be less accurate and predictably biased in favor of low-EV, high-reliability options. So it’s a bad idea, but you’ve solved the optimizer’s curse.
If you’re presented with multiple priors, and they all seem similarly reasonable to you, but depending on which ones you choose, different actions will be favoured, how would you choose how to act?
Maximize the expected outcome over the distribution of possibilities.
If one action is preferred with almost all of the priors (perhaps rare in practice), isn’t that a reason (perhaps insufficient) to prefer it?
What do you mean by “the priors”? Other people’s priors? Well if they’re other people’s priors and I don’t have reason to update my beliefs based on their priors, then it’s trivially true that this doesn’t give me a reason to prefer the action. But you seem to think that other people’s priors will be “reasonable”, so obviously I should update based on their priors, in which case of course this is true—but only in a banal, trivial sense that has nothing to do with the optimizer’s curse.
To me, using this could be an improvement over just using priors
Hm? You’re just suggesting updating one’s prior by looking at other people’s priors. Assuming that other people’s priors might be rational, this is banal—of course we should be reasonable, epistemically modest, etc. But this has nothing to do with the optimizer’s curse in particular, it’s equally true either way.
I ask the same question I asked of OP: give me some guidance that applies for estimating the impact of maximizing actions that doesn’t apply for estimating the impact of randomly selected actions. So far it still seems like there is none—aside from the basic idea given by Muehlhauser.
just using priors never fully solved the problem in practice in the first place
Is the problem the lack of guaranteed knowledge about charity impacts, or is the problem the optimizer’s curse? You seem to (incorrectly) think that chipping away at the former necessarily means chipping away at the latter.
It’s always worth entertaining multiple models if you can do that at no cost. However, doing that often comes at some cost (money, time, etc). In situations with lots of uncertainty (where the optimizer’s curse is liable to cause significant problems), it’s worth paying much higher costs to entertain multiple models (or do other things I suggested) than it is in cases where the optimizer’s curse is unlikely to cause serious problems.
In situations with lots of uncertainty (where the optimizer’s curse is liable to cause significant problems), it’s worth paying much higher costs to entertain multiple models (or do other things I suggested) than it is in cases where the optimizer’s curse is unlikely to cause serious problems.
I don’t agree. Why is the uncertainty that comes from model uncertainty—as opposed to any other kind of uncertainty—uniquely important for the optimizer’s curse? The optimizer’s curse does not discriminate between estimates that are too high for modeling reasons, versus estimates that are too high for any other reason.
The mere fact that there’s more uncertainty is not relevant, because we are talking about how much time we should spend worrying about one kind of uncertainty versus another. “Do more to reduce uncertainty” is just a platitude, we always want to reduce uncertainty.
I made a long top-level comment that I hope will clarify some problems with the solution proposed in the original paper.
I ask the same question I asked of OP: give me some guidance that applies for estimating the impact of maximizing actions that doesn’t apply for estimating the impact of randomly selected actions.
This is a good point. Somehow, I think you’d want to adjust your posterior downward based on the set or the number of options under consideration and on how unlikely the data that makes the intervention look good is. This isn’t very useful as stated, since I don’t know how large those adjustments should be. Maybe there’s a way to model this explicitly, but it seems like you’d be trying to model your selection process itself before you’ve defined it, and then you’d look for a selection process which satisfies some properties.
You might also want to spend more effort looking for arguments and evidence against each option the more options you’re considering.
When considering a larger number of options, you could use some randomness in your selection process or spread funding further (although the latter will be vulnerable to the satisficer’s curse if you’re using cutoffs).
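To illustrate the point above about the number of options: in a toy simulation with invented numbers, where every option is identical and the estimates are pure noise, the best-looking estimate inflates as the option set grows, so any downward adjustment would have to grow with it:

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials = 20_000

# Every option has true value 0; estimates are pure noise with sd 1.
for n_options in (2, 10, 100):
    estimates = rng.normal(0.0, 1.0, (n_trials, n_options))
    print(f"{n_options:>3} options: best-looking estimate averages "
          f"{estimates.max(axis=1).mean():.2f}")
```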
What do you mean by “the priors”?
If I haven’t decided on a prior, and multiple different priors (even an infinite set of them) seem equally reasonable to me.
Somehow, I think you’d want to adjust your posterior downward based on the set or the number of options under consideration and on how unlikely the data that makes the intervention look good is.
That’s the basic idea given by Muehlhauser. Corrected posterior EV estimates.
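For what it’s worth, if both the prior and the estimate’s noise are modelled as normal (which may or may not match that post’s setup), the corrected posterior EV has a simple closed form, and it can reorder options rather than just deflate them. All numbers below are hypothetical:

```python
def corrected_posterior_ev(estimate, prior_mean, prior_sd, noise_sd):
    """Posterior mean when a normal prior meets a noisy estimate with known noise sd."""
    w = prior_sd**2 / (prior_sd**2 + noise_sd**2)  # weight given to the estimate
    return prior_mean + w * (estimate - prior_mean)

# Two hypothetical options under a skeptical prior centred at 5 (sd 10):
# A looks great but is very noisy; B looks worse but is well measured.
ev_a = corrected_posterior_ev(100, prior_mean=5, prior_sd=10, noise_sd=40)  # ≈ 10.6
ev_b = corrected_posterior_ev(30,  prior_mean=5, prior_sd=10, noise_sd=5)   # ≈ 25.0
print(ev_a, ev_b)  # the correction reverses the naive ranking
```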
You might also want to spend more effort looking for arguments and evidence against each option the more options you’re considering.
As opposed to equal effort for and against? OK, I’m satisfied. However, if I’ve done the corrected posterior EV estimation, and then my specific search for arguments-against turns up short, then I should increase my EV estimates back towards the original naive estimate.
As I recall, that post found that randomized funding doesn’t make sense, which 100% matches my presumptions: I do not see how it could improve funding outcomes.
or spread funding further
I don’t see how that would improve funding outcomes.
If I haven’t decided on a prior, and multiple different priors (even an infinite set of them) seem equally reasonable to me.
In Bayesian rationality, you always have a prior. You seem to be considering or defining things differently.
Here we would probably say that your actual prior exists and is simply some kind of aggregate of these possible priors; therefore it’s not the case that we should leap outside our own priors in some sort of violation of standard Bayesian rationality.
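A sketch of one way to cash that out, with normal candidate priors and invented numbers (and scipy for the density): treat the aggregate prior as a mixture, let the data reweight the candidate priors, and average their posterior means accordingly:

```python
import numpy as np
from scipy.stats import norm

# Candidate priors over an intervention's value, each felt to be similarly
# reasonable (all normal here for simplicity; numbers invented).
candidate_priors = [(0.0, 1.0), (2.0, 1.0), (0.0, 5.0)]  # (mean, sd)
credences = np.array([1/3, 1/3, 1/3])

noise_sd = 2.0
estimate = 6.0  # a noisy cost-effectiveness estimate

# Reweight each candidate prior by how well it predicts the observed estimate...
marginals = np.array([norm.pdf(estimate, mu, np.hypot(tau, noise_sd))
                      for mu, tau in candidate_priors])
post_credences = credences * marginals
post_credences /= post_credences.sum()

# ...and average the per-prior posterior means with those weights.
post_means = np.array([mu + tau**2 / (tau**2 + noise_sd**2) * (estimate - mu)
                       for mu, tau in candidate_priors])
print("updated credence in each prior:", np.round(post_credences, 2))
print("posterior EV under the mixture:", round(float(post_credences @ post_means), 2))
```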
The proposed solution of using priors just pushes the problem to selecting good priors.
+1
In conversations I’ve had about this stuff, it seems like the crux is often the question of how easy it is to choose good priors, and whether a “good” prior is even an intelligible concept.
Compare Chris’ piece (“selecting good priors is really hard!”) with this piece by Luke Muehlhauser (“the optimizer’s curse is trivial, just choose an appropriate prior!”)
it seems like the crux is often the question of how easy it is to choose good priors
Before anything like a crux can be identified, complainants need to identify what a “good prior” even means, or what strategies are better than others. Until then, they’re not even wrong—it’s not even possible to say what disagreement exists. To airily talk about “good priors” or “bad priors”, being “easy” or “hard” to identify, is just empty phrasing and suggests confusion about rationality and probability.
Hey Kyle, I’d stopped responding since I felt like we were well beyond the point where we were likely to convince one another or say things that those reading the comments would find insightful.
I understand why you think “good prior” needs to be defined better.
As I try to communicate (but may not quite say explicitly) in my post, I think that in situations where uncertainty is poorly understood, it’s hard to come up with priors that are good enough that choosing actions based on explicit Bayesian calculations will lead to better outcomes than choosing actions based on a combination of careful skepticism, information gathering, hunches, and critical thinking.

As a real world example:
Venture capitalists frequently fund things that they’re extremely uncertain about. It’s my impression that Bayesian calculations rarely play into these situations. Instead, smart VCs think hard and critically and come to conclusions based on processes that they probably don’t fully understand themselves.
It could be that VCs have just failed to realize the amazingness of Bayesianism. However, given that they’re smart & there’s a ton of money on the table, I think the much more plausible explanation is that hardcore Bayesianism wouldn’t lead to better results than whatever it is that successful VCs actually do.
Again, none of this is to say that Bayesianism is fundamentally broken or that high-level Bayesian-ish things like “I have a very skeptical prior so I should not take this estimate of impact at face value” are crazy.
Venture capitalists frequently fund things that they’re extremely uncertain about. It’s my impression that Bayesian calculations rarely play into these situations. Instead, smart VCs think hard and critically and come to conclusions based on processes that they probably don’t fully understand themselves.
I interned for a VC, albeit a small and unknown one. Sure, they don’t do Bayesian calculations, if you want to be really precise. But they make extensive use of quantitative estimates all the same. If anything, they are cruder than what EAs do. As far as I know, they don’t bother correcting for the optimizer’s curse! I never heard it mentioned. VCs don’t primarily rely on the quantitative models, but other areas of finance do. If what they do is OK, then what EAs do is better. This is consistent with what finance professionals told me about the financial modeling that I did.
Plus, this is not about the optimizer’s curse. Imagine that you told those VCs that they were no longer choosing which startups are best, instead they now have to select which ones are better-than-average and which ones are worse-than-average. The optimizer’s curse will no longer interfere. Yet they’re not going to start relying more on explicit Bayesian calculations. They’re going to use the same way of thinking as always.
And explicit Bayesian calculation is rarely used by anyone anywhere. Humans encounter many problems which are not about optimizing, and they still don’t use explicit Bayesian calculation. So clearly the optimizer’s curse is not the issue. Instead, it’s a matter of which kinds of cognition and calculation people are more or less comfortable with.
it’s hard to come up with priors that are good enough that choosing actions based on explicit Bayesian calculations will lead to better outcomes than choosing actions based on a combination of careful skepticism, information gathering, hunches, and critical thinking.
Explicit Bayesian calculation is a way of choosing actions based on a combination of careful skepticism, information gathering, hunches, and critical thinking. (With math too.)
I’m guessing you mean we should use intuition for the final selection, instead of quantitative estimates. OK, but I don’t see how the original post is supposed to back it up; I don’t see what the optimizer’s curse has to do with it.