I think the example Ben cites in his reply is very illustrative.
You might feel that you can’t justify your one specific choice of prior over another prior, so that particular choice is arbitrary, and then what you should do could depend on this arbitrary choice, whereas an equally reasonable prior would recommend a different decision. Someone else could have exactly the same information as you, but due to a different psychology, or just different patterns of neurons firing, come up with a different prior that ends up recommending a different decision. Choosing one prior over another without reason seems like a whim or a bias, and potentially especially prone to systematic error.
It seems bad if we’re basing how to do the most good on whims and biases.
If you’re lucky enough to have only finitely many equally reasonable priors, then I think it does make sense to just use a uniform meta-prior over them, i.e. just take their average. This doesn’t seem to work with infinitely many priors, since you could use different parametrizations to represent the same continuous family of distributions, with a different uniform distribution and therefore average for each parametrization. You’d have to justify your choice of parametrization!
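To make the parametrization point concrete, here is a minimal sketch under assumptions of my own: the family is zero-mean normal distributions with standard deviation between 0 and 1, and the two function names are hypothetical. A uniform meta-prior over the standard deviation and a uniform meta-prior over the variance describe the same family, but they average out differently.

```python
import math
import random

def mean_sd_uniform_over_sd(n_draws, rng):
    """Average standard deviation when the meta-prior is uniform over sigma in (0, 1)."""
    return sum(rng.uniform(0.0, 1.0) for _ in range(n_draws)) / n_draws

def mean_sd_uniform_over_variance(n_draws, rng):
    """Average standard deviation when the meta-prior is uniform over sigma^2 in (0, 1)."""
    return sum(math.sqrt(rng.uniform(0.0, 1.0)) for _ in range(n_draws)) / n_draws

rng = random.Random(0)  # fixed seed so the sketch is reproducible
mean_a = mean_sd_uniform_over_sd(100_000, rng)
mean_b = mean_sd_uniform_over_variance(100_000, rng)
print(f"uniform over sigma:   mean sigma = {mean_a:.3f}")
print(f"uniform over sigma^2: mean sigma = {mean_b:.3f}")
```

Analytically the two averages are 1/2 and 2/3, so "just take the uniform average" gives a different answer depending on which parameter you happened to write down.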
As another example, imagine you have a coin that someone (who is trustworthy) has told you is biased towards heads, but they haven’t given you any hint how much, and you want to come up with a probability distribution for the fraction of heads over 1,000,000 flips. So, you want a distribution over the interval [0, 1]. Which distribution would you use? Say you give me a probability density function $f$. Why not $(1-p)f(x) + p$ for some $p \in (0,1)$? Why not $\frac{1}{\int_0^1 f(x^p)\,dx} f(x^p)$ for some $p > 0$? If $f$ is a weighted average of multiple distributions, why not apply one of these transformations to one of the component distributions and choose the resulting weighted average instead? Why the particular weights you’ve chosen and not slightly different ones?
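For what it's worth, both alternatives are legitimate densities on [0, 1] whenever f is, which is what makes them hard to rule out. The first is a mixture of f (weight 1 − p) with the uniform density (weight p). The second is normalised by construction, assuming the integral in the denominator is finite and positive (an assumption of mine; it holds for any bounded f that is not identically zero):

```latex
\int_0^1 \bigl[(1-p)\,f(x) + p\bigr]\,dx
  = (1-p)\int_0^1 f(x)\,dx + p
  = (1-p) + p
  = 1,
\qquad
\int_0^1 \frac{f(x^p)}{\int_0^1 f(t^p)\,dt}\,dx = 1 .
```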
“Which distribution would you use? Why the particular weights you’ve chosen and not slightly different ones?”
I think you just have to make your distribution uninformative enough that reasonable differences in the weights don’t change your overall conclusion. If they do, then I would concede that you really are clueless about your specific question. Otherwise, you can probably find a response.
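A rough sketch of that robustness check, with everything in it assumed for illustration: two candidate priors over the coin's bias (uniform on (0.5, 1], mean 0.75, and Beta(2,2) truncated to the same range, mean 0.6875), a hypothetical decision rule "act if the expected fraction of heads exceeds a threshold", and a hand-picked range of "reasonable" weights.

```python
def mixture_mean(weight_first, mean_first=0.75, mean_second=0.6875):
    """Expected bias under a weighted average of two candidate priors."""
    return weight_first * mean_first + (1.0 - weight_first) * mean_second

def decision_is_robust(threshold, weights):
    """True if every weight in the range leads to the same act / don't-act decision."""
    decisions = {mixture_mean(w) > threshold for w in weights}
    return len(decisions) == 1

# Hypothetical range of weights that two reasonable people might defend.
reasonable_weights = [0.3, 0.4, 0.5, 0.6, 0.7]

robust_low = decision_is_robust(0.60, reasonable_weights)
robust_mid = decision_is_robust(0.72, reasonable_weights)
print("threshold 0.60 robust:", robust_low)
print("threshold 0.72 robust:", robust_mid)
```

With a threshold of 0.60 every weighting agrees, so the arbitrariness doesn't matter. With a threshold of 0.72 the decision flips somewhere inside the range, which is the case where I'd concede cluelessness.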
“come up with a probability distribution for the fraction of heads over 1,000,000 flips.”
Rather than thinking directly about an appropriate distribution for the 1,000,000 flips, I’d think of a distribution to model p itself, the coin’s probability of landing heads. Then you can run simulations based on the distribution of p to calculate the distribution of the fraction of heads over 1,000,000 flips. Since the coin is biased towards heads, p∈(0.5,1.0], and we need to select a distribution for p over that range.
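Here is a minimal sketch of that simulation, not a definitive implementation. The prior is passed in as a function, and the uniform prior used in the demo is only a placeholder. To keep it fast with the standard library, I approximate the binomial fraction of heads by a normal distribution with mean p and standard deviation sqrt(p(1 − p)/n), which is an assumption that is very accurate for a million flips.

```python
import math
import random

def simulate_fraction_of_heads(sample_p, n_flips=1_000_000, n_sims=10_000, seed=0):
    """Draw p from the prior, then draw the fraction of heads given p.

    sample_p: function taking a random.Random and returning p in (0.5, 1.0].
    Uses a normal approximation to the binomial (an assumption, see lead-in).
    """
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_sims):
        p = sample_p(rng)
        sd = math.sqrt(p * (1.0 - p) / n_flips)
        fraction = rng.gauss(p, sd)
        fractions.append(min(1.0, max(0.0, fraction)))  # keep within [0, 1]
    return fractions

def uniform_prior(rng):
    """Placeholder prior: p uniform between 0.5 and 1.0."""
    return rng.uniform(0.5, 1.0)

fractions = simulate_fraction_of_heads(uniform_prior)
mean_fraction = sum(fractions) / len(fractions)
print(f"mean simulated fraction of heads: {mean_fraction:.3f}")
```

With this many flips the sampling noise is tiny, so the distribution of the fraction is essentially the prior on p. That is why the choice of prior carries nearly all of the weight here.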
There is no one correct probability distribution for p because any probability is just an expression of our belief, so you may use whatever probability distribution genuinely reflects your prior belief. A uniform distribution is a reasonable start. Perhaps you really are clueless about p, in which case, yes, there’s a certain amount of subjectivity about your choice. But prior beliefs are always inherently subjective, because they simply describe your belief about the state of the world as you know it now. The fact you might have to select a distribution, or set of distributions with some weighted average, is merely an expression of your uncertainty. This in itself, I think, doesn’t stop you from trying to estimate the result.
I think this expresses within Bayesian terms the philosophical idea that we can only make moral choices based on information available at the time; one can’t be held morally responsible for mistakes that stem from information one didn’t have.
Perhaps you disagree with me that a uniform distribution is the best choice. You reason thus: “We have some idea about the properties of coins in general. It’s difficult to make a coin that is 100% biased towards heads. So that seems unlikely.” So we could pick a distribution that better reflects your prior belief. A suitable choice might be Beta(2,2) truncated below at 0.5, which puts the highest density on p just above 0.5 and declines to zero density at p = 1.0.
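One simple way to draw from that truncated prior, as a sketch, is rejection sampling: draw from Beta(2,2) with the standard library's `random.betavariate` and keep only draws above 0.5. The function name and the cut-offs of 0.6 and 0.9 are my own choices for illustration.

```python
import random

def sample_truncated_beta(rng, alpha=2.0, beta=2.0, lower=0.5):
    """Draw p from Beta(alpha, beta) restricted to p > lower, by rejection."""
    while True:
        p = rng.betavariate(alpha, beta)
        if p > lower:
            return p

rng = random.Random(0)  # fixed seed so the sketch is reproducible
draws = [sample_truncated_beta(rng) for _ in range(50_000)]

mean_p = sum(draws) / len(draws)
share_near_fair = sum(d < 0.6 for d in draws) / len(draws)
share_near_one = sum(d > 0.9 for d in draws) / len(draws)
print(f"mean p: {mean_p:.3f}")
print(f"share with p < 0.6: {share_near_fair:.3f}")
print(f"share with p > 0.9: {share_near_one:.3f}")
```

Analytically this prior has mean 11/16 = 0.6875, compared with 0.75 for the uniform prior on the same range, and it puts far more mass just above 0.5 than near 1.0.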
Maybe you and I just can’t agree after all, and there is still no consistent and reasonable prior you can choose, not even a compromise. Let’s say we both run simulations using our own priors, find entirely different results, and can’t agree on any suitable weighting between them. In that case, yes, I can see you have cluelessness. But I don’t think it follows that, if we went through the same process for estimating the longtermist moral worth of malaria bednet distribution, we must have intractable complex cluelessness about it. I can admit that perhaps, right now, in our current belief state, we are genuinely clueless, but it seems there is some work that could be done that might eliminate the cluelessness.