Cosmologist: Well, I'm a little uncomfortable with this, but I'll give it a shot. I will tentatively say that the odds of doom are higher than 1 in a googol. But I don't know the order of magnitude of the actual threat. To convey this:
I'll give a 1% chance it's between 10^-100 and 10^-99
A 1% chance it's between 10^-99 and 10^-98
A 1% chance it's between 10^-98 and 10^-97,
And so on, all the way up to a 1% chance it's between 1 in 10 and 100%.
I think the root of the problem in this paradox is that this isn't a very defensible humble/uniform prior, and if the cosmologist were to think it through more they could come up with one that gives a lower p(doom) (or at least, doesn't look much like the distribution stated initially).
So, I agree with this as a criticism of pop-Bayes in the sense that people will often come up with a quick uniform-prior-sounding explanation for why some unlikely event has a probability that is around 1%, but I think the problem here is that the prior is wrong[1] rather than failing to consider the whole distribution, seeing as a distribution over probabilities collapses to a single probability anyway.
Imo the deeper problem is how to generate the correct prior, which can be a problem due to "pop Bayes", but also remains when you try to do the actual Bayesian statistics.
Explanation of why I think this is quite an unnatural estimate in this case
Disclaimer: I too have no particular claim on being great at stats, so take this with a pinch of salt
The cosmologist is supposing a model where the universe as it exists is analogous to the result of a single Bernoulli trial, where the "yes" outcome is that the universe is a simulation that will be shut down. Writing this Bernoulli distribution as B(γ)[2], they are then claiming uncertainty over the value of γ. So far so uncontroversial.
They then propose to take the pdf over γ to be:

p(γ) = 0 for γ < 10^-100, and p(γ) = k/γ for 10^-100 ≤ γ ≤ 1    (A)

where k is a normalisation constant. This is the distribution that results in the property that each OOM has an equal probability[3]. Questions about this:
Is this the appropriate non-informative prior?
Is this a situation where it's appropriate to appeal to a non-informative prior anyway?
Is this the appropriate non-informative prior?
I will tentatively say that the odds of doom are higher than 1 in a googol. But I don't know the order of magnitude of the actual threat.
The basis on which the cosmologist chooses this model is an appeal to a kind of "total uncertainty"/non-informative-prior style reasoning, but:
They are inserting a concrete value of x = 10^-100 as a lower bound
They are supposing the total uncertainty is over the order of magnitude of the probability, which is quite a specific choice
This results in a model where E(γ) ≈ -1/ln(x) = 1/(100 ln(10)) ≈ 1 in 230 in this case, so the expected probability is very sensitive to this lower bound parameter, which is a red flag for a model that is supposed to represent total uncertainty.
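To see this sensitivity concretely, here is a minimal sketch of how the expected probability varies with the lower bound under prior (A) (assuming the log-uniform form above; the function name is mine):

```python
import math

# Under prior (A), E(γ) = ∫_x^1 γ·(k/γ) dγ = k·(1 - x), where the
# normalisation constant is k = 1/ln(1/x). For a lower bound x = 10^-m
# and large m, this is approximately 1/(m·ln 10).
def expected_gamma(m):
    """Expected value of γ when the lower bound is x = 10**-m."""
    return 1.0 / (m * math.log(10))

# The "total uncertainty" answer moves by an order of magnitude
# whenever the (arbitrary) lower bound does:
for m in (10, 100, 1000):
    print(f"x = 10^-{m}:  E(γ) ≈ 1 in {1 / expected_gamma(m):.0f}")
```

With m = 100 this reproduces the cosmologist's 1 in 230; with m = 10 it becomes 1 in 23, and with m = 1000, 1 in 2303.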
There is apparently a generally accepted way to generate non-informative-priors for parameters in statistical models, which is to use a Jeffreys prior. The Jeffreys prior[4] for the Bernoulli distribution is:
p(γ) = 1/(π√(γ(1−γ)))    (B)

This doesn't look much like equation (A) that the cosmologist proposed. There are parameters where the Jeffreys prior is proportional to 1/x, such as the standard deviation in the normal distribution, but these tend to be scale parameters that can range from 0 to ∞. Using it for a probability does seem quite unnatural when you contrast it with these examples, because a probability has hard bounds at 0 and 1.
Is this a situation where it's appropriate to appeal to a non-informative prior anyway?
Using the recommended non-informative prior (B), we get that the expected probability is 0.5. Which makes sense for the class of problems concerned with something that either happens or doesnât, where we are totally uncertain about this.
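As a quick sanity check on that 0.5, here is a small Monte Carlo sketch. It uses the fact that (B) is the Beta(1/2, 1/2) distribution, whose CDF inverts in closed form (the sampling approach here is my own illustration, not something from the original discussion):

```python
import math
import random

random.seed(0)

# Jeffreys prior (B) for a Bernoulli parameter is Beta(1/2, 1/2), with
# CDF F(γ) = (2/π)·arcsin(√γ). Inverting: γ = sin²(π·u/2) for u ~ U[0, 1],
# so we can sample from it directly via inverse transform sampling.
n = 100_000
samples = [math.sin(math.pi * random.random() / 2) ** 2 for _ in range(n)]

mean = sum(samples) / n
print(round(mean, 2))  # ≈ 0.5, as the symmetry of (B) suggests
```

Note the distribution is U-shaped (mass piled near 0 and 1), so the 0.5 mean reflects symmetry rather than confidence that γ is near one half.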
I expect the cosmologist would take issue with this as well, and say "ok, I'm not that uncertain". Some reasons he would be right to take issue are:
A general prior that "out of the space of things that could be the case, most are not the case[5]": this should update the probability towards 0. And in fact massively so, such that in the absence of any other evidence you should think the probability is vanishingly small, as you would for the question of "Is the universe riding on the back of a giant turtle?"
The reason to consider this simulation possibility in the first place is not just that it is in principle allowed by the known laws of physics, but that there is a specific argument for why it should be the case. This should update the probability away from 0.
The real problem the cosmologist has is uncertainty in how to incorporate the evidence of (2) into a probability (distribution). Clearly they think there is enough to the argument to not immediately reject it out of hand, or they would put it in the same category as the turtle-universe, but they are uncertain about how strong the argument actually is and therefore how much it should update their default-low prior.
...
I think this deeper problem gets related to the idea of non-informative priors in Bayesian statistics via a kind of linguistic collision.
Non-informative priors are about having a model which you have not yet updated based on evidence, so you are "maximally uncertain" about the parameters. In the case of having evidence only in the form of a clever argument, you might think "well I'm very uncertain about how to turn this into a probability, and the thing you do when you're very uncertain is use a non-informative prior". You might therefore come up with a model where the parameters have the kind of neat symmetry-based uncertainty that you tend to see in non-informative priors (as the cosmologist did in your example).
I think these cases are quite different though, arguably close to being opposites. In the second (the case of having evidence only in the form of a clever argument), the problem is not a lack of information, but that the information doesn't come in the form of observations of random variables. It's therefore hard to come up with a likelihood function based on this evidence, and so I don't have a good recommendation for what the cosmologist should say instead. But I think the original problem of how they end up with a 1 in 230 probability is due to a failed attempt to avoid this by appealing to a non-informative prior over order of magnitude.
There is also a meta problem where the prior will tend to be too high rather than too low, because probabilities can't go below zero, and this leads to people on average being overly spooked by low probability events.
γ being the "true probability". I'm using γ rather than p because 1) in general parameters of probability distributions don't need to be probabilities themselves, e.g. the mean of a normal distribution, 2) γ is a random variable in this case, so talking about the probability of p taking a certain value could be confusing, 3) it's what is used in the linked Wikipedia article on Jeffreys priors
∫ from a to 10a of (k/γ) dγ = k ln(10) = constant
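(A quick numeric check of this, evaluating the integral over a few different decades:)

```python
import math

k = 1.0  # any constant; the point is only that every decade gets equal mass

# ∫_a^{10a} (k/γ) dγ = k·(ln(10a) − ln(a)) = k·ln(10), independent of a
for a in (1e-90, 1e-50, 1e-10):
    mass = k * (math.log(10 * a) - math.log(a))
    print(mass)  # ≈ k·ln(10) ≈ 2.3026 for every a
```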
There is some controversy about whether this is the right prior to use, but whatever the right one is, by symmetry it would give E(γ) = 0.5
For some things you can make a mutual exclusivity + uncertainty argument for why the probability should be low. E.g. for the case of the universe riding on the back of the turtle you could consider all the other types of animals it could be riding on the back of, and point out that you have no particular reason to prefer a turtle. For the simulation argument and various other cases it's trickier because they might be consistent with lots of other things, but you can still appeal to Occam's razor and/or viewing this as an empirical fact about the universe
you are saying that if there are n possibilities and we have no evidence then the probability is 1/n rather than 1/2. But for these topics there is an infinite set of discrete possibilities. Since the set is infinite it doesn't make sense to have a uniform prior, because there is no way for the probabilities to add up to 1.