There is a sense in which making odds additive is deeply weird, so this doesn’t yet feel natural to me. In particular, I’m more used to thinking in terms of bits, where, e.g., people not wanting to is ~one bit against nuclear war, Putin sort of threatening it is ~one bit in favor, and I generally expect the Laplace estimator to overshoot by ~one bit.
But it’s a cool idea.
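To make the bits framing concrete: each bit of evidence doubles (or halves) the odds. A small illustrative sketch in Python, using made-up baseline odds rather than anyone’s actual estimate:

```python
def shift_odds_by_bits(odds_for, bits):
    """Apply `bits` bits of evidence to odds: +1 bit doubles the odds in favor,
    -1 bit halves them."""
    return odds_for * 2 ** bits

def odds_to_prob(odds_for):
    """Convert odds in favor (e.g. 1:20 -> 1/20) to a probability."""
    return odds_for / (1 + odds_for)

base = 1 / 20                                      # illustrative baseline odds of 1:20
print(odds_to_prob(shift_odds_by_bits(base, -1)))  # one bit against: odds 1:40, p ≈ 0.024
print(odds_to_prob(shift_odds_by_bits(base, +1)))  # one bit in favor: odds 1:10, p ≈ 0.091
```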
It’s not about the odds; it’s about the Beta distribution. You are right to be suspicious about adding odds, but there is nothing wrong with adding shape parameters of Beta distributions.
I don’t want to go into too many details, but here’s a teaser for readers:
One wants to figure out p, the probability of some event happening.
One starts with some prior over p; if it’s uniform over [0,1], the prior is Beta(1,1).
If one then observes a successes and b failures, one would update to Beta(a+1,b+1).
If one then wants to get the probability of success next time, one needs to integrate over possible p (basically, to take the expected value). This leads to p̂ = (a+1)/(a+b+2).
For Laplace’s law of succession, you start with Beta(1,1), observe n failures, and update to Beta(1,n+1). Your estimate is then 1/(n+2). In this context, Pablo suggests starting with a different prior of Beta(1,20) (which corresponds to a probability of success of 1/21 and odds of 1:20), and then updating to Beta(1,90) after observing 70 failures.
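If it helps, here is a minimal sketch of that update in Python (the function name is mine; the numbers are the Beta(1,20) prior and the 70 observed failures from above):

```python
from fractions import Fraction

def posterior_predictive(alpha, beta, successes, failures):
    """Probability of success on the next trial under a Beta(alpha, beta) prior,
    after observing the given counts (the mean of the Beta posterior)."""
    return Fraction(alpha + successes, alpha + beta + successes + failures)

# Laplace's law of succession: Beta(1,1) prior, 0 successes, 70 failures -> 1/72.
print(posterior_predictive(1, 1, 0, 70))

# Pablo's suggested Beta(1,20) prior, same 70 failures -> Beta(1,90) posterior -> 1/91.
print(posterior_predictive(1, 20, 0, 70))
```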
I can confirm this is correct.
By the way, a similar modelling approach (Beta prior, binomial likelihood function) was used in this report.
This (pointed me to something that) makes so much more sense. Thanks Misha, strongly updated.