[I learned the following, as well as the fact that this perspective goes back at least to Carnap, from Tom Davidson. Note that all of this is essentially just an explanation of beta distributions.]
Laplace’s Rule of Succession really is a special case of a more general family of rules. One useful way of describing the general family is as follows:
Recall that Laplace’s Rule of Succession essentially describes a prior for Bernoulli experiments, i.e., a series of independent trials with a binary outcome of success or failure. E.g., every day we observe whether the sun rises (‘success’) or not (‘failure’) [and, perhaps wrongly, we assume that whether the sun rises on one day is independent of whether it rose on any other day].
The family of priors is as follows: We pretend that prior to any actual trials we’ve seen N_v “virtual trials”, among which were M_v successes. Then at any point after having seen N_a actual trials with M_a successes, we adopt the maximum likelihood estimate for the success probability p of a single trial based on both virtual and actual observations. I.e.,
p = (M_v + M_a) / (N_v + N_a).
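The rule above is simple enough to write down directly. The sketch below is illustrative only: the function name `pseudo_count_estimate` and its parameter names are hypothetical, and it assumes integer counts so that `fractions.Fraction` can return exact results.

```python
from fractions import Fraction


def pseudo_count_estimate(m_a: int, n_a: int, m_v: int = 1, n_v: int = 2) -> Fraction:
    """Hypothetical helper: estimate p by pooling virtual and actual trials.

    m_a, n_a: successes and total number of actual trials observed so far.
    m_v, n_v: successes and total number of "virtual" trials assumed up front.
    The defaults m_v=1, n_v=2 correspond to Laplace's Rule of Succession.
    """
    if not (0 <= m_a <= n_a):
        raise ValueError("need 0 <= m_a <= n_a")
    if not (0 <= m_v <= n_v) or n_v + n_a == 0:
        raise ValueError("need 0 <= m_v <= n_v and at least one trial in total")
    # p = (M_v + M_a) / (N_v + N_a)
    return Fraction(m_v + m_a, n_v + n_a)


# Before any actual trials, Laplace's Rule gives 1/2.
print(pseudo_count_estimate(0, 0))    # 1/2
# After the sun has risen on 10 of 10 observed days: (1 + 10) / (2 + 10).
print(pseudo_count_estimate(10, 10))  # 11/12
```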
Laplace’s Rule of Succession is simply the special case N_v = 2 and M_v = 1. In particular, this means that before the first actual trial we expect it to succeed with probability 1⁄2. But Laplace’s Rule isn’t the only prior with that property! We’d also expect the first trial to succeed with probability 1⁄2 if we took, e.g., N_v = 42 and M_v = 21. The difference from Laplace’s Rule is that our estimate for p would move much more slowly in response to actual observations: intuitively, we’d need 42 actual observations before they carry the same weight as the virtual ones, whereas for Laplace’s Rule this happens after just 2 actual observations.
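One way to see the "weight" intuition is that the pooled estimate is a weighted average of the starting guess M_v/N_v and the observed frequency M_a/N_a, with the observed frequency getting weight N_a / (N_v + N_a). The sketch below (function names are hypothetical) compares the two priors from the text on a made-up run of 10 successes in 10 trials.

```python
from fractions import Fraction


def pooled_estimate(m_a: int, n_a: int, m_v: int, n_v: int) -> Fraction:
    # Same pooling rule as in the text: (M_v + M_a) / (N_v + N_a).
    return Fraction(m_v + m_a, n_v + n_a)


def actual_weight(n_a: int, n_v: int) -> Fraction:
    # Weight of the observed frequency M_a/N_a in the pooled estimate;
    # the starting guess M_v/N_v gets the remaining N_v / (N_v + N_a).
    return Fraction(n_a, n_v + n_a)


for m_v, n_v in [(1, 2), (21, 42)]:
    start = pooled_estimate(0, 0, m_v, n_v)
    after_10 = pooled_estimate(10, 10, m_v, n_v)  # 10 successes in 10 actual trials
    print(f"N_v={n_v}: start {start}, after 10/10 {after_10}")

# Laplace (N_v = 2): 1/2 -> 11/12.  N_v = 42: 1/2 -> 31/52.
```

Both priors start at 1/2, but after the same ten successes the N_v = 42 prior has only moved to 31/52, whereas Laplace's Rule has moved to 11/12. Actual observations reach half the total weight once N_a = N_v, i.e., after 2 trials for Laplace and after 42 for the other prior.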
And of course, we don’t have to “start” with p = 1⁄2 either: the initial estimate is M_v/N_v, so by varying N_v and M_v we can set it to any value between 0 and 1.
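To connect this to beta distributions: for 0 < M_v < N_v, the virtual-trial prior matches a Beta(α = M_v, β = N_v − M_v) prior, and updating it one observation at a time (each success adds 1 to α, each failure adds 1 to β) yields a posterior mean equal to the pooled formula. The sketch below assumes a hypothetical starting value p = 1/10 held with the weight of 20 virtual trials, and made-up data; the function name is illustrative.

```python
from fractions import Fraction


def beta_posterior_mean(alpha: int, beta: int, outcomes: list[bool]) -> Fraction:
    # Sequential conjugate update of a Beta(alpha, beta) prior on p:
    # a success increments alpha, a failure increments beta.
    for success in outcomes:
        if success:
            alpha += 1
        else:
            beta += 1
    # Mean of Beta(alpha, beta) is alpha / (alpha + beta).
    return Fraction(alpha, alpha + beta)


# Hypothetical choice: start at p = 1/10 with the weight of 20 virtual trials,
# i.e., M_v = 2 and N_v = 20, which corresponds to Beta(alpha=2, beta=18).
m_v, n_v = 2, 20
outcomes = [True, False, False, True, False]  # made-up data: 2 successes in 5 trials
print(Fraction(m_v, n_v))                             # 1/10
print(beta_posterior_mean(m_v, n_v - m_v, outcomes))  # 4/25 == (2 + 2) / (20 + 5)
```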