I think I disagree that that is the right maximum entropy prior in my ball example.
You know that you are drawing balls without replacement from a bag containing 100 balls, which can only be coloured blue or red. The maximum entropy prior given this information is that every one of the 2^100 possible colourings {Ball 1, Ball 2, Ball 3, …} → {Red, Blue} is equally likely (i.e. from the start the probability that all balls are red is 1 over 2^100).
I think the model you describe is only the correct approach if you make an additional assumption that all balls were coloured using an identical procedure, and were assigned to red or blue with some unknown, but constant, probability p. But that is an additional assumption. The assumption that the unknown p is the same for each ball is actually a very strong assumption.
If you want to adopt the maximum entropy prior consistent with the information I gave in the set-up of the problem, you’d adopt a prior where each of the 2^100 possible colourings are equally likely.
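To make this concrete, here's a small sketch of the calculation (using n = 10 balls rather than 100, since 2^100 colourings can't be enumerated; 10 is just an illustrative choice):

```python
from itertools import product

n = 10  # small stand-in for the 100 balls in the example

# Maximum entropy prior over colourings: every element of
# {Red, Blue}^n is equally likely (probability 2^-n each).
colourings = list(product([0, 1], repeat=n))  # 1 = red, 0 = blue

# Prior probability that all balls are red:
p_all_red = sum(1 for c in colourings if all(c)) / len(colourings)

# P(ball n is red | balls 1..n-1 all drawn red):
consistent = [c for c in colourings if all(c[:-1])]
p_last_red = sum(1 for c in consistent if c[-1]) / len(consistent)

print(p_all_red)   # 2^-10, i.e. about 0.00098
print(p_last_red)  # 0.5 — seeing n-1 reds hasn't moved the credence
```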
I think this is the right way to think about it anyway.
The re-parametrisation example is very nice though; I wasn’t aware of that before.
Thanks for the clarification—I see your concern more clearly now. You’re right, my model does assume that all balls were coloured using the same procedure, in some sense—I’m assuming they’re independently and identically distributed.
Your case is another reasonable way to apply the maximum entropy principle, and I think it points to another problem with the maximum entropy principle, but I’d frame it slightly differently. I don’t think the maximum entropy principle is actually directly problematic in the case you describe. If we assume that all balls are coloured by completely different procedures (i.e. so that the colour of one ball doesn’t tell us anything about the colours of the other balls), then seeing 99 red balls doesn’t tell us anything about the final ball. In that case, I think it’s reasonable (even required!) to have a 50% credence that it’s red, and unreasonable to have a 99% credence, if your prior was 50%. If you find that result counterintuitive, then I think that’s more of a challenge to the assumption that the balls are coloured in such a way that learning the colour of some tells you nothing about the colour of the others, rather than a challenge to the maximum entropy principle. (I appreciate you want to assume nothing about the colouring processes, rather than assuming the colourings are uninformative about each other, but in setting up your model this way, I think you’re assuming that implicitly.)
Perhaps another way to see this: if you don’t follow the maximum entropy principle and instead have a prior of 30% that the final ball is red and then draw 99 red balls, in your scenario, you should maintain 30% credence (if you don’t, then you’ve assumed something about the colouring process that makes the balls not independent). If you find that counterintuitive, then the issue is with the assumption that the balls are all coloured in such a way that learning the colour of some doesn’t tell you anything about the colour of the others because we haven’t used the principle of maximum entropy in that case.
I think this actually points to a different problem with the maximum entropy principle in practice: we rarely come from a position of complete ignorance (or complete ignorance besides a given mean, variance, etc.), so it’s actually rarely applicable. Following the principle sometimes gives counterintuitive or unreasonable results because we actually know a lot more than we realise, and we lose much of that information when we apply the maximum entropy principle.
I think I disagree with your claim that I’m implicitly assuming independence of the ball colourings.
I start by looking for the maximum entropy distribution within all possible probability distributions over the 2^100 possible colourings. Most of these probability distributions do not have the property that balls are coloured independently. For example, if the distribution was a 50% probability of all balls being red, and 50% probability of all balls being blue, then learning the colour of a single ball would immediately tell you the colour of all of the others.
But it just so happens that for the probability distribution which maximises the entropy, the ball colourings do turn out to be independent. If you adopt the maximum entropy distribution as your prior, then learning the colour of one tells you nothing about the others. This is an output of the calculation, rather than an assumption.
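A quick sketch of this point, comparing the uniform (maximum entropy) distribution with the non-independent two-point distribution mentioned above (again with a small n for tractability):

```python
import math
from itertools import product

n = 8  # small stand-in for 100 balls
colourings = list(product([0, 1], repeat=n))  # 1 = red, 0 = blue

def entropy(dist):
    # Shannon entropy in bits of a dict {colouring: probability}
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

uniform = {c: 1 / len(colourings) for c in colourings}
two_point = {tuple([1] * n): 0.5, tuple([0] * n): 0.5}  # 50% all red, 50% all blue

print(entropy(uniform))    # n bits — the maximum possible
print(entropy(two_point))  # 1 bit — far from maximal

def p_last_red_given_first_red(dist):
    # P(ball n red | ball 1 red) under the given distribution
    num = sum(p for c, p in dist.items() if c[0] == 1 and c[-1] == 1)
    den = sum(p for c, p in dist.items() if c[0] == 1)
    return num / den

# Under the two-point distribution, one ball pins down all the others;
# under the max entropy distribution, it tells you nothing — independence
# falls out of the maximisation rather than being assumed.
print(p_last_red_given_first_red(two_point))  # 1.0
print(p_last_red_given_first_red(uniform))    # 0.5
```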
I think I agree with your last paragraph, although there are some real problems here that I don’t know how to solve. Why should we expect any of our existing knowledge to be a good guide to what we will observe in future? It has been a good guide in the past, but so what? 99 red balls apparently doesn’t tell us that the 100th will likely be red, for certain seemingly reasonable choices of prior.
I guess what I was trying to say in my first comment is that the maximum entropy principle is not a solution to the problem of induction, or even an approximate solution. Ultimately, I don’t think anyone knows how to choose priors in a properly principled way. But I’d very much like to be corrected on this.
As a side-note, the maximum entropy principle would tell you to choose the maximum entropy prior given the information you have, and so if you intuit the information that the balls are likely to be produced by the same process, you’ll get a different prior than if you don’t have that information.
I.e., your disagreement might stem from the fact that the maximum entropy principle gives different answers conditional on different information.
I.e., you actually have information to differentiate between drawing n balls and flipping a fair coin n times.
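To illustrate how different the answers are under different information: if you model the balls as coloured iid red with some unknown probability p and put a uniform prior on p (Laplace's rule of succession, which is one standard way of formalising the "same process" assumption), the predictive probability after 99 reds is about 99%, not 50%. A minimal sketch:

```python
from fractions import Fraction

def p_next_red(k):
    # Rule of succession: with a uniform prior on p and k reds
    # observed in k draws, P(next red) = (k+1)/(k+2),
    # from integrating p^(k+1) dp / p^k dp over [0, 1].
    return Fraction(k + 1, k + 2)

print(p_next_red(99))  # 100/101, roughly 0.99
```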