Perhaps this is a nice explanation for some people with mathematical or statistical knowledge, but alas it goes way over my head.
(Specifically, I get lost here: “ We just consider all probability distributions that predict that the odd bits will be zero with probability one, and without saying anything at all—the even bits, they can be anything.”)
(Granted, I now at least think I understand what a convex set is, although I fail to understand its relevance in this conversation.)
Fair point! Sorry it wasn’t the most helpful. My attempt at explaining a bit more below:
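Convex sets are just sets where any weighted average of points in the set (with non-negative weights that sum to 1) is also in the set. For a bounded shape like a filled convex polygon, this means each point in the set can be expressed as a weighted average of the corner points.
(illustration: https://reference.wolfram.com/language/ref/ConvexHullMesh.html)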
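For reference, here is the standard textbook way of writing that property down. The symbols C, x, y and λ are my own notation, not something taken from the talk:

```latex
C \text{ is convex}
\quad\Longleftrightarrow\quad
\forall x, y \in C,\ \forall \lambda \in [0, 1]:\quad
\lambda x + (1 - \lambda)\, y \in C
```

In words: pick any two points in the set, and every point on the straight line between them is in the set too.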
In 1D, convex sets are just intervals, [a, b], and convex sets of probability distributions basically correspond to intervals of probability values, e.g. [0.1, 0.5], which are often called “imprecise probabilities”.
Let’s generalize this idea to 2D. There are two events, A and B, which I am uncertain about. If I were really confident, I could say that I think A happens with probability 0.2, and B happens with probability 0.8. But what if I feel so ignorant that I can’t assign a probability to event B? That means I think P(B) could be any probability in [0.0, 1.0], while keeping P(A) = 0.2. So my beliefs, written as the pair (P(A), P(B)), lie somewhere on the line segment from (0.2, 0.0) to (0.2, 1.0). A line segment is a convex set.
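If it helps, the 2D example can be checked in a few lines of Python. This is only a sketch: the helper names `in_belief_set` and `mix` are made up for illustration, and I use exact fractions so that 0.2 is not blurred by floating-point rounding.

```python
from fractions import Fraction

P_A = Fraction(1, 5)  # P(A) = 0.2, taken from the example above


def in_belief_set(point):
    """True if the pair (P(A), P(B)) has P(A) == 0.2 and P(B) anywhere in [0, 1]."""
    p_a, p_b = point
    return p_a == P_A and 0 <= p_b <= 1


def mix(x, y, lam):
    """Weighted average lam*x + (1 - lam)*y, coordinate by coordinate."""
    return tuple(lam * xi + (1 - lam) * yi for xi, yi in zip(x, y))


low = (P_A, Fraction(0))   # endpoint (0.2, 0.0)
high = (P_A, Fraction(1))  # endpoint (0.2, 1.0)

# Every weighted average of the two endpoints should stay inside the set.
mixtures = [mix(low, high, Fraction(k, 10)) for k in range(11)]
all_inside = all(in_belief_set(m) for m in mixtures)
print(all_inside)

# A point with P(A) = 0.3 is outside the set, whatever P(B) is.
print(in_belief_set((Fraction(3, 10), Fraction(1, 2))))
```

The first check is the convexity property in action: mixing two allowed belief states never produces a disallowed one.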
You can generalize this notion to infinite dimensions. For example, for a bit sequence of infinite length, specifying a complete probability distribution would require saying how probable it is that each bit equals 1, conditioned on the values of all of the other bits. But we could instead only assign probabilities to the odd bits, not the even bits, and that would correspond to a convex set of probability distributions.
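To make the bit-sequence version concrete, here is a toy sketch that cuts the infinite sequence down to 4 bits (that truncation is my simplification, as are all the names in the code). Following the quoted sentence from the talk, a distribution counts as a member of the set if it says the odd bits are zero with probability one, and it may say anything at all about the even bits.

```python
from fractions import Fraction
from itertools import product

N_BITS = 4              # finite stand-in for the infinite sequence (an assumption)
ODD_POSITIONS = (0, 2)  # indices of the 1st and 3rd bits, counting bits from 1


def in_set(dist):
    """dist maps bit tuples to probabilities. It is a member if it is a valid
    distribution that puts all of its mass on strings whose odd bits are 0."""
    if sum(dist.values()) != 1 or any(p < 0 for p in dist.values()):
        return False
    return all(p == 0 or all(bits[i] == 0 for i in ODD_POSITIONS)
               for bits, p in dist.items())


def mix(d1, d2, lam):
    """Weighted average lam*d1 + (1 - lam)*d2 of two distributions."""
    keys = set(d1) | set(d2)
    return {k: lam * d1.get(k, 0) + (1 - lam) * d2.get(k, 0) for k in keys}


all_strings = list(product((0, 1), repeat=N_BITS))
allowed = [b for b in all_strings if all(b[i] == 0 for i in ODD_POSITIONS)]

# Two very different opinions about the even bits, both inside the set.
even_uniform = {b: Fraction(1, len(allowed)) for b in allowed}
even_all_ones = {(0, 1, 0, 1): Fraction(1)}

# Mixing two members gives another member, which is what makes the set convex.
blend = mix(even_uniform, even_all_ones, Fraction(1, 3))
print(in_set(even_uniform), in_set(even_all_ones), in_set(blend))

# A distribution that lets the odd bits be 1 is not a member.
fully_uniform = {b: Fraction(1, len(all_strings)) for b in all_strings}
print(in_set(fully_uniform))
```

The set contains wildly different distributions over the even bits, which is the sense in which it "withholds judgement" about them.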
Hopefully that explains the convex set bit. The other part is why it’s better to use convex sets. Well, one reason is that sometimes we might be unwilling to specify a probability distribution, because we know the true underlying process is uncomputable. This problem arises, for example, when an agent is trying to simulate itself. I* can never perfectly simulate a copy of myself within my mind, even probabilistically, because that leads to infinite regress. This sort of paradox is related to the halting problem and Gödel’s incompleteness theorems.
In at least these cases it seems better to say “I don’t know how to simulate this part of me”, rather than pretending I can assign a computable distribution to how I will behave. For example, if I don’t know if I’m going to finish writing this comment in 5 minutes, I can assign it the imprecise probability [0.2, 1.0]. And then if I want to act safely, I just assume the worst-case outcomes for the parts of me I don’t know how to simulate, and act accordingly. This applies to other parts of the world I can’t simulate as well: the physical world (which contains me), or simply other agents I have reason to believe are smarter than me.
(*I’m using “I” here, but I really mean some model or computer that is capable of more precise simulation and prediction than humans are capable of.)
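Here is a rough sketch of what "assume the worst case and act accordingly" could look like for the comment-writing example. The interval [0.2, 1.0] is from the paragraph above; the two actions and their payoffs are invented purely for illustration, and the decision rule shown (pick the action whose worst-case expected utility is highest) is one common reading of acting safely, not necessarily the exact rule used in the talk.

```python
P_LOW, P_HIGH = 0.2, 1.0  # imprecise probability of finishing in 5 minutes (from the text)

# Hypothetical payoffs: (utility if I finish in time, utility if I do not).
ACTIONS = {
    "promise_reply_in_5_min": (1.0, -2.0),
    "promise_reply_today": (0.5, 0.5),
}


def expected_utility(payoff, p):
    """Expected utility of an action when the probability of finishing is p."""
    u_finish, u_not = payoff
    return p * u_finish + (1 - p) * u_not


def worst_case_utility(payoff):
    """Lowest expected utility over every p in [P_LOW, P_HIGH].
    Expected utility is linear in p, so the minimum sits at an endpoint."""
    return min(expected_utility(payoff, P_LOW), expected_utility(payoff, P_HIGH))


def maximin_choice(actions):
    """Pick the action whose worst-case expected utility is highest."""
    return max(actions, key=lambda name: worst_case_utility(actions[name]))


choice = maximin_choice(ACTIONS)
print(choice)
```

With these made-up payoffs the cautious promise wins, because the risky one looks bad if the true probability is near the low end of the interval.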
Does it make more sense to think about all probability distributions that offer a probability of 50% for rain tomorrow? If we say this represents our epistemic state, then we’re saying something like “the probability of rain tomorrow is 50%, and we withhold judgement about rain on any other day”.
It feels more natural, but I’m unclear what this example is trying to prove. It still reads to me like “if we think rain is 50% likely tomorrow then it makes sense to say rain is 50% likely tomorrow” (which I realize is presumably not what is meant, but it’s how it feels).