Fair point! Sorry it wasn’t the most helpful. My attempt at explaining a bit more below:
Convex sets are just sets where each point in the set can be expressed as a weighted average (a convex combination) of points on the exterior of the set, e.g.:
[image: convex hull of a set of 2D points] (source: https://reference.wolfram.com/language/ref/ConvexHullMesh.html)
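In code terms (a throwaway Python sketch, not from anything standard): take any nonnegative weights that sum to 1 and average the corner points, and you always land inside the set.

```python
# A throwaway sketch: a convex combination (nonnegative weights summing to 1)
# of a triangle's corners always lands inside the triangle.
import numpy as np

corners = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
weights = np.array([0.2, 0.5, 0.3])  # nonnegative, sums to 1

point = weights @ corners            # weighted average of the corners
print(point)                         # [0.5 0.3] -- inside the triangle
```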
In 1D, convex sets are just intervals [a, b], and convex sets of probability distributions basically correspond to intervals of probability values, e.g. [0.1, 0.5]; such intervals are often called “imprecise probabilities”.
Let’s generalize this idea to 2D. There are two events, A and B, which I am uncertain about. If I were really confident, I could say that I think A happens with probability 0.2 and B happens with probability 0.8. But what if I feel so ignorant that I can’t assign a probability to event B at all? That means I think P(B) could be anywhere in [0.0, 1.0], while keeping P(A) = 0.2. So the pair (P(A), P(B)) lies somewhere on the line segment from (0.2, 0.0) to (0.2, 1.0). A line segment is a convex set.
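To make that concrete, here’s a small Python sketch (the specific numbers and the derived event are just for illustration): sweep P(B) over its interval while holding P(A) fixed, and any quantity computed from the pair comes out as an interval rather than a single number. For instance, if A and B happened to be independent:

```python
# A minimal sketch of the credal set above: P(A) is pinned at 0.2,
# P(B) can be anything in [0.0, 1.0], so the set of (P(A), P(B)) pairs
# is a line segment (a convex set).
import numpy as np

p_a = 0.2
p_b_values = np.linspace(0.0, 1.0, 101)  # sweep the unknown probability

# If A and B were independent, P(A and B) = P(A) * P(B), so sweeping the
# whole segment gives an interval of answers instead of a point estimate.
p_ab = p_a * p_b_values
print(f"P(A and B) lies somewhere in [{p_ab.min():.2f}, {p_ab.max():.2f}]")  # [0.00, 0.20]
```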
You can generalize this notion to infinite dimensions. For a bit sequence of infinite length, specifying a complete probability distribution would require saying how likely each bit is to equal 1, conditioned on the values of all the other bits. But we could instead assign probabilities only to the odd bits, not the even bits, and that would correspond to a convex set of probability distributions.
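As a toy illustration of the same idea (my own made-up representation, which additionally assumes the bits are independent rather than conditionally specified): give each odd bit a precise probability and each even bit the whole interval [0, 1], and the probability of any event involving an even bit becomes an interval.

```python
# A toy sketch, assuming independent bits: odd-indexed bits get a precise
# probability of being 1, even-indexed bits are left completely unspecified.
def bit_prob_interval(i):
    if i % 2 == 1:
        return (0.7, 0.7)   # odd bits: a precise (made-up) probability
    return (0.0, 1.0)       # even bits: anywhere in [0, 1]

# Bounds on P(bit 1 = 1 and bit 2 = 1): with independent bits, multiply the
# lower bounds for the lower bound and the upper bounds for the upper bound.
lo = bit_prob_interval(1)[0] * bit_prob_interval(2)[0]
hi = bit_prob_interval(1)[1] * bit_prob_interval(2)[1]
print(f"P(bit1 = 1 and bit2 = 1) is in [{lo}, {hi}]")  # [0.0, 0.7]
```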
Hopefully that explains the convex set bit. The other part is why it’s better to use convex sets. Well, one reason is that sometimes we might be unwilling to specify a probability distribution, because we know the true underlying process is uncomputable. This problem arises, for example, when an agent is trying to simulate itself. I* can never perfectly simulate a copy of myself within my mind, even probabilistically, because that leads to infinite regress; this sort of paradox is related to the halting problem and Gödel’s incompleteness theorem.
In at least these cases it seems better to say “I don’t know how to simulate this part of me”, rather than pretending I can assign a computable distribution to how I will behave. For example, if I don’t know whether I’m going to finish writing this comment in 5 minutes, I can assign it the imprecise probability [0.2, 1.0]. And then if I want to act safely, I just assume the worst-case outcomes for the parts of me I don’t know how to simulate, and act accordingly. This applies to other parts of the world I can’t simulate as well: the physical world (which contains me), or simply other agents I have reason to believe are smarter than me.
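Here’s a rough Python sketch of that worst-case rule (the actions and utilities are invented for illustration, not anything canonical): for each action, take the worst expected utility over the whole interval of probabilities, then pick the action whose worst case is best.

```python
# A rough sketch of worst-case (maximin) choice under the imprecise
# probability P(finish in 5 minutes) in [0.2, 1.0]. Utilities are made up.
p_low, p_high = 0.2, 1.0

# utilities[action] = (utility if I finish, utility if I don't)
utilities = {
    "promise it will be done in 5 min": (1.0, -2.0),  # costly if I break the promise
    "say it might take longer":         (0.5,  0.3),
}

def worst_case_value(u_finish, u_not_finish):
    # Expected utility is linear in p, so the minimum over an interval of
    # probabilities is attained at one of the two endpoints.
    return min(p * u_finish + (1 - p) * u_not_finish for p in (p_low, p_high))

best_action = max(utilities, key=lambda a: worst_case_value(*utilities[a]))
print(best_action)  # "say it might take longer"
```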
(*I’m using “I” here, but I really mean some model or computer capable of more precise simulation and prediction than humans are.)