David Johnston comments on On the Universal Distribution

David Johnston 29 Oct 2021 22:01 UTC
1 point
One thing to think about: in order to reason about “observations” using mathematical theory, we need to (and do) convert them into mathematical things. Probability theory can only address the mathematical things we get in the end.

Most schemes for doing this ignore a lot of important stuff. E.g. “measure my height in cm, write down the answer” is a procedure that produces a real number, but also one that is indifferent to almost every “observation” I might care about in the future.

(The quotes around observation are to indicate that I don’t know if it’s exactly the right word).

One thing we could try to do is propose a scheme for mathematising every observation we care about. One way to do this is to come up with a sequence of questions of the form “are my observations like X or like not X?”. Then the mathematical object our observations become will be a binary sequence. In practice, this will never solve the problem of distinguishing any two observations we care to distinguish, but maybe imagining something like this that goes on forever is not a bad idealization, in the sense that we might care less and less about the remaining undistinguished observations.
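To make the question scheme concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: observations are stood in for by small dictionaries, and the two questions, the field names `height_cm` and `temperature_c`, and the thresholds are hypothetical. It shows how a finite list of yes/no questions produces a binary string, and how two observations that differ in ways the questions ignore collapse to the same string.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a raw "observation"; the real thing is not a dict.
Observation = Dict[str, float]
Question = Callable[[Observation], bool]


def encode(obs: Observation, questions: List[Question]) -> str:
    """Turn an observation into a binary string, one digit per question."""
    return "".join("1" if q(obs) else "0" for q in questions)


# Two illustrative "is it like X or like not X?" questions (assumed, not from the comment).
questions: List[Question] = [
    lambda o: o["height_cm"] >= 170,
    lambda o: o["temperature_c"] >= 20,
]

a = {"height_cm": 180.0, "temperature_c": 25.0}
b = {"height_cm": 180.0, "temperature_c": 30.0}  # differs from a, but not detectably
c = {"height_cm": 160.0, "temperature_c": 25.0}

print(encode(a, questions))  # 11
print(encode(b, questions))  # 11, indistinguishable from a under these questions
print(encode(c, questions))  # 01
```

Adding more questions splits more observations apart, but any finite list leaves some pairs, like `a` and `b` here, undistinguished.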

Can this story capture something like the tale of the universal prior? The problem here is that what I’ve described looks a bit like a Turing machine—it outputs a sequence of binary digits—but it isn’t a Turing machine because it has no well-defined domain. In fact, the problem of getting from a vague domain to something mathematical is what it was meant to solve to begin with.

One way we can conceptualize inputs to this process is to postulate “more powerful observers”. For example, if I turn an observation into n binary questions, a more powerful observer is one that asks the same n questions and also asks one more. Then our “observation process” is a Turing machine that takes the output of the more powerful observer and drops the last digit.
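As a rough sketch of that construction, assuming the more powerful observer's output is represented as a string of `0`s and `1`s (the function name `our_observation` and the example string are hypothetical):

```python
def our_observation(powerful_output: str) -> str:
    """Our n-question observation: the (n+1)-digit output of the more
    powerful observer with its last digit dropped."""
    if not powerful_output:
        raise ValueError("the more powerful observer must answer at least one question")
    return powerful_output[:-1]


powerful = "1011"                   # answers to n + 1 = 4 questions (made-up example)
ours = our_observation(powerful)    # answers to the first n = 3 questions
print(ours)                         # 101
```

The map now has a well-defined domain (binary strings), which is what the original question scheme lacked. Both `10110` and `10111` would map to the same `1011`, so the extra question is exactly what our process cannot see.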

However, if we consider the n->infinity limit of this, it seems consistent to me that the more powerful observer could act as an anti-inductor or a randomiser against us at every step of the way.
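A toy illustration of the anti-inductor case, under loud assumptions: our prediction of the next digit is modelled by an arbitrary deterministic function of the digits seen so far (here a hypothetical `majority_predictor`), and the more powerful observer simply answers with the opposite digit each time. This is only a sketch of why nothing in the setup rules such an observer out, not a claim about any particular predictor.

```python
from typing import Callable

Predictor = Callable[[str], str]  # maps the history so far to a guessed next digit


def majority_predictor(history: str) -> str:
    """Hypothetical predictor: guess whichever digit has been more common (ties -> '1')."""
    return "1" if history.count("1") * 2 >= len(history) else "0"


def anti_inductor(predictor: Predictor, steps: int) -> str:
    """Generate digits that contradict the predictor's guess at every step."""
    history = ""
    for _ in range(steps):
        guess = predictor(history)
        history += "0" if guess == "1" else "1"
    return history


sequence = anti_inductor(majority_predictor, 6)

# Replay the predictor over each prefix and count how often it was right.
correct = sum(
    majority_predictor(sequence[:i]) == sequence[i] for i in range(len(sequence))
)
print(sequence, correct)  # 010101 0
```

The same construction works against any deterministic predictor passed in as `predictor`, which is why an extra assumption is needed to exclude it.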

So it seems that this story at least requires an assumption like “we can eventually predict the more powerful observer perfectly”.

There are lots of other ways to make more powerful observers; they just need to be capable of distinguishing everything our observation process distinguishes.