I thought that this post was neat. I was already familiar with Solomonoff induction, but the post still taught me a few things.
Not necessarily true! See Scott Aaronson on this (though iirc, he makes some assumptions I disagree with)
I also doubt there’s a way around the “what if ‘0X’ is our first big hit?” problem without making further assumptions (like a speed prior).
Here’s another way to approximate the length of the shortest program that prints out a given string, as above: interleave the programs and run them in alternation, increasing the number of different programs being run as you go. Enumerate all of the programs P1, P2, P3, … by length, in non-decreasing order.
Then, for each of the steps below, run exactly one step of each program in the step’s list:
1. P1
2. P1, P2
3. P1, P2, P3
4. P1, P2, P3, P4
...
n. P1, P2, …, Pn
...
i.e. at step n, run one more step of programs P1, P2, …, Pn.
Do this while keeping track of the shortest program so far that has printed your string (you can stop running programs that have halted, as well as programs at least as long as your shortest so far). Eventually you would find the shortest program that prints your string and record it as the shortest so far, and in some cases you would even know it’s the shortest: if all the shorter programs halt and you wait long enough for them to do so. In many cases, though, your upper bound and your lower bound (the length of the shortest program that hasn’t halted yet) will never match, because the program setting the lower bound never halts.
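Here’s a minimal Python sketch of this dovetailing search. The `(length, stepper)` interface is a hypothetical stand-in for enumerating and single-stepping real programs: a `stepper` is a generator that yields None while still running, then yields its output on the step at which it halts.

```python
_HALTED = object()  # sentinel: stepper halted with no further output

def dovetail_shortest(programs, target, max_steps=10**6):
    """Dovetail over an enumeration of programs (in non-decreasing
    order of length), tracking the shortest found so far that prints
    `target`. `programs` yields (length, stepper) pairs."""
    enum = iter(programs)
    running = []        # programs currently being stepped
    best_len = None     # upper bound: length of shortest program found
    exhausted = False

    for _ in range(max_steps):
        if not exhausted:
            try:
                running.append(next(enum))  # step n admits program P_n
            except StopIteration:
                exhausted = True
        survivors = []
        for length, stepper in running:
            # Prune programs at least as long as the shortest found.
            if best_len is not None and length >= best_len:
                continue
            out = next(stepper, _HALTED)    # run exactly one step
            if out is None:
                survivors.append((length, stepper))  # still running
            elif out == target:
                best_len = length           # new shortest program
            # halted with any other output: silently dropped
        running = survivors
        # The shortest still-running program's length is a lower bound;
        # if that program never halts, the bounds never meet.
        if exhausted and not running:
            break
    return best_len


# Toy usage with hand-made "programs" (lengths made up for illustration):
def toy_program(output, steps):
    def run():
        for _ in range(steps):
            yield None      # still running
        yield output        # halt and print
    return run()

progs = [(3, toy_program("ab", 5)),
         (4, toy_program("xyz", 2)),
         (5, toy_program("xyz", 9))]
print(dovetail_shortest(progs, "xyz"))  # -> 4
```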
If you have an infinite but enumerable list of hypotheses, you can do the same trick and interleave those as well, keeping track of the shortest program found so far for each hypothesis. With a finite list of hypotheses, you don’t need to interleave; you can just start them all at the same time.
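The interleaving over hypotheses is the same triangle trick; here’s a generic sketch (one could write one search per hypothesis as a generator and feed them all in here):

```python
def triangle_interleave(gens):
    """Dovetail an enumerable (possibly infinite) sequence of
    generators: step 1 advances g1; step 2 advances g1 and g2;
    step n advances g1, ..., gn. Yields (index, value) pairs as
    values are produced."""
    gens = iter(gens)
    active = []          # (index, generator) pairs still producing
    n = 0
    while True:
        try:
            active.append((n, next(gens)))  # admit one new generator
            n += 1
        except StopIteration:
            if not active:
                return   # finite list, and every generator finished
        for pair in list(active):
            i, gen = pair
            try:
                yield i, next(gen)
            except StopIteration:
                active.remove(pair)
```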
Of course, this is probably slow, but if you want to make it orders of magnitude faster, I’d expect you to need more assumptions (or maybe use randomness?).
One thing to think about: in order to reason about “observations” using mathematical theory, we need to (and do) convert them into mathematical things. Probability theory can only address the mathematical things we get in the end.
Most schemes for doing this ignore a lot of important stuff. E.g. “measure my height in cm, write down the answer” is a procedure that produces a real number, but also one that is indifferent to almost every “observation” I might care about in the future.
(The quotes around observation are to indicate that I don’t know if it’s exactly the right word).
One thing we could try to do is to propose a scheme for mathematising every observation we care about. One way to do this is to come up with a sequence of questions “are my observations like X, or not like X?”. The mathematical object our observations become is then a binary sequence. In practice, this will never solve the problem of distinguishing any two observations we care to distinguish, but maybe imagining something like this that goes on forever is not a bad idealization, in the sense that we might care less and less about the remaining undistinguished observations.
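As a toy sketch of this binarisation scheme (the predicates here are hypothetical placeholders, and in the idealization the list of questions goes on forever):

```python
# An observation becomes a binary sequence by answering a fixed
# list of yes/no questions about it.
def binarise(observation, questions):
    return [int(q(observation)) for q in questions]

# Hypothetical predicates standing in for "are my observations like X?"
questions = [
    lambda obs: obs["height_cm"] > 170,
    lambda obs: obs["raining"],
    lambda obs: obs["temperature_c"] > 20,
]
print(binarise({"height_cm": 180, "raining": False, "temperature_c": 25},
               questions))   # -> [1, 0, 1]
```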
Can this story capture something like the tale of the universal prior? The problem here is that what I’ve described looks a bit like a Turing machine—it outputs a sequence of binary digits—but it isn’t a Turing machine because it has no well-defined domain. In fact, the problem of getting from a vague domain to something mathematical is what it was meant to solve to begin with.
One way we can conceive of inputs to this process is to postulate “more powerful observers”. For example, if I turn an observation into n binary questions, a more powerful observer is one that asks the same n questions and also asks one more. Then our “observation process” is a Turing machine that takes the output of the more powerful observer and drops the last digit.
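In the finite case this “drop the last digit” machine is almost trivially computable; a sketch, assuming the answers arrive as a list:

```python
def our_observation(stronger_answers):
    """Our observation process, viewed as a machine whose input is the
    more powerful observer's output: answer the same n questions and
    drop the extra one."""
    return stronger_answers[:-1]

print(our_observation([1, 0, 1, 1]))  # their n+1 answers -> our n answers
```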
However, if we consider the n → ∞ limit of this, it seems consistent to me that the more powerful observer could be an anti-inductor or a randomiser relative to us, every step of the way.
So it seems that this story at least requires an assumption like “we can eventually predict the more powerful observer perfectly”.
There are lots of other ways to make more powerful observers; they just need to be capable of distinguishing everything our observation process distinguishes.