Quoting Nate’s supplement from OpenPhil’s review of “Proof-producing reflection for HOL” (PPRHOL):
there are basic gaps in our models of what it means to do good reasoning (especially when it comes to things like long-running computations, and doubly so when those computations are the reasoner’s source code)
How far along are you towards narrowing these gaps, now that “Logical Induction” is a thing people can talk about? Are there variants of it that narrow these gaps further, or planned follow-ups to PPRHOL that might improve our models? What kinds of experiments seem valuable for this subgoal?
I endorse Tsvi’s comment above. I’ll add that it’s hard to say how close we are to closing basic gaps in our understanding of things like “good reasoning”, because mathematical insight is notoriously difficult to predict. All I can say is that logical induction does seem like progress to me, and we’re taking a variety of approaches to the remaining problems. Also, yeah, one of those avenues is a follow-up to PPRHOL. (One experiment we’re running now is an attempt to implement, in HOL, a cellular automaton containing a reflective reasoner with access to the source code of the world, where the reasoner uses HOL to reason about the world and itself. The idea is to see whether we can get the whole stack to work simultaneously, and to smoke out all the implementation difficulties that arise in practice when you try to use a language like HOL for reasoning about HOL.)
Scott Garrabrant’s logical induction framework feels to me like a large step forward. It provides a model of “good reasoning” about logical facts using bounded computational resources, and that model is already producing preliminary insights into decision theory. In particular, we can now write down models of agents that use logical inductors to model the world—and in some cases these agents learn to have sane beliefs about their own actions, other agents’ actions, and how those actions affect the world. This, despite the usual obstacles to self-modeling.
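To give a flavor of what “an agent that uses a logical inductor to model the world” might look like at the interface level, here is a minimal, hypothetical sketch. The class, the sentence strings, and the decision rule are all invented for illustration; this is not MIRI’s implementation or the paper’s formal setup, just the shape of the idea: the agent queries the inductor’s prices on sentences about its own possible actions and acts on the most favorably priced one.

```python
# Hypothetical sketch: an agent consulting a logical inductor's beliefs about
# its own actions. Everything here (the stub class, sentence format, decision
# rule) is invented for illustration, not taken from the paper.

class LogicalInductorStub:
    """Stand-in for a logical inductor: maps sentences (strings) to prices in [0, 1]."""
    def __init__(self, prices):
        self.prices = prices

    def price(self, sentence):
        # Sentences the market hasn't priced get an uninformative 0.5.
        return self.prices.get(sentence, 0.5)

def choose_action(inductor, actions, threshold):
    """Pick the action whose 'taking it yields high utility' sentence is priced highest."""
    return max(actions, key=lambda a: inductor.price(f"U(agent takes {a}) > {threshold}"))

market = LogicalInductorStub({
    "U(agent takes left) > 0.9": 0.2,
    "U(agent takes right) > 0.9": 0.8,
})
print(choose_action(market, ["left", "right"], 0.9))  # -> right
```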
Further, the self-trust result from the paper can be interpreted as saying that a logical inductor believes something like “If my future self is confident in the proposition A, then A is probably true”. This seems like one of the insights that the PPRHOL work was aiming at, namely, writing down a computable reasoning system that asserts a formal reflection principle about itself. Such a reflection principle must be weaker than full logical soundness: by Löb’s theorem, a system that proved “If my future self proves A, then A is true” for every sentence A would prove every A, and hence be inconsistent. But as it turns out, the reflection principle becomes feasible once you replace “proves” with “assigns high probability to” and replace “true” with “probably true”.
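For concreteness, the self-trust property can be rendered roughly as follows. This is a simplification of the paper’s exact statement, which is phrased with expectations and continuous indicator functions rather than a sharp conditional probability:

\[
\mathbb{P}_n\!\left(A \,\middle|\, \mathbb{P}_{f(n)}(A) > p\right) \gtrsim_n p
\]

Here \(\mathbb{P}_n\) denotes the market prices (probabilities) at stage \(n\), \(f(n) > n\) picks out a later (“future”) stage, and \(\gtrsim_n\) means the inequality holds up to an error term that vanishes as \(n\) grows.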
It is an active area of research to understand logical induction more deeply and to apply it to decision-theoretic problems that require reflective properties. For example, the current framework uses “traders” that express their “beliefs” as strategies for making trades against the market prices (probabilities) output by a logical inductor; traders then profit by buying shares in sentences that later market prices end up valuing highly. It would be nice to understand this process in Bayesian terms, e.g., with traders as hypotheses that output predictions about the market and have their posterior probabilities updated by Bayes’ rule.
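To make the trader dynamic concrete, here is a toy, self-contained sketch. It is not the construction from the paper (which takes a fixed point over all efficiently computable traders and scores them against plausible worlds); it only shows the bookkeeping the prose describes: traders buy or sell shares at today’s prices and are scored against prices at a later stage.

```python
# Toy illustration of the trader/market bookkeeping behind logical induction.
# NOT the paper's construction; it just shows traders buying/selling shares at
# stage-n prices and being scored against stage-(n+k) prices.

def trade(trader_belief, price, budget=1.0):
    """Buy shares if the trader thinks the sentence is underpriced, sell (short) if overpriced.
    Returns the (possibly negative) number of shares bought."""
    return budget * (trader_belief - price)  # simple linear trading strategy

def score(shares, buy_price, later_price):
    """Profit if the later price moved toward the trader's position."""
    return shares * (later_price - buy_price)

# Hypothetical sentences with a market price "now" (stage n) and "later" (stage n+k).
prices_now = {"phi": 0.3, "psi": 0.9}
prices_later = {"phi": 0.7, "psi": 0.85}

# Two hand-written traders with fixed beliefs about each sentence.
traders = {
    "optimist_about_phi": {"phi": 0.8, "psi": 0.9},
    "pessimist_about_phi": {"phi": 0.1, "psi": 0.9},
}

for name, beliefs in traders.items():
    profit = 0.0
    for sentence, belief in beliefs.items():
        shares = trade(belief, prices_now[sentence])
        profit += score(shares, prices_now[sentence], prices_later[sentence])
    print(f"{name}: profit {profit:+.3f}")

# The trader whose beliefs anticipated the later prices comes out ahead. In the
# real framework, the prices are chosen so that no efficiently computable trader
# can keep racking up unbounded profits this way.
```

The open question gestured at above is whether this market-style scoring can be recast as a Bayesian mixture, with each trader playing the role of a hypothesis whose weight rises or falls with its predictive success.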