Scott Garrabrant’s logical induction framework feels to me like a large step forward. It provides a model of “good reasoning” about logical facts using bounded computational resources, and that model is already producing preliminary insights into decision theory. In particular, we can now write down models of agents that use logical inductors to model the world—and in some cases these agents learn to have sane beliefs about their own actions, other agents’ actions, and how those actions affect the world. This, despite the usual obstacles to self-modeling.
Further, the self-trust result from the paper can be interpreted as saying that a logical inductor believes something like “If my future self is confident in the proposition A, then A is probably true”. This seems like one of the insights the PPRHOL work was aiming at, namely, writing down a computable reasoning system that asserts a formal reflection principle about itself. Such a reflection principle must be weaker than full logical soundness: by Löb's theorem, a system that proved “If my future self proves A, then A is true” for every sentence A would thereby prove every sentence A, and so be inconsistent. But as it turns out, the reflection principle becomes feasible if you replace “proves” with “assigns high probability to” and replace “true” with “probably true”.
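Very roughly, and only as a gloss (the paper's actual self-trust theorem is stated with expectations and continuous indicator functions rather than conditional probabilities, and the notation here is mine), the property says that if $\mathbb{P}_n$ denotes the market's beliefs on day $n$, then for a later day $n+k$:

$$\mathbb{P}_n\!\big(A \,\big|\, \mathbb{P}_{n+k}(A) > p\big) \gtrsim p.$$

That is, conditional on the claim that its future self will assign $A$ probability greater than $p$, the current market already assigns $A$ probability roughly at least $p$.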
It is an active area of research to understand logical induction more deeply, and to apply it to decision-theoretic problems that require reflective properties. For example, the current framework uses “traders” that express their “beliefs” as strategies for making trades against the market prices (probabilities) output by a logical inductor. Traders are then rewarded when the shares they buy in sentences turn out to be highly valued by later market prices. It would be nice to understand this process in Bayesian terms, e.g., with traders as hypotheses that output predictions about the market and have their probabilities updated by Bayes' rule.
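To make the trading metaphor a bit more concrete, here is a minimal toy sketch in Python of the interaction described above. Everything in it is hypothetical and simplified: real traders in the framework output trades as continuous functions of the current prices, and the market's prices are chosen by a fixed-point construction against a whole enumeration of traders, neither of which is captured here.

```python
# Toy sketch of the trader-versus-market setup (hypothetical names; this is
# NOT the construction from the paper, which computes each day's prices as a
# fixed point against an enumeration of traders whose trades are continuous
# functions of the prices).

def trader_trades(beliefs, prices):
    """Buy one share of any sentence the market prices below our belief,
    and sell (short) one share of any sentence it prices above our belief."""
    trades = {}
    for sentence, belief in beliefs.items():
        price = prices[sentence]
        if belief > price:
            trades[sentence] = +1.0   # buy at `price`
        elif belief < price:
            trades[sentence] = -1.0   # short at `price`
    return trades

def mark_to_market(trades, buy_prices, later_prices):
    """Value the holdings at a later day's prices (which, for decidable
    sentences, converge to 1 if proven and 0 if refuted)."""
    return sum(q * (later_prices[s] - buy_prices[s]) for s, q in trades.items())

# Example: the market initially underprices "phi" and overprices "psi"
# relative to this trader's beliefs.
prices_day_n = {"phi": 0.3, "psi": 0.8}
prices_day_m = {"phi": 0.9, "psi": 0.2}   # prices on some later day m > n
beliefs      = {"phi": 0.7, "psi": 0.5}

trades = trader_trades(beliefs, prices_day_n)
print(trades)                                              # {'phi': 1.0, 'psi': -1.0}
print(mark_to_market(trades, prices_day_n, prices_day_m))  # 0.6 + 0.6 = 1.2
```

The logical induction criterion is then, roughly, that no efficiently computable trader can accumulate unbounded profits (with bounded downside) by trading like this against the market's prices.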