Agree that values have to be learned indirectly (as we do), but I’m either skeptical or confused about what could uncharitably be framed as “inserting a math equation into an AI”. Making an idea technically precise, or specifying it with mathy symbols, does not guarantee that it can be used for training a neural network (which realistically is the only paradigm of AI we can work with, I think).
We will probably have to find a balance between a precisely specified loss function we believe will indirectly lead to good values, and a loss function that we have a large amount of corresponding training data for.
Feel free to ignore the following section, it’s a hurried rambling before breakfast.
||One way of potentially getting extra power out of a limited set of data is to 1) constrain the amount of information the network receives about its true loss function per unit of computation (i.e. an information bottleneck), maybe via something like training with a larger “batch size”, and then 2) use something like temporal difference learning, so the network gets trained on a proxy (an estimator) for the true loss function that it refines over time.
This would amount to having a fluid proximal (high-information) reward function and a fixed distal (low-information) reward function. The brain does something similar, and it predictably leads to mesa-optimisation if the influence of the proximal rewards dominates that of the distal ones. The trick would then be to balance their respective learning rates so the distal rewards always constrain the evolution of the network more than the estimator does. While the evolution of DNA resulted in a mesa-optimisation “catastrophe”, we have the advantage that we get to monitor the network and intervene intelligently on the learning process in real time.||
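To make the temporal-difference part concrete, here’s a toy sketch (the setup and all names are my own illustration, not anything precise): a short chain of states where only the final transition carries the “true” distal reward, and every earlier state is trained against the bootstrapped value estimate, which plays the role of the high-information proximal proxy.

```python
import numpy as np

# Toy TD(0) sketch: a 5-state chain. The distal signal is sparse
# (reward only on the last transition); intermediate states learn
# from the estimator's own bootstrapped predictions (the proxy).
n_states = 5
V = np.zeros(n_states + 1)   # V[n_states] is the terminal state, fixed at 0
alpha = 0.1                  # learning rate of the proxy/estimator

for episode in range(5000):
    for s in range(n_states):
        # Distal reward: 1.0 only on the final transition.
        r = 1.0 if s == n_states - 1 else 0.0
        # TD(0) update: bootstrap from the next state's current estimate.
        V[s] += alpha * (r + V[s + 1] - V[s])

# With no discounting, the sparse distal reward propagates backward
# through the estimator, so every state's value approaches 1.0.
```

The “balance of learning rates” question would then be about how fast you let the estimator itself move (here `alpha`) relative to how strongly the sparse true signal anchors it, so the proxy never drifts away from what the distal reward actually licenses.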
Nice succinct post.
A related reference you might like: https://www.lesswrong.com/posts/NjYdGP59Krhie4WBp/updating-utility-functions, which goes into getting it to care about what we want before it knows what we want.
Thanks!