Throughout this discussion I’ve been modeling evolution with the same frameworks I’ve been learning in order to understand inner alignment. Evolution is a stochastic gradient descent process, so many of the arguments about which properties trained models should have also apply to evolutionary processes.
So I guess you could start with Hubinger et al.’s Risks from Learned Optimization? That said, it seems like a nonstandard way to try to learn evolutionary biology.