This is totally spitballing, but anything that encourages modularity in the circuits of AIs (or perhaps at some other level?) and the ability to swap mind modules in and out would be really good for interpretability.
Ever since this project, I’ve had a vague sense that genome architecture has something interesting to teach us about interpreting/predicting NNs, but I’ve never gotten a particularly useful insight out of it. Love this book on it by Michael Lynch if anyone’s interested.