Brilliant and compelling writeup, Remmelt! Thank you so much for sharing it. (And thank you so much for your kind words about my post! I really appreciate it.)
I strongly agree with you that mechanistic interpretability is very unlikely to contribute to long-term AI safety. To put it bluntly, the fact that so many talented and well-meaning people sink their time into this unpromising research direction is unfortunate.
I think we AI safety researchers should be more open to new ideas and approaches, rather than getting stuck in the same old research directions that we know are unlikely to meaningfully help. The post “What an actually pessimistic containment strategy looks like” has potentially good ideas on this front.
Yes, I think we independently arrived at similar conclusions.