I’m very supportive of this post. I will also shamelessly share a sequence I posted in February called “The Engineer’s Interpretability Sequence”. One of its main messages is that existing mechanistic interpretability research is not on the ball.
https://www.alignmentforum.org/s/a6ne2ve5uturEEQK7