stecas comments on Nobody’s on the ball on AGI alignment

stecas 11 Apr 2023 19:18 UTC
3 points
I’m very supportive of this post. I will also shamelessly share a sequence I posted in February, “The Engineer’s Interpretability Sequence”. One of its main messages could be summarized as: existing mechanistic interpretability research is not on the ball.
https://www.alignmentforum.org/s/a6ne2ve5uturEEQK7