Note: When an earlier private version of these notes was circulated, a senior figure in technical AI safety strongly contested my description. They believe that the Anthropic SAE work is much more valuable than the independent SAE work, as both were published around the same time, but the Anthropic work provides sufficient evidence to be worth extending by other researchers, whereas the independent research was not dispositive.
For the record, if the researcher here was COI’d, eg working at Anthropic, I think you should say so, and you should also substantially discount what they said.
For the record, if the researcher here was COI’d, eg working at Anthropic, I think you should say so, and you should also substantially discount what they said.
I agree! (They are not from Anthropic. I probably shouldn’t deanonymize further). :)