Visualization is pretty important in exploratory mechanistic interpretability work, but there it's less about polished tooling and more about fast, throwaway research code: see any of Neel Nanda's exploratory notebooks.
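To make "fast research code" concrete, here's a minimal sketch of the kind of quick-and-dirty visualization common in such notebooks, assuming TransformerLens and plotly are installed (the prompt and the choice of layer/head are arbitrary, purely for illustration):

```python
# Throwaway exploratory viz: load a small model with TransformerLens,
# cache a forward pass, and plot one head's attention pattern.
import plotly.express as px
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # small model for speed
prompt = "When Mary and John went to the store, John gave a drink to"
tokens = model.to_tokens(prompt)
_, cache = model.run_with_cache(tokens)

layer, head = 5, 5  # arbitrary head, chosen for illustration
pattern = cache["pattern", layer][0, head]  # shape: [query_pos, key_pos]
str_tokens = model.to_str_tokens(prompt)  # token labels for the axes

fig = px.imshow(
    pattern.cpu().numpy(),
    x=str_tokens,
    y=str_tokens,
    labels={"x": "Key position", "y": "Query position"},
    title=f"Attention pattern, layer {layer} head {head}",
)
fig.show()
```

The point is that an interactive heatmap like this takes ~20 lines inline in a notebook; the value of a dedicated viz person is making that loop even faster and the outputs more legible, not building a product.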
When Redwood Research had a large interpretability team, they were also developing their own data viz tooling. It was never open-sourced, possibly because the people building it lacked frontend experience. Anthropic has its own internal libraries too, TransformerLens could use more visualization support, and I hear David Bau's lab is developing a better open-source interpretability library. My guess is that there's more impact if you're willing to participate in interp research yourself, but there are probably still opportunities to focus mostly on data viz at an interp shop.
With regard to the bottleneck being knowing where and how to look, the important thing is to work with the right team. From a quick glance, the Learning Interpretability Tool is not focused on mechinterp, and the field of interpretability is so much larger than the subset targeted at alignment that you'd likely have more impact somewhere more targeted. In your position I'd talk to a bunch of empirical alignment researchers about their frontend / data viz needs, see if a top-tier team like Superalignment is hiring, and have an 80,000 Hours call while developing a good inside view on the problem.