I’ve certainly heard of your work but it’s far enough out of my research interests that I’ve never taken a particularly strong interest. Writing this in this context makes me realise I might have made a bit of a one-man echo chamber for myself… Do you mind if we leave this as ‘undecided’ for a while?
Regarding ELK, I think the core of the problem as I understand it is fairly clear once you begin thinking about interpretability. Understanding the relation between AI and human ontologies was part of the motivation behind my work on AlphaZero (as well as an interest in the natural abstractions hypothesis). Section 4 “Encoding of human conceptual knowledge” and Section 8 “Exploring activations with unsupervised methods” are the places to look. I think the section on challenges and limitations in concept probing echoes a lot of the concerns in ELK.
In terms of subsequent work on ELK, I don’t think much of the work on solving it was particularly useful, and it often reinvented existing methods (e.g. sparse probing, causal interchange interventions). If I were to try to work on it, I think the best way to do so would be to embed the core challenge in a tractable research program, for instance trying to extract new scientific knowledge from ML models like AlphaFold.
To move this in a more positive direction, the most fruitful/exciting conceptual work I’ve seen is probably (1) the natural abstractions hypothesis and (2) debate. When I think a bit about why I particularly like these, for (1) it’s because it seems plausibly true, extremely useful if true, and amenable to both formal theoretical work and empirical study. For (2) it’s because it’s a pretty striking new idea that seems very powerful/scalable, but can also be put into practice a bit ahead of really powerful systems.