Basically, there are simple arguments along the lines of ‘they are an AGI capabilities organization, so obviously they’re bad’; more complicated arguments along the lines of ‘but they say they want to do alignment work’; and then even more complicated arguments on top of those going ‘well, actually their alignment work doesn’t seem all that good, and their capabilities work is pushing the frontier and still makes it difficult for AGI companies to coordinate on not building AGI, so in fact the simple arguments were correct’. Going into more depth would require a writeup of my current picture of alignment, which I am working on, but which is difficult to convey in a quick comment.
I upvoted and did not disagreevote this, for the record. I’ll be interested to see your writeup :)
Do you disagree, assuming my writeup provides little information or context to you?
I don’t feel qualified to say. My impression of Anthropic’s epistemics is weakly negative (see here), and I haven’t read any of their research; my prior is relatively high AI scepticism. That’s not because I feel like I understand anything about the field, but because every time I do engage with some small part of the dialogue, it seems totally unconvincing (see the same comment), so I have the faint suspicion that many of the people worrying about AI safety (sometimes including me) are subject to some mass Gell-Mann amnesia effect.
A mass Gell-Mann amnesia effect in the sense that, say, I may look at others talking about my work, or work I know closely, and say “wow, that’s wrong”, but look at others talking about work I don’t know closely and say “wow, that implies DOOM!” (like dreadfully wrong corruptions of the orthogonality thesis), and so decide to work on something that seems relevant to that DOOM?
Yeah, basically that. Even if those same people ultimately find much more convincing (or at least less obviously flawed) arguments, I still worry about the selection effects Nuno mentioned in his thread.