The decisions which caused the FTX catastrophe, the fact that EA is counterfactually responsible for the three primary AGI labs, Anthropic being entirely run by EAs yet still doing net negative work, and the funding of mostly capabilities-oriented ML work with vague alignment justifications (and potentially similar dynamics in biotech, which are more speculative for me right now), with the creation of GPT and[1] RLHF as particular examples of this.
[1] I recently found out that GPT was not in fact developed for alignment work. I had gotten confused by some rhetoric used by OpenAI and its employees during the earlier days, which turned out to be entirely independent of modern alignment considerations.
Strong disagree for misattributing blame and eliding the question.
To the extent that “EA is counterfactually responsible for the three primary AGI labs,” you would need to claim that the ex-ante expected value of specific decisions was negative, and that those decisions were because of EA, not that it went poorly ex-post. Perhaps you can make those arguments, but you aren’t.
Ditto for “The decisions which caused the FTX catastrophe”—Whose decisions, where does the blame go, and to what extent are they about EA? SBF’s decision to misappropriate funds, or fraudulently misrepresent what he did? CEA not knowing about it? OpenPhil not investigating? Goldman Sachs doing a bad job with due diligence?
I agree with this, except when you tell me I was eliding the question (and, of course, when you tell me I was misattributing blame). I was giving a summary of my position, not an analysis which I think would be deep enough to convince all skeptics.
You say you agree, but I was asking questions about what you were claiming and who you were blaming.
EAs are counterfactually responsible for DeepMind?
Off topic, but can you clarify why you think Anthropic does net negative work?
Basically, there are simple arguments around ‘they are an AGI capabilities organization, so obviously they’re bad’, and more complicated arguments around ‘but they say they want to do alignment work’, and then even more complicated arguments on those arguments going ‘well, actually it doesn’t seem like their alignment work is all that good, and their capabilities work is pushing capabilities, and it still makes it difficult for AGI companies to coordinate to not build AGI, so in fact the simple arguments were correct’. Going into more depth would require a writeup of my current picture of alignment, which I am writing, but which is difficult to convey via a quick comment.
I upvoted and did not disagreevote this, for the record. I’ll be interested to see your writeup :)
Do you disagree, assuming my writeup provides little information or context to you?
I don’t feel qualified to say. My impression of Anthropic’s epistemics is weakly negative (see here), though I haven’t read any of their research, and my prior is one of relatively high AI scepticism. That is not because I feel like I understand anything about the field, but because every time I do engage with some small part of the dialogue, it seems totally unconvincing (see same comment), so I have the faint suspicion that many of the people worrying about AI safety (sometimes including me) are subject to some mass Gell-Mann amnesia effect.
Mass Gell-Mann amnesia effect because, say, I may look at others talking about my work or work I know closely, and say “wow! That’s wrong”, but look at others talking about work I don’t know closely and say “wow! That implies DOOM!” (like dreadfully wrong corruptions of the orthogonality thesis), and so decide to work on work that seems relevant to that DOOM?
Yeah, basically that. Even if those same people ultimately find much more convincing (or at least less obviously flawed) arguments, I still worry about the selection effects Nuno mentioned in his thread.
I could list my current theories about how these problems are interrelated, but I fear such a listing would anchor me to the wrong one, and too many claims in a statement produce more discussion around minor sub-claims than around major points (an example of a shallow criticism of EA discussion norms).