Thank you for explaining your position. Like you, I am concerned that organizations like OpenAI and the capabilities race they've created have robbed us of the precious time we need to figure out how to make AGI safe. However, I think we're talking past each other to an extent: importantly, I said that we mostly shouldn't criticize people for the organizations they work at, not for the roles they play in those organizations.
Most ML engineers have many options for where to work, so choosing to work on AI capabilities research when there are plenty of alternatives outside of AI labs seems morally wrong. (On the other hand, given that AI capabilities teams exist, I'd rather they be staffed by engineers who are concerned about AI safety than by engineers who aren't.) However, I think there are many roles that plausibly advance AI safety that you could only do at an AI lab, such as promoting self-regulation within the AI industry. I've also heard arguments that advancing AI safety work sometimes requires advancing AI capabilities first. I think this was more true earlier: GPT-2 taught the AI safety community that it needed to focus on aligning large language models. But I am really doubtful that it's true now.
In general, if someone is doing AI safety technical or governance work at an AI lab that is also doing capabilities research, it is fair game to tell them that you think their approach will be ineffective or that they should consider switching to a role at another organization to avoid causing accidental harm. It is not acceptable to tell them that their choice of where to work means they are "AI capabilities people" who aren't serious about AI safety. Given that they are working on AI safety, it is likely that they have already weighed the obvious objections to their career choices.
There is also a risk of miscommunication: at another EA-adjacent party, I got lambasted after I told someone that I "work on AI". I quickly clarified that I don't work on cutting-edge stuff, but I feel that I shouldn't have had to do this, especially at a casual event.
Rereading your post, it does make sense now that you were thinking of safety teams at the big labs, but both the title about "selling out" and point #3 about "capabilities people" versus "safety people" made me think you had working on capabilities in mind.
If you think it's "fair game to tell them that you think their approach will be ineffective or that they should consider switching to a role at another organization to avoid causing accidental harm," then I'm confused about the framing of the post as being "please don't criticize EAs who 'sell out'," since this seems like "criticizing" to me. It also seems important to sometimes do this even when unsolicited, contra point 2. If the point is to avoid alienating people by making them feel attacked, then I agree, but the norms proposed here go a lot further than that.
Rereading your post, it does make sense now that you were thinking of safety teams at the big labs, but both the title about "selling out" and point #3 about "capabilities people" versus "safety people" made me think you had working on capabilities in mind.
Yes! I realize that "capabilities people" was not a good choice of words. It's a shorthand based on phrases I've heard people use at events.
In general, if someone is doing AI safety technical or governance work at an AI lab that is also doing capabilities research, it is fair game to tell them that you think their approach will be ineffective or that they should consider switching to a role at another organization to avoid causing accidental harm. It is not acceptable to tell them that their choice of where to work means they are "AI capabilities people" who aren't serious about AI safety. Given that they are working on AI safety, it is likely that they have already weighed the obvious objections to their career choices.
I think this perspective makes more sense than my original understanding of the OP, but I do think it is still misguided. Sadly, it is not very difficult for an organization to just label a job "AI Safety" and then have the person work on things whose primary aim is to make the organization more money, for example work on AI bias or setting up RLHF pipelines, which might help a bit with some safety, but where the primary result is still billions of additional dollars flowing into AI labs primarily doing scaling-related work.
I sadly do not think that just because someone is working on "AI Safety", they have weighed and properly considered the obvious objections to their career choices. Indeed, safety-washing seems easy and common, and if you can hire top EAs just by slapping a safety label on a capabilities position, then we will likely make the world worse.
I do react differently to someone working in a safety position, but I have a separate, additional negative judgement if I find out that someone is actually working on capabilities but calling their work safety. I think that kind of deception is increasingly common, and it additionally makes coordinating and working in this space harder.
I have a very uninformed view on the relative alignment and capabilities contributions of things like RLHF. My intuition is that RLHF is positive for alignment, but I'm almost entirely uninformed on that. If anyone's written a summary of where they think these grey-area research areas lie, I'd be interested to read it. Scott's recent post was not a bad entry into the genre, but it obviously just worked at a very high level.