I fully agree that we should expect corporations to engage in safety-washing, since merely marketing yourself as X is always going to be cheaper than actually doing X, whatever moral thing X is.
However, there is a key difference between greenwashing/humanewashing and safety-washing: we don’t know what the correct approach to safety is. We can actually look at a company’s carbon emissions or how it treats its animals, but it’s very hard to look at a company and objectively say it’s “doing it wrong”.
Take one of your examples here:
Talk about “safety” when you really mean other kinds of things the public might want an AI to be, like un-biased and not-hateful
I would argue, and plenty of others have made this point before, that making your AI un-biased and not-hateful is actually highly relevant to AI safety. If you’re trying to make an AI not-harmful in the future, it seems fairly important to make it not-harmful now.
This makes me concerned that the term “safety-washing” will simply be used as a bludgeon against anyone who doesn’t agree with your personal opinion of the best safety approach.
This is a good point, thanks! I don’t want more bludgeons-to-be-used-in-disagreements.
[Writing very quickly.]
Although I might push back against the implied extent to which we know what the correct approaches to humane food or the climate are.
And while I agree with “If you’re trying to make an AI not-harmful in the future, it seems fairly important to make it not-harmful now,” I do think that there are different things that happen, and they’re worth distinguishing:
1. “I want to make sure that my work on AI doesn’t end up killing everyone, and part of that is to learn how to make sure the systems I’m developing don’t say anything sexist” (which seems like the position you’re arguing for)
2. “I genuinely think that it’s really important to make sure that my AI work doesn’t lead to increased sexism” (which seems very good but also very different from mitigating existential risk from AI, except accidentally)
3. “People are worried about sexist AI systems, and also about safety, and honestly AI safety seems really hard, but I do know of some things that I could do on the sexism front, so I’m going to focus on that kind of AI ‘safety’” (which seems like the type of thing that would cause confusion and potential harm)
Yes, perhaps I’m just injecting some of my broader concerns about who gets to use the word “safety” here.
I’m thinking of scenario 4 here:
4. A researcher has looked into AI risk and is convinced that AI could be highly dangerous if misused, and that “misalignment” is a serious problem that could lead to some very bad outcomes, from increased sexism to a significant number of deaths from, say, misaligned weapons systems or healthcare diagnostics. However, they think the arguments for existential risk are very flimsy, that any x-risk threat is very far away, and that it’s legitimately a waste of time to work on x-risk. So they focus their team on preventing near- and medium-term danger from existing AI systems.
In case it wasn’t obvious, the opinion of the researcher is the one I hold, and I know a significant number of others hold it as well (some perhaps secretly). I don’t think it’s wrong for them to claim they are working on “AI safety” when they are literally working on making AI safer, but it seems like they would be open to accusations of safety-washing if they made that claim.
I like your suggestion of using the phrase “existential safety” instead; I think it would clear up a lot of confusion.