The most obvious (and generic) objection is that censorship is bad.
This strikes me as a weird argument because it isn’t object-level at all. There’s nothing in this section about why censoring model outputs to be diverse/not use slurs/not target individuals or create violent speech is actually a bad idea. There was a Twitter thread of doing GPT-3 injections to make a remote work bot make violent threats to people. That was pretty convincing evidence to me that there is too much scope for abuse without some front-end modifications.
If you have an object-level, non-generic argument for why this form of censorship is bad, I would love to hear it.
This will create the illusion of greater safety than actually exists, and (imo) is practically begging for something to go wrong.
If true, this would be the most convincing objection to me. But I don’t think this is actually how public perception works. Who is really out there who thinks that Stable Diffusion is safe, but if they saw it generate a violent image they would be convinced that Stable Diffusion is a problem? Most people who celebrate stable diffusion or GPT3 know they could be used for bad ends, they just think the good ends are more important/the bad ends are fixable. I just don’t see how a front-end tweak really convinces people who otherwise would have been skeptical. I think it’s much more realistic that people see this as transparently just a bandaid solution, and they just vary in how much they care about the underlying issue.
I also think there’s a distinction between a model being “not aligned” and being misaligned. Insofar as a model is spitting out objectionable inputs it certainly doesn’t meet the gold standard of aligned AI. But I also struggle to see how it is actually concretely misaligned. In fact, one of the biggest worries of AI safety is AIs being able to circumvent restrictions placed on them by the modeller. So it seems like an AI that is easily muzzled by front-end tweaks is not likely to be the biggest cause for concern.
Calling content censorship “AI safety” (or even “bias reduction”) severely damages the reputation of actual, existential AI safety advocates.
This is very unconvincing. The AI safety vs AI ethics conflict is long-standing, goes way beyond some particular front-end censorship and is unlikely to be affected by any of these individual issues. If your broader point is that calling AI ethics AI safety is bad, then yes. But I don’t think the cited tweets are really evidence that AI safety is widely viewed as synonymous with AI ethics. Timnit Gebru has far more followers than any of these tweets will ever reach, and is quite vocal about criticizing AI safety people. The contribution of front-end censorship to this debate is probably quite overstated.
There’s nothing in this section about why censoring model outputs to be diverse/not use slurs/not target individuals or create violent speech is actually a bad idea.
The argument in that section was not actually an object-level one, but rather an argument from history and folk deontological philosophy (in the sense that “censorship is bad” is a useful, if not perfect, heuristic used in most modern Western societies). Nonetheless, here’s a few reasons why what you mentioned could be a bad idea: Goodhart’s law, the Scunthorpe Problem, and the general tendency for unintended side effects. We can’t directly measure “diversity” or assign an exact “violence level” to a piece of text or media (at least not without a lot more context which we may not always have), so instead any automated censorship program is forced to use proxies for toxicity instead.To give a real-world and slightly silly example, TikTok’s content filters have led to almost all transcriptions of curse words and sensitive topics to be replaced with some similar-sounding but unrelated words, which in turn has spawned a new form of internet “algospeak.” (I highly recommend reading the linked article if you have the time) This was never the intention of the censors, but people adopted to optimize for the proxy by changing their dialect instead of their content actually becoming any less toxic. On a darker note, this also had a really bad side effect where videos about vital-but-sensitive topics such as sex education, pandemic preparedness, war coverage, etc. became much harder to find and understand (to outsiders) as a result. Instead of increasing diversity, well-meaning censorship can lead to further breakdowns in communication surprisingly often.
I think this makes a lot of sense for algorithmic regulation of human expression, but I still don’t see the link to algorithmic expression itself. In particular I agree that we can’t perfectly measure the violence of a speech act, but the consequences of incorrectly classifying something as violent seem way less severe for a language model than for a platform of humans.
Yes, the consequences are probably less severe in this context, which is why I wouldn’t consider this a particularly strong argument. Imo, it’s more important to understand this line of thinking for the purpose of modeling outsider’s reactions to potential censorship, as this seems to be how people irl are responding to OpenAI, et al’s policy decisions.
I would also like to emphasize again that sometimes regulation is necessary, and I am not against it on principle, though I do believe it should be used with caution; this post is critiquing the details of how we are implementing censorship in large models, not so much its use in the first place.
This strikes me as a weird argument because it isn’t object-level at all. There’s nothing in this section about why censoring model outputs to be diverse/not use slurs/not target individuals or create violent speech is actually a bad idea. There was a Twitter thread of doing GPT-3 injections to make a remote work bot make violent threats to people. That was pretty convincing evidence to me that there is too much scope for abuse without some front-end modifications.
If you have an object-level, non-generic argument for why this form of censorship is bad, I would love to hear it.
If true, this would be the most convincing objection to me. But I don’t think this is actually how public perception works. Who is really out there who thinks that Stable Diffusion is safe, but if they saw it generate a violent image they would be convinced that Stable Diffusion is a problem? Most people who celebrate stable diffusion or GPT3 know they could be used for bad ends, they just think the good ends are more important/the bad ends are fixable. I just don’t see how a front-end tweak really convinces people who otherwise would have been skeptical. I think it’s much more realistic that people see this as transparently just a bandaid solution, and they just vary in how much they care about the underlying issue.
I also think there’s a distinction between a model being “not aligned” and being misaligned. Insofar as a model is spitting out objectionable inputs it certainly doesn’t meet the gold standard of aligned AI. But I also struggle to see how it is actually concretely misaligned. In fact, one of the biggest worries of AI safety is AIs being able to circumvent restrictions placed on them by the modeller. So it seems like an AI that is easily muzzled by front-end tweaks is not likely to be the biggest cause for concern.
This is very unconvincing. The AI safety vs AI ethics conflict is long-standing, goes way beyond some particular front-end censorship and is unlikely to be affected by any of these individual issues. If your broader point is that calling AI ethics AI safety is bad, then yes. But I don’t think the cited tweets are really evidence that AI safety is widely viewed as synonymous with AI ethics. Timnit Gebru has far more followers than any of these tweets will ever reach, and is quite vocal about criticizing AI safety people. The contribution of front-end censorship to this debate is probably quite overstated.
The argument in that section was not actually an object-level one, but rather an argument from history and folk deontological philosophy (in the sense that “censorship is bad” is a useful, if not perfect, heuristic used in most modern Western societies). Nonetheless, here’s a few reasons why what you mentioned could be a bad idea: Goodhart’s law, the Scunthorpe Problem, and the general tendency for unintended side effects. We can’t directly measure “diversity” or assign an exact “violence level” to a piece of text or media (at least not without a lot more context which we may not always have), so instead any automated censorship program is forced to use proxies for toxicity instead.To give a real-world and slightly silly example, TikTok’s content filters have led to almost all transcriptions of curse words and sensitive topics to be replaced with some similar-sounding but unrelated words, which in turn has spawned a new form of internet “algospeak.” (I highly recommend reading the linked article if you have the time) This was never the intention of the censors, but people adopted to optimize for the proxy by changing their dialect instead of their content actually becoming any less toxic. On a darker note, this also had a really bad side effect where videos about vital-but-sensitive topics such as sex education, pandemic preparedness, war coverage, etc. became much harder to find and understand (to outsiders) as a result. Instead of increasing diversity, well-meaning censorship can lead to further breakdowns in communication surprisingly often.
I think this makes a lot of sense for algorithmic regulation of human expression, but I still don’t see the link to algorithmic expression itself. In particular I agree that we can’t perfectly measure the violence of a speech act, but the consequences of incorrectly classifying something as violent seem way less severe for a language model than for a platform of humans.
Yes, the consequences are probably less severe in this context, which is why I wouldn’t consider this a particularly strong argument. Imo, it’s more important to understand this line of thinking for the purpose of modeling outsider’s reactions to potential censorship, as this seems to be how people irl are responding to OpenAI, et al’s policy decisions.
I would also like to emphasize again that sometimes regulation is necessary, and I am not against it on principle, though I do believe it should be used with caution; this post is critiquing the details of how we are implementing censorship in large models, not so much its use in the first place.