There’s nothing in this section about why censoring model outputs to be diverse/not use slurs/not target individuals or create violent speech is actually a bad idea.
The argument in that section was not actually an object-level one, but rather an argument from history and folk deontological philosophy (in the sense that “censorship is bad” is a useful, if imperfect, heuristic in most modern Western societies). Nonetheless, here are a few reasons why what you mentioned could be a bad idea: Goodhart’s law, the Scunthorpe Problem, and the general tendency toward unintended side effects. We can’t directly measure “diversity” or assign an exact “violence level” to a piece of text or media (at least not without a lot more context, which we may not always have), so any automated censorship program is forced to use proxies for toxicity instead.

To give a real-world and slightly silly example, TikTok’s content filters have led to almost all transcriptions of curse words and sensitive topics being replaced with similar-sounding but unrelated words, which in turn has spawned a new form of internet “algospeak.” (I highly recommend reading the linked article if you have the time.) This was never the intention of the censors, but people adapted to optimize for the proxy by changing their dialect rather than making their content any less toxic. On a darker note, this also had the side effect that videos about vital-but-sensitive topics such as sex education, pandemic preparedness, and war coverage became much harder to find and understand (for outsiders). Instead of increasing diversity, well-meaning censorship surprisingly often leads to further breakdowns in communication.
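For the curious, here is a minimal sketch of how the proxy problem plays out in code. The blocklist, function name, and example sentences are my own invention for illustration, not any platform’s actual filter:

```python
# Sketch of a naive substring blocklist: a crude proxy for "toxicity".
# The word list and examples below are hypothetical, not real platform rules.

BLOCKLIST = ["sex", "kill"]

def is_flagged(text: str) -> bool:
    """Flag text if any blocklisted substring appears anywhere in it."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)

# False positive (the Scunthorpe Problem): an innocent proper noun
# happens to contain a blocked substring.
print(is_flagged("She moved to Middlesex last year."))         # True

# Missed content (Goodhart's law): speakers adapt their spelling
# ("algospeak") so the proxy no longer matches, even though the
# meaning is unchanged for any human reader.
print(is_flagged("Videos about s3ggs ed are hard to find."))   # False

# Collateral damage: vital-but-sensitive topics trip the same filter
# as the content it was meant to catch.
print(is_flagged("Comprehensive sex education saves lives."))  # True
```

Real moderation systems are of course more sophisticated than a substring match, but the same dynamic (a proxy measure, false positives, and adversarial adaptation by users) applies to learned toxicity classifiers as well.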
I think this makes a lot of sense for algorithmic regulation of human expression, but I still don’t see the link to algorithmic expression itself. In particular, I agree that we can’t perfectly measure the violence of a speech act, but the consequences of incorrectly classifying something as violent seem way less severe for a language model than for a platform of humans.
Yes, the consequences are probably less severe in this context, which is why I wouldn’t consider this a particularly strong argument. Imo, it’s more important to understand this line of thinking for the purpose of modeling outsiders’ reactions to potential censorship, as this seems to be how people irl are responding to OpenAI et al.’s policy decisions.
I would also like to emphasize again that sometimes regulation is necessary, and I am not against it on principle, though I do believe it should be used with caution; this post is critiquing the details of how we are implementing censorship in large models, not so much its use in the first place.