I’m an artist, writer, and human being.
To be a little more precise: I make video games, edit Wikipedia, and write here and on LessWrong!
There’s nothing in this section about why censoring model outputs to be diverse/not use slurs/not target individuals or create violent speech is actually a bad idea.
The argument in that section was not actually an object-level one, but rather an argument from history and folk deontological philosophy (in the sense that “censorship is bad” is a useful, if not perfect, heuristic in most modern Western societies). Nonetheless, here are a few reasons why what you mentioned could be a bad idea: Goodhart’s law, the Scunthorpe problem, and the general tendency toward unintended side effects. We can’t directly measure “diversity” or assign an exact “violence level” to a piece of text or media (at least not without a lot more context, which we may not always have), so any automated censorship program is forced to rely on proxies for toxicity instead.

To give a real-world and slightly silly example, TikTok’s content filters have led to almost all transcriptions of curse words and sensitive topics being replaced with similar-sounding but unrelated words, which in turn has spawned a new form of internet “algospeak.” (I highly recommend reading the linked article if you have the time.) This was never the censors’ intention, but people adapted by optimizing for the proxy: they changed their dialect without their content actually becoming any less toxic. On a darker note, this also had a really bad side effect, in that videos about vital-but-sensitive topics such as sex education, pandemic preparedness, and war coverage became much harder to find and (for outsiders) to understand. Instead of increasing diversity, well-meaning censorship can lead to further breakdowns in communication surprisingly often.
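As a toy illustration of the Scunthorpe problem (my own sketch in Python; the banned list and test strings are made up, and no real moderation system is this naive), a substring-based filter shows how optimizing the proxy, “contains a banned string,” diverges from the thing we actually care about:

```python
# Toy substring-based content filter -- a stand-in for the proxy metrics real
# moderation systems rely on; the banned list and examples below are invented.
BANNED_SUBSTRINGS = ["cunt", "ass"]

def is_flagged(text: str) -> bool:
    """Flag text if it contains any banned substring (the proxy for 'toxic')."""
    lowered = text.lower()
    return any(bad in lowered for bad in BANNED_SUBSTRINGS)

print(is_flagged("Scunthorpe United won on Saturday"))  # True: false positive
print(is_flagged("a nice grass-fed butter recipe"))     # True: false positive
print(is_flagged("let's unalive this thread"))          # False: algospeak slips through
```

The filter punishes innocuous text while equally “toxic” content sails through once people learn the proxy, which is the Goodhart failure mode in miniature.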
Came across this post today—I assume the bounty has been long-closed by now?
Thanks, I think I somehow missed some of those!
Thanks for the clarification! I might try to do something on the Orthogonality thesis if I get the chance, since I think that tends to be glossed over in a lot of popular introductions.
My perspective on the issue is that by accepting the wager, you are likely to become far less effective at achieving your terminal goals (since even if you can discount the higher-probability wagers, there will eventually be a lower-probability one that you won’t be able to think your way out of and will thus have to entertain on principle), and you become vulnerable to adversarial attacks, leading to actions which in the vast majority of possible universes are losing moves. If your epistemics require that you spend all your money on projects that will, for all intents and purposes, do nothing (and which, if universally followed, would lead to a clearly dystopian world where only muggers get money), then I’d wager that the epistemics are the problem. Rationalists, and EAs, should play to win, and not fall prey to obvious basilisks of our own making.
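To make the “losing move” point concrete, here’s a minimal sketch (all numbers hypothetical; this is just the standard Pascal’s-mugging setup, not anyone’s actual decision procedure) of how a mugger can outbid any level of skepticism under naive expected-value maximization:

```python
# Toy Pascal's-mugging payoff table. However small your credence p in the
# mugger's promise, the mugger can name a reward that grows faster than p
# shrinks, so naive EV maximization always says "pay up".
COST_OF_PAYING = 100  # what the mugger asks for (made-up units)

def ev_of_paying(credence: float, promised_utility: float) -> float:
    return credence * promised_utility - COST_OF_PAYING

for k in (6, 12, 24, 48):
    credence = 10.0 ** -k        # your ever-shrinking trust in the claim
    promise = 10.0 ** (k + 3)    # the mugger simply names a bigger number
    print(f"p = 1e-{k}: EV of paying = {ev_of_paying(credence, promise):,.0f}")
```

Every row comes out positive, no matter how low the credence, which is exactly the adversarial exploit described above.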
Question—is $20,000 awarded to every entry which qualifies under the rules, or is there one winner selected among the pool of all who submit an entry?
This is really exciting! I’m glad there are so many talented people on the case, and hope the good news will only grow from here :)
I strongly agree with you on points one and two, though I’m not super confident on three. For me the biggest takeaway is we should be putting more effort into attempts to instill “false” beliefs which are safety-promoting and self-stable.
Thanks for making this document public, it’s an interesting model! I am slightly concerned this could reduce effectiveness within the organization due to reduced communication, which could plausibly do more net harm in expectation than the infohazard leakage it prevents. I assume you’ve already done that cost/benefit analysis of course, but thought it worth mentioning just in case.
We are in the process of reaching out to individuals and we will include them after they confirm. If you have suggestions for individuals to include, please add a comment here.
It may be worth talking with a trusted PR expert before going public. I’ve done PR work for a tech company in the past, and my experience there was that sometimes people can be clueless about how the average person or the media will receive a story once it leaves their cultural circle. It is not always obvious to insiders that a given action will lead to public blowback (or loss of important private allies/investors), so if that’s of potential concern, I highly recommend talking with someone who does good work in public/media relations. If necessary feel free to tap me, though note that I am closer to hobbyist than expert, so you should find a more experienced go-to PR person if possible.
There is a very severe potential downside if many funders think in this manner, which is that it will discourage people from writing about potentially important ideas. I’m strongly in favor of putting more effort and funding into PR (disclaimer that I’ve worked in media relations in the past), but if we refuse to fund people with diverse, potentially provocative takes, that’s not a worthwhile trade-off, imo. I want EA to be capable of supporting an intellectual environment where we can ask about and discuss hard questions publicly without worrying about being excluded as a result. If that means bad-faith journalists have slightly more material to work with, then so be it.
Not a bad idea! I’d love to try to actually test this hypothesis—my hunch is that it will do worse at prediction in most areas, but there may be some scenarios where thinking things through from a narrative perspective could provide otherwise hard-to-reach insight.
I was personally unaware of the situation until reading this comment thread, so can confirm.
My brother was recently very freaked out when I asked him to pose a set of questions that he thinks an AI wouldn’t be able to answer, and GPT-3 gave excellent-sounding responses to his prompts.
Seconding this—I’m definitely personally curious what such a chart would look like!
I don’t think that would imply that nothing really matters, since reducing suffering and maximizing happiness (as well as good ol’ “care about other human beings while they live”) could still be valid sources of meaning. In fact, ensuring that we do not become extinct too early would be extremely important for securing the best possible fate of the universe (that being a quick and painless destruction, or whatever), so just doing what feels best at the moment probably would not be a great strategy for a True Believer in this hypothetical.
Yes, the consequences are probably less severe in this context, which is why I wouldn’t consider this a particularly strong argument. Imo, it’s more important to understand this line of thinking for the purpose of modeling outsiders’ reactions to potential censorship, as this seems to be how people irl are responding to OpenAI et al.’s policy decisions.
I would also like to emphasize again that sometimes regulation is necessary, and I am not against it on principle, though I do believe it should be used with caution; this post is critiquing the details of how we are implementing censorship in large models, not so much its use in the first place.