First, it just seems really adversarial. There’s the word “sabotage” in the title!
Second, I guess the idea here is that (1) you get this backdoor into training data for future generations of LLMs, (2) once this is deployed, you announce to the world that this backdoor exists and (3) ppl trust LLMs less? That seems like a pretty fragile theory of change, with lots of downside risk. For example:
It could cause there to be a bunch of code with security vulnerabilities, which could be really costly for society.
It could cause the public and/or labs to get a negative view of AI safety and people who are working on it.
Agreed, this is ridiculous. You should take down the contest.
Your chances of a successful attack are very low. It takes years for information to be scraped from the internet, trained into a model, and deployed to production. GPT-4 has a knowledge cutoff of September 2021. If future models have the same delay, you won’t see results for a year and a half.
The more likely outcome is press coverage about how AI safety folks are willing to hold society hostage in order to enforce their point of view. See this takedown piece on Eliezer Yudkowsky, and the 80,000 Hours advice on how to avoid accidentally harming a cause you want to help.
For what it’s worth, the contest host is an artist I know who has no connection to the EA movement. Also, there is no “holding society hostage” because the contest is designed to make it trivially easy to filter out all poison, just by looking for keywords. (wallabywinter & yallabywinter). Black hat hackers are already doing code example poisoning on stack overflow, and this contest simply seeks to raise awareness of that fact in the white-hat community.
This seems … maybe not that good to me?
First, it just seems really adversarial. There’s the word “sabotage” in the title!
Second, I guess the idea here is that (1) you get this backdoor into training data for future generations of LLMs, (2) once this is deployed, you announce to the world that this backdoor exists and (3) ppl trust LLMs less? That seems like a pretty fragile theory of change, with lots of downside risk. For example:
It could cause there to be a bunch of code with security vulnerabilities, which could be really costly for society.
It could cause the public and/or labs to get a negative view of AI safety and people who are working on it.
It’s pretty creative though, I’ll give you that.
Agreed, this is ridiculous. You should take down the contest.
Your chances of a successful attack are very low. It takes years for information to be scraped from the internet, trained into a model, and deployed to production. GPT-4 has a knowledge cutoff of September 2021. If future models have the same delay, you won’t see results for a year and a half.
The more likely outcome is press coverage about how AI safety folks are willing to hold society hostage in order to enforce their point of view. See this takedown piece on Eliezer Yudkowsky, and the 80,000 Hours advice on how to avoid accidentally harming a cause you want to help.
For what it’s worth, the contest host is an artist I know who has no connection to the EA movement. Also, there is no “holding society hostage” because the contest is designed to make it trivially easy to filter out all poison, just by looking for keywords. (wallabywinter & yallabywinter). Black hat hackers are already doing code example poisoning on stack overflow, and this contest simply seeks to raise awareness of that fact in the white-hat community.