Protests are by nature adversarial and high-variance actions prone to creating backlash, so I think that if you’re going to be organizing them, you need to be careful to actually convey the right message (and in particular, way more careful than you need to be in non-adversarial environments—e.g. if news media pick up on this, they’re likely going to twist your words). I don’t think this post is very careful on that axis. In particular, two things I think are important to change:
“Meta’s frontier AI models are fundamentally unsafe.”
I disagree; the current models are not dangerous on anywhere near the level that most AI safety people are concerned about. Since “current models are not dangerous yet” is one of the main objections people have to prioritizing AI safety, it seems really important to be clearer about what you mean by “safe” so that it doesn’t sound like the protest is about language models saying bad things, etc.
Suggestion: be very clear that you’re protesting the policy that Meta has of releasing model weights because of future capabilities that models could have, rather than the previous decisions they made of releasing model weights.
“Stop free-riding on the goodwill of the open-source community. Llama models are not and have never been open source, says the Open Source Initiative.”
This basically just seems like a grab-bag accusation… you’re accusing them of not being open-source enough? That’s the exact opposite of the other objections; I think it’s both quite disingenuous and also a plausible way things might backfire (e.g. if this is the one phrase that the headlines run with).
It’s not obvious to me that message precision is more important for public activism than in other contexts. I think it might be less important, in fact. Here’s why:
My guess is that the distinction between “X company’s frontier AI models are unsafe” vs. “X company’s policy on frontier models is unsafe” isn’t actually registered by the vast majority of the public (many such cases!). Instead, both messages basically amount to a mental model that is something like “X company’s AI work = bad” And that’s really all the nuance that you need to create public pressure for X company to do something. Then, in more strategic contexts like legislative work and corporate outreach, message precision becomes more important. (When I worked in animal advocacy, we had a lot of success campaigning for nuanced policies with protests that had much vaguer messaging).
Also, I don’t think the news media is “likely” going to twist an activist’s words. It’s always a risk, but in general, the media seems to have a really healthy appetite for criticizing tech companies and isn’t trying to work against activists here. If anything, not mentioning the dangers of the current models (which do exist) might lead to media backlash of the “X-risk is a distraction” sort. So I really don’t think Holly saying “Meta’s frontier AI models are fundamentally unsafe” is evidence of a lack of careful consideration re: messaging here.
I do agree with the Open Source issue though. In that case, it seems like the message isn’t just imprecise, but instead pointing in the wrong direction altogether.
I think the distinctions Richard highlights are essential for us to make in our public advocacy—in particular, polls show that there’s already a significant chunk of voters who seem persuadable by AI notkilleveryoneism, so it’s a good time to argue for that directly. I don’t think there’s anything gained by hiding under the banner of fearing moderate harms from abuse of today’s models, and there’s much to be lost if we get policy responses that protect us from those but not from the actual x-risk.
I’m also heartened by recent polling, and spend a lot of time time these days thinking about how to argue for the importance of existential risks from artificial intelligence.
I’m guessing the main difference in our perspective here is that you see including existing harms in public messaging as “hiding under the banner” of another issue. In my mind, (1) existing harms are closely related to the threat models for existential risks (i.e. how do we get these systems to do the things we want and not do the other things); and (2) I think it’s just really important for advocates to try to build coalitions between different interest groups with shared instrumental goals (e.g. building voter support for AI regulation). I’ve seen a lot of social movements devolve into factionalism, and I see the early stages of that happening in AI safety, which I think is a real shame.
Like, one thing that would really help the safety situation is if frontier models were treated like nuclear power plants and couldn’t just be deployed at a single company’s whim without meeting a laundry list of safety criteria (both because of the direct effects of the safety criteria, and because such criteria literally just buys us some time). If it is the case that X-risk interest groups can build power and increase the chance of passing legislation by allying with others who want to include (totally legitimate) harms like respecting intellectual property in that list of criteria, I don’t see that as hiding under another’s banner. I see it as building strategic partnerships.
Anyway, this all goes a bit further than the point I was making in my initial comment, which is that I think the public isn’t very sensitive to subtle differences in messaging — and that’s okay because those subtle differences are much more important when you are drafting legislation compared to generally building public pressure.
Suggestion: be very clear that you’re protesting the policy that Meta has of releasing model weights because of future capabilities that models could have, rather than the previous decisions they made of releasing model weights.
They are both unsafe now for the things they can be used for and releasing model weights in the future will be more unsafe because of things the model could do.
> This basically just seems like a grab-bag accusation… you’re accusing them of not being open-source enough?
It’s more like people think “open source” is good because of the history of open source software, but this is a pretty different thing. The linked article describes how model weights are not software and Meta’s ToS are arguably anti-competitive, which undermines any claim to just wanting to share tools and accelerate progress.
“They are both unsafe now for the things they can be used for and releasing model weights in the future will be more unsafe because of things the model could do.”
I think using “unsafe” in a very broad way like this is misleading overall and generally makes the AI safety community look like miscalibrated alarmists. I do not want to end up in a position where, in 5 or 10 years’ time, policy proposals aimed at reducing existential risk come with 5 or 10 years worth of baggage in the form of previous claims about model harms that have turned out to be false. I expect that the direct effects of the Llama models that have been released so far will be net positive by a significant margin (for all the standard reasons that open source stuff is net positive). Maybe you disagree with this, but a) it seems better to focus on the more important claim, for which there’s a consensus in the field, and b) even if you’re going to make both claims, using the same word (“unsafe”) in these two very different senses is effectively a motte and bailey.
It’s more like people think “open source” is good because of the history of open source software, but this is a pretty different thing. The linked article describes how model weights are not software and Meta’s ToS are arguably anti-competitive, which undermines any claim to just wanting to share tools and accelerate progress.
The policy you are suggesting is far further away from “open source” than this is. It is totally reasonable for Meta to claim that doing something closer to open source has some proportion of the benefits of full open source.
The policy you are suggesting is far further away from “open source” than this is. It is totally reasonable for Meta to claim that doing something closer to open source has some proportion of the benefits of full open source.
Suppose meta was claiming that their models were curing cancer. It probably is the case that their work is more likely to cure cancer than if they took Holly’s preferred policy, but nonetheless it feels legitimate to object to them generating goodwill by claiming to cure cancer.
In your hypothetical, if Meta says “OK you win, you’re right, we’ll henceforth take steps to actually cure cancer”, onlookers would assume that this is a sensible response, i.e. that Meta is responding appropriately to the complaint. If the protester then gets back on the news the following week and says “no no no this is making things even worse”, I think onlookers would be very confused and say “what the heck is wrong with that protester?”
I think using “unsafe” in a very broad way like this is misleading overall and generally makes the AI safety community look like miscalibrated alarmists.
I agree that when there’s no memetic fitness/calibration trade-off, it’s always better to be calibrated. But here there is a trade-off. How should we take it?
My sense is that there’s never been any epistemically calibrated social movement and so that it would be playing against odds to impose that constraint. Even someone like Henry Spira who was very thoughtful personally used very unnuanced communication to achieve social change.
Richard, do you think that being miscalibrated has hurt or benefited the ability of past movements to cause social change? E.g. climate change and animal welfare.
My impression is that probably not? They caused entire chunks of society to be miscalibrated on climate change (maybe less in the US but in Europe it’s pretty big), and that’s not good, but I would guess that the alarmism helped them succeed? As long as there also exists a moderate faction & and there still exists background debates on the object-level, I feel like having a standard social activism movement wd be overall very welcome.
Curious if anyone here knows the relevant literature on the topic, e.g. details in the radical flank literature.
The analogy here would be climate scientists and climate protesters. Afaik climate protesters have not delegitimised climate scientists or made them seem like miscalibrated alarmists (perhaps even the opposite).
Protests are by nature adversarial and high-variance actions prone to creating backlash, so I think that if you’re going to be organizing them, you need to be careful to actually convey the right message (and in particular, way more careful than you need to be in non-adversarial environments—e.g. if news media pick up on this, they’re likely going to twist your words). I don’t think this post is very careful on that axis. In particular, two things I think are important to change:
“Meta’s frontier AI models are fundamentally unsafe.”
I disagree; the current models are not dangerous on anywhere near the level that most AI safety people are concerned about. Since “current models are not dangerous yet” is one of the main objections people have to prioritizing AI safety, it seems really important to be clearer about what you mean by “safe” so that it doesn’t sound like the protest is about language models saying bad things, etc.
Suggestion: be very clear that you’re protesting the policy that Meta has of releasing model weights because of future capabilities that models could have, rather than the previous decisions they made of releasing model weights.
“Stop free-riding on the goodwill of the open-source community. Llama models are not and have never been open source, says the Open Source Initiative.”
This basically just seems like a grab-bag accusation… you’re accusing them of not being open-source enough? That’s the exact opposite of the other objections; I think it’s both quite disingenuous and also a plausible way things might backfire (e.g. if this is the one phrase that the headlines run with).
It’s not obvious to me that message precision is more important for public activism than in other contexts. I think it might be less important, in fact. Here’s why:
My guess is that the distinction between “X company’s frontier AI models are unsafe” vs. “X company’s policy on frontier models is unsafe” isn’t actually registered by the vast majority of the public (many such cases!). Instead, both messages basically amount to a mental model that is something like “X company’s AI work = bad” And that’s really all the nuance that you need to create public pressure for X company to do something. Then, in more strategic contexts like legislative work and corporate outreach, message precision becomes more important. (When I worked in animal advocacy, we had a lot of success campaigning for nuanced policies with protests that had much vaguer messaging).
Also, I don’t think the news media is “likely” going to twist an activist’s words. It’s always a risk, but in general, the media seems to have a really healthy appetite for criticizing tech companies and isn’t trying to work against activists here. If anything, not mentioning the dangers of the current models (which do exist) might lead to media backlash of the “X-risk is a distraction” sort. So I really don’t think Holly saying “Meta’s frontier AI models are fundamentally unsafe” is evidence of a lack of careful consideration re: messaging here.
I do agree with the Open Source issue though. In that case, it seems like the message isn’t just imprecise, but instead pointing in the wrong direction altogether.
I think the distinctions Richard highlights are essential for us to make in our public advocacy—in particular, polls show that there’s already a significant chunk of voters who seem persuadable by AI notkilleveryoneism, so it’s a good time to argue for that directly. I don’t think there’s anything gained by hiding under the banner of fearing moderate harms from abuse of today’s models, and there’s much to be lost if we get policy responses that protect us from those but not from the actual x-risk.
I’m also heartened by recent polling, and spend a lot of time time these days thinking about how to argue for the importance of existential risks from artificial intelligence.
I’m guessing the main difference in our perspective here is that you see including existing harms in public messaging as “hiding under the banner” of another issue. In my mind, (1) existing harms are closely related to the threat models for existential risks (i.e. how do we get these systems to do the things we want and not do the other things); and (2) I think it’s just really important for advocates to try to build coalitions between different interest groups with shared instrumental goals (e.g. building voter support for AI regulation). I’ve seen a lot of social movements devolve into factionalism, and I see the early stages of that happening in AI safety, which I think is a real shame.
Like, one thing that would really help the safety situation is if frontier models were treated like nuclear power plants and couldn’t just be deployed at a single company’s whim without meeting a laundry list of safety criteria (both because of the direct effects of the safety criteria, and because such criteria literally just buys us some time). If it is the case that X-risk interest groups can build power and increase the chance of passing legislation by allying with others who want to include (totally legitimate) harms like respecting intellectual property in that list of criteria, I don’t see that as hiding under another’s banner. I see it as building strategic partnerships.
Anyway, this all goes a bit further than the point I was making in my initial comment, which is that I think the public isn’t very sensitive to subtle differences in messaging — and that’s okay because those subtle differences are much more important when you are drafting legislation compared to generally building public pressure.
They are both unsafe now for the things they can be used for and releasing model weights in the future will be more unsafe because of things the model could do.
> This basically just seems like a grab-bag accusation… you’re accusing them of not being open-source enough?
It’s more like people think “open source” is good because of the history of open source software, but this is a pretty different thing. The linked article describes how model weights are not software and Meta’s ToS are arguably anti-competitive, which undermines any claim to just wanting to share tools and accelerate progress.
“They are both unsafe now for the things they can be used for and releasing model weights in the future will be more unsafe because of things the model could do.”
I think using “unsafe” in a very broad way like this is misleading overall and generally makes the AI safety community look like miscalibrated alarmists. I do not want to end up in a position where, in 5 or 10 years’ time, policy proposals aimed at reducing existential risk come with 5 or 10 years worth of baggage in the form of previous claims about model harms that have turned out to be false. I expect that the direct effects of the Llama models that have been released so far will be net positive by a significant margin (for all the standard reasons that open source stuff is net positive). Maybe you disagree with this, but a) it seems better to focus on the more important claim, for which there’s a consensus in the field, and b) even if you’re going to make both claims, using the same word (“unsafe”) in these two very different senses is effectively a motte and bailey.
The policy you are suggesting is far further away from “open source” than this is. It is totally reasonable for Meta to claim that doing something closer to open source has some proportion of the benefits of full open source.
Suppose meta was claiming that their models were curing cancer. It probably is the case that their work is more likely to cure cancer than if they took Holly’s preferred policy, but nonetheless it feels legitimate to object to them generating goodwill by claiming to cure cancer.
In your hypothetical, if Meta says “OK you win, you’re right, we’ll henceforth take steps to actually cure cancer”, onlookers would assume that this is a sensible response, i.e. that Meta is responding appropriately to the complaint. If the protester then gets back on the news the following week and says “no no no this is making things even worse”, I think onlookers would be very confused and say “what the heck is wrong with that protester?”
It is a confusing point, maybe too subtle for a protest. I am learning!
It was a difficult point to make and we ended up removing it where we could.
This is a good point and feels persuasive, thanks!
I agree that when there’s no memetic fitness/calibration trade-off, it’s always better to be calibrated. But here there is a trade-off. How should we take it?
My sense is that there’s never been any epistemically calibrated social movement and so that it would be playing against odds to impose that constraint. Even someone like Henry Spira who was very thoughtful personally used very unnuanced communication to achieve social change.
Richard, do you think that being miscalibrated has hurt or benefited the ability of past movements to cause social change? E.g. climate change and animal welfare.
My impression is that probably not? They caused entire chunks of society to be miscalibrated on climate change (maybe less in the US but in Europe it’s pretty big), and that’s not good, but I would guess that the alarmism helped them succeed?
As long as there also exists a moderate faction & and there still exists background debates on the object-level, I feel like having a standard social activism movement wd be overall very welcome.
Curious if anyone here knows the relevant literature on the topic, e.g. details in the radical flank literature.
How much do you anticipate protests characterizing the AI Safety community, and why is that important to you?
The analogy here would be climate scientists and climate protesters. Afaik climate protesters have not delegitimised climate scientists or made them seem like miscalibrated alarmists (perhaps even the opposite).