But even if we could be confident that entertainment outweighed sex crimes on purely utilitarian grounds, in the real world, with real politics and EA critics, I do not think this position would be tenable.
Isn’t this basically society’s revealed position on, say, cameras? People can and do use cameras for sex crimes (e.g. voyeurism) but we don’t regulate cameras in order to reduce sex crimes.
I agree that PR-wise it’s not a great look to say that benefits outweigh risks when the risks are sex crimes, but that’s because PR diverges wildly from reality. (And if cameras were invented today, I’d expect we’d have the same PR arguments about them.)
None of this is to imply a position on deepfakes—I don’t know nearly enough about them. My position is just that it should in fact come down to a cost/benefit calculation.
I could also easily list many non-entertainment uses of film, such as education, communication, etc.
Random nitpick, but text-to-image models seem plausibly very useful for education and communication. I would love for people’s slide decks with pages and pages of text to be replaced by images that convey the same points better. Maybe imagine Distill-like graphics / papers, except that it no longer takes 5x as long to produce them relative to a normal paper.
We agree for sure that cost/benefit ought to be better articulated when deploying these models (see the Cost-Benefit Analysis part of the What Do We Want section). The real problem is the culture of blindly releasing and open-sourcing models like this with a Move Fast and Break Things mentality, without at least making a case for what the benefits and harms are, and without appealing to any existing standard when making these decisions.
Again, it’s possible (though not our position) that the specifics of DALL-E 2 don’t bother you as much, but the current culture around such models and their deployment certainly seems like an unambiguously alarming development.
Text-to-image models for education and communication seem like a great idea! Moreover, it’s consistent with what we’ve put forth here, since you could probably fine-tune on graphics contained in papers related to the task at hand (see the sketch below). The real issue is that people incur unnecessary risk by building, say, an automatic Distill-er using all images on the internet, when training on a smaller corpus would probably suffice and would vastly reduce the possible harm from a model originally intended for Distill-ing papers. The fundamental position we advance is that better protocols are needed before we start mass-deploying these models, not that NO version of these models / technologies could ever be beneficial.
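To make the smaller-corpus point concrete, here is a minimal sketch of what domain-restricted fine-tuning might look like, using the Hugging Face diffusers library. It follows the standard text-to-image fine-tuning recipe; the checkpoint name, the hypothetical batches of (figure image, caption) pairs, and the `training_step` helper are illustrative assumptions, not anything proposed in the discussion above.

```python
# A minimal sketch (assumptions flagged in comments) of fine-tuning a
# pretrained text-to-image diffusion model on a small, domain-specific
# corpus of paper figures, rather than training on internet-scale data.
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL, DDPMScheduler, UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTokenizer

# Assumption: any pretrained latent-diffusion checkpoint with this layout.
model_id = "CompVis/stable-diffusion-v1-4"
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
noise_scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")

# Freeze the VAE and text encoder; only the denoising UNet adapts to the
# narrow domain of paper figures.
vae.requires_grad_(False)
text_encoder.requires_grad_(False)
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)

def training_step(pixel_values, captions):
    """One denoising step on a batch of (figure, caption) pairs.

    pixel_values: float tensor of shape (B, 3, 512, 512), scaled to [-1, 1].
    captions: list of B caption strings from the small figure corpus.
    """
    # Encode images into the VAE latent space (0.18215 is the SD v1 scale).
    latents = vae.encode(pixel_values).latent_dist.sample() * 0.18215
    noise = torch.randn_like(latents)
    timesteps = torch.randint(
        0, noise_scheduler.config.num_train_timesteps,
        (latents.shape[0],), device=latents.device,
    )
    noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)

    # Condition on the figure captions.
    tokens = tokenizer(
        captions, padding="max_length", truncation=True,
        max_length=tokenizer.model_max_length, return_tensors="pt",
    )
    encoder_hidden_states = text_encoder(tokens.input_ids)[0]

    # Standard epsilon-prediction objective.
    noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
    loss = F.mse_loss(noise_pred, noise)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The design point is that the trainable surface is deliberately small: during fine-tuning the model only ever sees the vetted figure corpus, which is the sense in which a purpose-built Distill-er would incur far less exposure than one trained on all images on the internet.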