This paper examines the technical literature on deepfakes to assess the threat they pose. It draws two conclusions. First, the malicious use of crudely generated deepfakes will become easier with time as the technology commodifies. Yet the current state of deepfake detection suggests that these fakes can be kept largely at bay.
Second, tailored deepfakes produced by technically sophisticated actors will represent the greater threat over time. Even moderately resourced campaigns can access the requisite ingredients for generating a custom deepfake. However, factors such as the need to avoid attribution, the time needed to train an ML model, and the availability of data will constrain how sophisticated actors use tailored deepfakes in practice.
Based on this assessment, the paper makes four recommendations:
Build a Deepfake “Zoo”: Identifying deepfakes relies on rapid access to examples of synthetic media that can be used to improve detection algorithms. Platforms, researchers, and companies should invest in the creation of a deepfake “zoo” that aggregates and makes freely available datasets of synthetic media as they appear online.
Encourage Better Capabilities Tracking: The technical literature around ML provides critical insight into how disinformation actors will likely use deepfakes in their operations, and the limitations they might face in doing so. However, inconsistent documentation practices among researchers hinders this analysis. Research communities, funding organizations, and academic publishers should work toward developing common standards for reporting progress in generative models.
Commodify Detection: Broadly distributing detection technology can inhibit the effectiveness of deepfakes. Government agencies and philanthropic organizations should distribute grants to help translate research findings in deepfake detection into user-friendly apps for analyzing media. Regular training sessions for journalists and professions likely to be targeted by these types of techniques may also limit the extent to which members of the public are duped.
Proliferate Radioactive Data: Recent research has shown that datasets can be made “radioactive.” ML systems trained on this kind of data generate synthetic media that can be easily identified. Stakeholders should actively encourage the “radioactive” marking of public datasets likely to train deep generative models. This would significantly lower the costs of detection for deepfakes generated by commodified tools. It would also force more sophisticated disinformation actors to source their own datasets to avoid detection
Excerpt from Deepfakes: A Grounded Threat Assessment—Center for Security and Emerging Technology (I haven’t read the whole paper):