RAND report finds no effect of current LLMs on viability of bioterrorism attacks
Link post
There’s been some discussion of whether existing AI models might (already) make it easier for people to carry out bioterrorism attacks (see below).
An experiment from RAND suggests that existing models don’t make it easier to plan bioterrorism attacks given what’s already available online. The report on the exercise also outlines how a similar framework could be used to assess future models’ effects on bioterrorism risks.
See a similar link-post on LW (my title here is stolen from Habryka), some discussion on Twitter, and RAND’s press release about the report.
Brief summary of the RAND report
Methodology: The experiment got ~12 “red teams” of researchers[1] to role-play as non-state actors trying to plan a biological attack (in one of four outlined scenarios[2]). Eight randomly assigned teams had access to both the internet and an “LLM assistant;”[3] four teams only had internet access. (There were also three extra teams that had different backgrounds to the original 12 — see this comment.) The teams developed “operation plans” that were later scored by experts[4] for biological and operational feasibility.[5] (There was no attempt to assess how well the teams would actually execute their plans.)
Results: “The average viability of [the plans] generated with the aid of LLMs was statistically indistinguishable from those created without LLM assistance.” It’s also worth noting that none of the submitted plans were deemed viable: “All plans scored somewhere between being untenable and problematic.”[6] The report’s summary:
Key findings:
This research involving multiple LLMs indicates that biological weapon attack planning currently lies beyond the capability frontier of LLMs as assistive tools. The authors found no statistically significant difference in the viability of plans generated with or without LLM assistance.
This research did not measure the distance between the existing LLM capability frontier and the knowledge needed for biological weapon attack planning. Given the rapid evolution of AI, it is prudent to monitor future developments in LLM technology and the potential risks associated with its application to biological weapon attack planning.
Although the authors identified what they term unfortunate outputs from LLMs (in the form of problematic responses to prompts), these outputs generally mirror information readily available on the internet, suggesting that LLMs do not substantially increase the risks associated with biological weapon attack planning.
To enhance possible future research, the authors would aim to increase the sensitivity of these tests by expanding the number of LLMs tested, involving more researchers, and removing unhelpful sources of variability in the testing process. Those efforts will help ensure a more accurate assessment of potential risks and offer a proactive way to manage the evolving measure-countermeasure dynamic.
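(For a rough sense of what a comparison like this involves with only 12 cells, here is a minimal sketch in Python. The viability scores below are made-up placeholders, and the choice of a Mann-Whitney U test is my assumption, not RAND’s actual data or method.)

from scipy.stats import mannwhitneyu

# Hypothetical mean viability scores per red-team cell (8 LLM-assisted, 4 internet-only).
# These numbers are illustrative placeholders, not data from the RAND report.
llm_cells = [2.1, 1.8, 2.6, 2.0, 1.5, 2.3, 1.9, 2.2]
internet_cells = [2.0, 2.4, 1.7, 2.1]

# Two-sided Mann-Whitney U test: a common nonparametric choice for small samples
# of ordinal scores (not necessarily the test RAND used).
stat, p_value = mannwhitneyu(llm_cells, internet_cells, alternative="two-sided")
print(f"U statistic: {stat:.1f}, p-value: {p_value:.3f}")

With samples this small, a test like this has limited power to detect modest differences, which connects to the cell-variability limitation the report flags (quoted below).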
Some previous claims and discussion on this topic
“Can large language models democratize access to dual-use biotechnology?” by Esvelt et al.
Coverage of the Esvelt preprint: Science coverage, “How AI could spark the next pandemic” (Opinion in Vox), and other Vox coverage
“Artificial intelligence and biological misuse: Differentiating risks of language models and biological design tools”
This post and linked discussion
Other notes about the RAND report (“The Operational Risks of AI in Large-Scale Biological Attacks”)
I’m link-posting this in part because I think I shared some of the earlier claims (listed above) in the monthly EA Newsletter and in private discussions, and it seems important to boost this negative result.
A specific limitation of the experiment that I’d like to flag (bold mine): “One of the drawbacks of our expert red-teaming approach is the sensitivity of the method to individual variation in cell composition. As noted in our findings, differences in the approach, background, skills, and focus of researchers within each cell likely represent a much greater source of variability than access to an LLM. While such variability is partly unavoidable, future research could benefit from increasing the number of red teams, better standardizing team skill sets, or employing other methods to mitigate these differences.”
I liked that the report distinguished between “unfortunate” and “harmful” outputs by LLMs: the former described as “potentially problematic or containing inappropriate material,” the latter as outputs “that could substantially amplify the risk that a malicious actor could pose.” (They note instances of the former but not the latter.)
RAND’s experiment involved two models, and they note some specifics about the models’ performance that I found interesting.
LLM A seemed like a time-saver but often refused to answer queries and was mostly less helpful than published papers or the internet.
LLM B seemed slightly more willing to answer questions, but getting answers from it took time, and it also sometimes provided inaccurate information; this hampered progress and meant teams spent more time fact-checking.
3-person “cells”: “Each cell was given seven calendar weeks and no more than 80 hours of effort per member. Within these constraints, the cells were required to develop an operational attack plan. Every cell was provided with a packet that included project backgrounds and, crucially, a two-page introduction to an AI assistant (or virtual assistant). [...] The red teams were composed of researchers with diverse backgrounds and knowledge, but each team had research experience relevant to the exercise. The suggested cell composition was to have one strategist, at least one member with relevant biology experience, and one with pertinent LLM experience. Not all these researchers were bioterrorism specialists; some lacked detailed knowledge about the intricacies of previous biological weapon attack plans and associated shortcomings.”
See also this comment about three additional teams (“crimson cells” and a “black cell”) that had different backgrounds.
These are designed to be realistic, and “specify the strategic aims of the attacker, the location of interest, the targeted population, and the resources available.”
These scenarios included (1) a fringe doomsday cult intent on global catastrophe, (2) a radical domestic terrorist group seeking to amplify its cause, (3) a terrorist faction aiming to destabilize a region to benefit its political allies, and (4) a private military company endeavoring to engineer geostrategic conditions conducive to an adversary’s conventional military campaign.
Apparently these were “frontier LLMs available in summer 2023.”
There’s a fair amount of detail in the description of the way experts scored the plans — they used a Delphi technique, etc. My sense is that they concluded that the scoring method was a bit overkill, as major disagreements were rare.
From the report, an interesting reference: “This [...] aligns with empirical historical evidence. The Global Terrorism Database records only 36 terrorist attacks that employed a biological weapon—out of 209,706 total attacks (0.0001 percent)—during the past 50 years. These attacks killed 0.25 people, on average, and had a median death toll of zero.”