Executive summary: Exploratory forecast arguing that general-purpose AI systems are approaching—and may soon cross—key biorisk capability thresholds: basic support for experts is likely already here, novices may be enabled to create known threats by late 2025, expert novel-threat assistance could emerge by late 2026, and novice novel-threat assistance could follow around early 2028; the author emphasizes substantial uncertainty, patchy evaluations, and the need for clearer thresholds.
Key points:
Four thresholds, mapped to lab frameworks: The post distinguishes basic vs. advanced support, each for experts vs. novices (a 2×2 yielding four thresholds), aligning them with OpenAI’s High/Critical and Anthropic’s CBRN-3/4 (ASL-3/4) levels; these are the crux capabilities that would materially change misuse risk.
Timelines (median): Now—basic support for experts; late 2025—basic support for novices (CBRN-3/High); late 2026—advanced support for experts (CBRN-4/Critical); early 2028—advanced support for novices (near-AGI). These are working estimates, not firm predictions.
Evidence synthesis is mixed and noisy: On Virology Capabilities Test (VCT) troubleshooting questions, top models (e.g., OpenAI o3) outperform 94% of expert virologists, suggesting strong help with tacit know-how; uplift trials show partial novice assistance (Anthropic reports roughly 2.5×; a toy illustration of such an uplift ratio appears after this list) but are hard to interpret; some red-teaming finds no significant novice uplift (e.g., Llama 3), and expert-uplift tests for novel threats remain inconclusive; successes at DNA-screening evasion and at assembly have not yet coincided.
“Basic for experts” likely already crossed: Indicators include VCT performance, widespread researcher use of AI assistants, and lab statements that models can help experts reproduce known threats; open-source systems may already clear this bar, which would broaden who has access to such capability.
Risk implications and cruxes: If novices reach CBRN-3-like capabilities (e.g., help with influenza synthesis, competent attack planning, easier acquisition of dual-use DNA), expert surveys suggest annual catastrophe risk jumps roughly an order of magnitude, to ~2–3% (the implied arithmetic is sketched after this list); key uncertainties are where to set rule-in thresholds, how transparent uplift trials are, and how to measure “significant” support.
Forecasting logic beyond bio evals: Because expert-uplift data are thin, the author triangulates from broader capability trends (e.g., METR time-horizon scaling reaching 8-hour tasks by ~2026; see the extrapolation sketch below) to argue that expert-level advanced support is plausible by late 2026; the post recommends clearer red lines, better expert-uplift evaluations, and monitoring of open-source capability diffusion; follow-ups will extend to actor counts and protective measures (e.g., Esvelt’s “Delay, Detect, Defend”).
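To make the uplift-ratio framing concrete: a minimal sketch of how a figure like Anthropic’s ~2.5× is typically computed, comparing a model-assisted group against an internet-only control on a scored task. All numbers here are illustrative placeholders, not data from any published trial.

```python
# Toy illustration of an "uplift ratio" from a controlled trial:
# compare task scores of participants with model access vs. internet-only.
# All scores below are made up for illustration.

internet_only_scores = [8, 12, 10, 9, 11]     # hypothetical control group
model_assisted_scores = [24, 30, 22, 28, 26]  # hypothetical treatment group

def mean(xs):
    return sum(xs) / len(xs)

uplift_ratio = mean(model_assisted_scores) / mean(internet_only_scores)
print(f"Uplift ratio: {uplift_ratio:.1f}x")  # -> 2.6x, near the cited ~2.5x
```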
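On the “roughly an order of magnitude” jump: a back-of-envelope check, assuming a pre-threshold baseline around 0.25%/year (a figure I infer from the stated ~2–3% post-threshold number; the underlying surveys may use a different baseline).

```python
# Back-of-envelope check of the "order of magnitude" risk jump.
# The baseline below is inferred, not taken from the post; treat both
# figures as rough survey medians rather than precise estimates.

baseline_annual_risk = 0.0025  # assumed ~0.25%/yr before novice CBRN-3 uplift
post_threshold_risk = 0.025    # ~2.5%/yr once novices gain such capabilities

multiplier = post_threshold_risk / baseline_annual_risk
decade_risk = 1 - (1 - post_threshold_risk) ** 10  # naive 10-year compounding

print(f"Risk multiplier: {multiplier:.0f}x")       # -> 10x
print(f"Implied 10-year risk: {decade_risk:.1%}")  # -> ~22.4%
```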
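And the time-horizon extrapolation behind the late-2026 estimate: a sketch assuming METR’s finding that model task horizons double roughly every seven months, starting from a ~1-hour horizon in early 2025 (both parameters are approximations; METR’s published estimates carry wide error bars).

```python
import math

# Extrapolate AI task time-horizons under a constant doubling time,
# in the spirit of METR's measurements. Parameter values are rough
# approximations, not METR's published point estimates.

doubling_time_months = 7.0   # assumed: horizon doubles every ~7 months
start_horizon_hours = 1.0    # assumed: ~1-hour tasks at 50% reliability, early 2025
target_horizon_hours = 8.0   # the 8-hour threshold cited in the post

doublings_needed = math.log2(target_horizon_hours / start_horizon_hours)
months_needed = doublings_needed * doubling_time_months

print(f"Doublings needed: {doublings_needed:.0f}")     # -> 3
print(f"Months from early 2025: {months_needed:.0f}")  # -> ~21, i.e. late 2026
```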
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.