Executive summary: OpenAI’s tests to determine whether its o1-preview model can help novices create chemical and biological weapons are inconclusive and do not definitively establish the model’s risk level.
Key points:
OpenAI conducted three multiple-choice tests to assess the model’s potential for assisting in CBRN (Chemical, Biological, Radiological, and Nuclear) weapon creation
The model performed comparably to or near expert levels on two of the three tests, contrary to OpenAI’s claim that it cannot meaningfully help novices
The Cloning Scenarios test showed the largest performance gap, but even this gap could potentially be narrowed through iterative problem-solving or tool integration
OpenAI raised the CBRN risk level from “low” to “medium” based on these tests, but the evidence does not clearly support this classification
The company plans further wet lab evaluations but released o1-preview before completing comprehensive testing
More research is needed to definitively determine the model’s potential for assisting in dangerous scientific tasks
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.