Executive summary: Despite claims to the contrary, AI companies are not consistently using external evaluators to assess their models for dangerous capabilities before deployment, even though doing so could improve risk assessment and provide public accountability.
Key points:
External evaluators like METR, UK AISI, and Apollo are not being given sufficient pre-deployment access to models to effectively evaluate risks.
Evaluators should be given deeper access than end users, including versions without safety filters or fine-tuning, to better assess threats from both deployment and potential leaks.
The costs and challenges of providing external evaluator access are unclear, but likely surmountable.
Some AI labs have committed to external “red teaming”, but this is not a substitute for comprehensive model evaluation by dedicated experts.
AI labs should also provide deeper model access to safety researchers to support their work, potentially without fully releasing weights.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.