Overall a nice system card for Opus 4!
It’s a strange choice to contract Apollo to evaluate sabotage, get feedback that “[early snapshot] schemes and deceives at such high rates that we advise against deploying this model....” and then not re-contract Apollo for final evals.
Another relevant comment: