AI safety milestones?

Optimal governance interventions depend on progress in technical AI safety. For example, two rough technical safety milestones are (1) having metrics to determine how scary a model's capabilities are, and (2) having a tool to determine whether a model is trying to deceive you. Our governance plans should adapt based on whether these milestones have been achieved (or when it seems they will be achieved), and for less binary milestones, based on partial progress.
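
To make the conditional structure concrete, here is a minimal sketch of what "adapting governance plans to milestone status" could look like. Everything in it is a hypothetical illustration: the `MilestoneStatus` states, the `recommended_posture` function, and the postures it returns are assumptions made up for the example, not actual proposals.

```python
from enum import Enum


class MilestoneStatus(Enum):
    NOT_ACHIEVED = "not achieved"
    PARTIAL = "partial progress"
    ACHIEVED = "achieved"


def recommended_posture(capability_metrics: MilestoneStatus,
                        deception_detector: MilestoneStatus) -> str:
    """Map the status of the two example milestones to a governance
    posture. The postures are placeholders, not policy proposals."""
    if capability_metrics is MilestoneStatus.ACHIEVED:
        # Trusted capability metrics make evaluation-gated deployment
        # enforceable in principle.
        if deception_detector is MilestoneStatus.ACHIEVED:
            return "evaluation-gated deployment plus deception audits"
        return "evaluation-gated deployment; treat model intent as unknown"
    if capability_metrics is MilestoneStatus.PARTIAL:
        # Partial progress (the less binary case) supports softer
        # interventions than hard deployment gates.
        return "mandatory capability reporting and precautionary limits"
    # With no metrics at all, governance has to rely on coarse proxies
    # for capability, e.g. training-compute thresholds.
    return "coarse proxies such as compute thresholds"


# Example: partial capability metrics, no deception detector yet.
print(recommended_posture(MilestoneStatus.PARTIAL,
                          MilestoneStatus.NOT_ACHIEVED))
```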

What other possible milestones in technical AI safety might be relevant to governance interventions?

Crossposted from LessWrong.