Executive summary: Propensity evaluations assess an AI system’s tendency to prioritize certain behaviors over others, using non-capability-related criteria, and differ from capability evaluations in key ways that impact their implementation and interpretation.
Key points:
Propensity evaluations measure relative priorities between behavior clusters, using at least some non-capability features, unlike capability evaluations.
Current propensity evaluations often have high capability-dependence, potentially misrepresenting alignment trends as models scale up.
Propensity evaluations differ from capability evaluations in elicitation modes, predictability of scaling laws, abstraction levels, and sources of truth used.
Various propensities are currently evaluated, including truthfulness, harmlessness, and power-seeking, using black-box, white-box, and no-box approaches.
Game theory is proposed as a tool to create more rigorous behavioral clusters for propensity evaluations.
Propensity evaluations are important for AI safety research, risk assessment, and governance, but are not yet well-understood or standardized.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.