Executive summary: The distinction between “alignment research” and “capabilities research” is problematic and should be replaced with a focus on worst-case scenarios and a cognitive understanding of AI systems.
Key points:
Categorizing research as “alignment” or “capabilities” based on impacts is difficult due to unpredictable effects and disagreements about threat models.
Most valuable alignment research should focus on worst-case scenarios rather than average performance.
A scientific, cognitivist approach to understanding AI systems is more useful for alignment than a behaviorist one.
The author proposes a two-dimensional categorization of AI research based on focus (average-case to worst-case) and approach (engineering to cognitivist science).
“Alignment research” should refer to work closer to worst-case, cognitivist science, while “capabilities research” should refer to work closer to average-case engineering.
This framework may evolve as the field progresses towards a unified science of artificial and biological cognition.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.