What role does he expect the increasing intelligence and agency of AIs to play in how difficult it is to reach their goal, i.e. aligning ASI (in under four years)?
What probability does he assign to their achieving this? What does he consider the most likely way the approach fails?
How could one assess the odds that the approach works?
Doesn’t one need to know, at a minimum, both how much harder the alignment problem becomes for increasingly intelligent and agentic AI, and how well automated alignment research scales up current safety efforts?