Or is this a stronger claim that safety work is inherently a more short-time horizon thing?
It is more like this stronger claim.
I might not use “inherently” here. A core safety question is whether an AI system is behaving well because it is aligned, or because it is pursuing convergent instrumental subgoals until it can take over. The “natural” test is to run the AI until it has enough power to easily take over, at which point you observe whether it does, which is extremely long-horizon. But obviously this was never an option for safety anyway, and many of the proxies that we think about are more short horizon.