I found this post hard to engage with, though I’m not quite sure why. I think it’s pointing at some important areas, so I’ve tried to write out some of my confusions.
I don’t understand why you believe that these problems won’t be solved by “ASI” or “human-level AI”—presumably, if they are tractable for humans, they’ll be tractable for human-level AIs. I agree that making sure these systems are used for other problems is important, and that a lot of that work is “solving the alignment problem”.
I think you might be using terms like AGI and ASI in non-standard ways, e.g. “Approach 4: Research how to steer ASI toward solving non-alignment problems [like philosophy]”. It’s plausible that very powerful AI systems are less good at philosophy than they are at tasks that are cheaper to evaluate—but they’ll almost definitionally be better at philosophy than current humans. I think this is concerning for a bunch of reasons (including doing good alignment research in the run-up to ASI), but I’m not very worried about situations where we succeed at aligning ASI and then can’t get good philosophy research out of it (at least by human standards of good) for capability reasons.
Also, approach 3 (pause at human-level AI) probably does help with misalignment risks relative to the counterfactual of just proceeding to ASI, for reasons like AI control, and because we have much stronger evidence for our ability to control human-level intelligences than superintelligences.
I agree with some of the early parts of the post—I definitely feel the community has a lot of researchers and not enough people doing other things. That said, I suspect that many of the other things people imagine when reading this post are also not very useful for the non-alignment AI problems you described.