I think a corollary of the first point is that we can learn a lot about alignment by looking at humans who seem unusually well aligned to human values (or, more generally, to the interests of all conscious beings), e.g. highly attained meditators with high integrity, altruistic motivations, rationality skills, and a healthy balance of systematizer and empathizer mindsets. From phenomenological reports, their subagentic structures seem quite unlike anything most of us experience day to day. That, plus a few core philosophical assumptions, can get you a long way toward deducing, e.g., Anthropic’s constitutional AI principles from first principles.