I mostly back-chain from a goal that I’d call “make the future go well”. This usually maps to value-aligning AI with broad human values, so that the future is full of human goodness and not tainted by my own personal fingerprints. Actually, ideally we would first build an AI that we have enough control over that its operators can make it do something less drastic than determining the entire future of humanity, e.g. slowing AI progress to a halt until humanity pulls itself together and figures out safer alignment techniques. That usually means making it corrigible or tool-like, instead of letting it maximize its aligned values.
So I guess I ultimately want (ii) but really hope we can get a form of (i) as an intermediate step.
When I talk about the “alignment problem”, I usually mean the problem that, by default, we get neither (i) nor (ii).