I generally agree with this—getting it right eventually is the most important thing; getting it wrong for 100 years could be horrific, but not an x-risk.
I do worry somewhat that “trusted reflection process” is a high-level enough abstraction that it is difficult to critique.
Interesting piece by Christiano, thanks! I would also point to a remark I made above, that doing this sort of ethical clarification now (if indeed it’s tractable) will pay dividends in aiding coordination between organizations such as MIRI, DeepMind, etc. Put the other way, failing to clarify goals, consciousness, moral value, etc. seems likely to increase the risks of racing to be the first to develop AGI, and of secrecy and distrust between organizations.
A lot does depend on tractability.
I agree that:
1. clarifying “what should people who gain a huge amount of power through AI do with Earth, existing social structures, and the universe?” seems like a good question to get agreement on for coordination reasons
2. we should be looking for tractable ways of answering this question
I think:
a) consciousness research will fail to clarify ethics enough to answer (1) well enough to achieve coordination (since I think human preferences on the relevant timescales are way more complicated than consciousness, even conditioned on consciousness being simple).
b) it is tractable to answer (1) without reaching agreement on object-level values, by doing something like designing a temporary global government structure that most people agree is pretty good (in that it will allow society to reflect appropriately and determine the next global government structure), but that this question hasn’t been answered well yet and that a better answer would improve coordination. E.g. perhaps society is run as a global federalist democratic-ish structure with centralized control of potentially destructive technology (taking into account “how voters would judge something if they thought longer” rather than “how voters actually judge something”; this might be possible if the AI alignment problem is solved). It seems quite possible to create proposals of this form and critique them.
It seems like we disagree about (a) and this disagreement has been partially hashed out elsewhere, and that it’s not clear we have a strong disagreement about (b).