Thanks for the response; I’ve found this discussion useful for clarifying and updating my views.
However, when we start talking about mind simulations and ‘thought crime’, WBE, selfish replicators, and other sorts of tradeoffs where there might be unknown unknowns with respect to moral value, it seems clear to me that these issues will rapidly become much more pressing. So, I absolutely believe work on these topics is important, and quite possibly a matter of survival. (And I think it’s tractable, based on work already done.)
Suppose we live under the wrong moral theory for 100 years. Then we figure out the right moral theory, and live according to that one for the rest of time. How much value is lost in that 100 years? It seems very high but not an x-risk. It seems like we only get x-risks if somehow we don’t put a trusted reflection process (e.g. human moral philosophers) in control of the far future.
It seems quite sensible for people who don’t put overwhelming importance on the far future to care about resolving moral uncertainty earlier. The part of my morality that isn’t exclusively concerned with the far future strongly approves of things like consciousness research that resolve moral uncertainty earlier.
Based on my understanding, I don’t think act-based agents or task AGI would resolve these questions by default, although as tools they could probably help.
Act-based agents and task AGI kick the problem of global governance to humans. Humans still need to decide questions like how to run governments; they’ll be able to use AGI to help them, but governing well is still a hard problem even with AI assistance. The goal would be that moral errors are temporary; with the right global government structure, moral philosophers will be able to make moral progress and have their moral updates reflected in how things play out.
It’s possible that you think that governing the world well enough that the future eventually reflects human values is very hard even with AGI assistance, and would be made easier with better moral theories made available early on.
One factor worth mentioning is whether an AGI’s ontology and theory of ethics might be path-dependent on its creators’ metaphysics, in such a way that it would be difficult for the AGI to update if it’s wrong. If this is a plausible concern, it implies a time-sensitive reason to resolve the philosophical confusion around consciousness, valence, moral value, etc.
I agree with this but place low probability on the antecedent. It’s kind of hard to explain briefly; I’ll point to this comment thread for a good discussion (I mostly agree with Paul).
But now that I think about it more, I don’t put super low probability on the antecedent. It seems like it would be useful to have some way to compare different universes that we’ve failed to put in control of trusted reflection processes, to e.g. get ones that have less horrific suffering or more happiness. I place high probability on “distinguishing between such universes is as hard as solving the AI alignment problem in general”, but I’m not extremely confident of that and I don’t have a super precise argument for it. I guess I wouldn’t personally prioritize such research compared to generic AI safety research but it doesn’t seem totally implausible that resolving moral uncertainty earlier would reduce x-risk for this reason.
I generally agree with this—getting it right eventually is the most important thing; getting it wrong for 100 years could be horrific, but not an x-risk.
I do worry somewhat that “trusted reflection process” is so high-level an abstraction that it’s difficult to critique.
Interesting piece by Christiano, thanks! I would also point to a remark I made above: doing this sort of ethical clarification now (if indeed it’s tractable) will pay dividends in aiding coordination between organizations such as MIRI, DeepMind, etc. Conversely, failing to clarify goals, consciousness, moral value, etc. seems likely to increase the risks of racing to be the first to develop AGI, and of secrecy and distrust between organizations.
A lot does depend on tractability.
I agree that:
1. clarifying “what should people who gain a huge amount of power through AI do with Earth, existing social structures, and the universe?” seems like a good question to get agreement on for coordination reasons; and
2. we should be looking for tractable ways of answering this question.
I think:
a) Consciousness research will fail to clarify ethics enough to answer (1) well enough to achieve coordination, since I think human preferences on the relevant timescales are far more complicated than consciousness (conditioned on consciousness being simple).
b) It is tractable to answer (1) without reaching agreement on object-level values, by doing something like designing a temporary global government structure that most people agree is pretty good, in that it allows society to reflect appropriately and determine the next global government structure. This question hasn’t been answered well yet, and a better answer would improve coordination. For example, society might be run as a global, federalist, democratic-ish structure with centralized control of potentially destructive technology, one that takes into account “how voters would judge something if they thought longer” rather than “how voters actually judge something” (which might be possible if the AI alignment problem is solved). It seems quite possible to create proposals of this form and critique them.
It seems like we disagree about (a), a disagreement that has been partially hashed out elsewhere, and it’s not clear we have a strong disagreement about (b).