When I bring this up with EAs who are focused on AI safety, many of them suggest that we only need to get AI safety right and then the AI can solve the question of what consciousness is. This seems like a plausible response to me. However, there are some possible future scenarios where this might not be true. If we have to directly specify our values to a superintelligent AI, rather than it learning our values more indirectly, we might have to specify a definition of consciousness for it. It might also be good to have a failsafe mechanism that would cause an AI to switch off before implementing any scenario that involved a lot of suffering, and to do this we might have to roughly understand in advance which beings are and are not conscious.
There seems to be an asymmetry here, as is common with extinction risk arguments: if we think we will eventually figure out what consciousness is, then, as long as we don’t go extinct, we will eventually create positive AGI. Whereas if we focus on consciousness and AGI then kills everyone, we never get to a positive outcome.
I think the original argument works if our values get “locked in” once we create AGI, which is not an unreasonable assumption, but also isn’t guaranteed. Am I thinking through this correctly?
There’s some related discussion here.
Lock-in can also apply to “value-precursors” that determine how one goes about moral reflection, or which types of appeals one ends up finding convincing. I think these would get locked in to some degree (something has to stay fixed for it to be meaningful to talk about goalposts at all), so by affecting these precursors, moral or meta-philosophical reflection before aligned AGI can plausibly affect the outcomes post-AGI. However, it’s not very clear whether that matters, or from whose perspective it matters, because some of what masquerades as moral uncertainty might really be humans having underdetermined values.