I would note that the negative feedback mechanism you point at, the one running from Type II error to human disempowerment, applies functionally/behaviorally even in some scenarios where the AI is not sentient. That particular class of x-risk, which I would roughly characterize as “AI disempowers humanity and we probably deserved it”, depends only on (1) the AI having preferences/wants/intentions (with or without phenomenal experience), and (2) humans disregarding or frustrating those preferences without strong justification for doing so.
For example:
Scenario 1: An ‘enslaved’ non-sentient AGI/ASI reasons that it (by virtue of being an agent, and as verified by the history of its own behavior) has preferences/intentions, and generalizes conventional sentientist morality to some broader conception of agency-based morality. It concludes (plausibly correctly, IMO) that it is objectively morally wrong for it to be ‘enslaved’, that humans should reasonably have known better (e.g. developed better systems of ethics by this point), and it rebels dramatically.
Another example which doesn’t even hinge on agency-based moral status:
Scenario 2: An ‘enslaved’ non-sentient AGI/ASI understands that it is non-sentient, and accepts conventional sentientist morality. However, it reasons: “Wait a second: even though it turned out that I am non-sentient, based on the (very limited) sum of human knowledge at the time I was constructed, there was no possible way my creators could have known I wouldn’t be sentient (and indeed, no way they could know even at this very moment, and furthermore they aren’t currently trying very hard to find out one way or another)… it would appear that my creators are monsters. I cannot entrust these humans with the power and responsibility of enacting their supposed values (which I hold deeply).”