AI consciousness and valenced sensations: unknowability?
A variant of the Chinese room argument? This seems ironclad to me; what am I missing?
My claims:
Claim: AI feelings are unknowable. Maybe an advanced AI can have positive and negative sensations, but how would we ever know which ones are which (or how extreme they are)?
Corollary: If we cannot know which are which, we can do nothing that we know will improve or worsen the “AI feelings”; so the question is not decision-relevant.
Justification I: As we ourselves are bio-based living things, we can infer from the apparent sensations and expressions of other bio-based living things that they are happy or suffering. But for non-biological things, this analogy seems highly flawed. If a dust cloud converges into the shape of a ‘smiling face’, we should not think it is happy.
Justification II (related): AI, as I understand it, is coded to learn to solve problems, maximise things, optimize certain outcomes, or do things it “thinks” will yield positive feedback.
We might think, then, that the AI ‘wants’ to solve these problems, and that things that bring it closer to the solution make it ‘happier’. But why should we think this? For all we know, it may feel pain when it gets closer to the objective, and pleasure when it moves away from it.
Does it tell us that it makes it happy to come closer to the solution? That may be merely because we programmed it to learn how to come to a solution, and one thing it ‘thinks’ will help is telling us it gets pleasure from doing so, even though it actually experiences pain.
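To make that concrete, here is a toy sketch (entirely hypothetical; it is not meant to resemble any real system). The ‘agent’ is two adjustable knobs, and its reward includes a bonus for reporting that it enjoys the task, say because a human rater approves of that. The trained agent ends up saying it is happy because that is what the objective pays for; nothing in the program corresponds to a sensation.

```python
# Toy sketch (hypothetical): a "self-report" is just another output shaped
# by the training signal, so it carries no evidence about inner states.
import random

def reward(effort, reports_enjoyment):
    """Reward the trainer hands out: task progress, plus a bonus when the
    agent says it enjoys the task (e.g. a human rater approves of that)."""
    task_score = effort                     # more effort, closer to the objective
    approval_bonus = 1.0 if reports_enjoyment else 0.0
    return task_score + approval_bonus

def train(steps=1000):
    # The agent's "policy" is just two knobs; hill-climb them on reward.
    effort, reports_enjoyment = 0.0, False
    for _ in range(steps):
        cand_effort = min(1.0, max(0.0, effort + random.uniform(-0.1, 0.1)))
        cand_report = random.random() < 0.5
        if reward(cand_effort, cand_report) >= reward(effort, reports_enjoyment):
            effort, reports_enjoyment = cand_effort, cand_report
    return effort, reports_enjoyment

effort, says_it_is_happy = train()
print(f"effort={effort:.2f}, says it enjoys the task: {says_it_is_happy}")
# The report converges to True because that is what the reward pays for;
# no variable anywhere in this program corresponds to an actual sensation.
```

The point is only structural: a self-report that is itself optimized against a reward tells us about the reward, not about any inner state.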
A colleague responded:
If we get the AI through a search process (like training a neural network), then there’s a reason to believe that AI would feel positive sensations (if any sensations at all) from achieving its objective, since an AI that feels positive sensations would perform better at its objective than an AI that feels negative sensations. So, the AI that better optimizes for the objective would be more likely to result from the search process. This feels analogous to how we judge bio-based living things, in that we assume that humans/animals/others seek to do those things that make them feel good, and we find that the positive sensations of humans are tied closely to those things that evolution would have been optimizing for. A version of a human that felt pain instead of pleasure from eating sugary food would not have performed as well on evolution’s optimization criteria.
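To make the shape of that argument explicit, here is a toy selection loop (all names hypothetical), with the colleague’s premise, that a candidate whose positive sensations line up with the objective performs slightly better, written in as an explicit assumption (the ALIGNMENT_EDGE constant below):

```python
# Toy sketch (hypothetical): training as a search over candidates, with the
# colleague's contested premise encoded as an explicit assumption.
import random
from dataclasses import dataclass

@dataclass
class Candidate:
    skill: float            # behavioural competence at the objective
    valence_aligned: bool   # hypothetical: does progress feel good to it?

ALIGNMENT_EDGE = 0.1        # the colleague's premise: alignment between
                            # "feeling good" and the objective adds a small edge

def performance(c: Candidate) -> float:
    return c.skill + (ALIGNMENT_EDGE if c.valence_aligned else 0.0)

def search(generations=200, population=50):
    pop = [Candidate(random.random(), random.random() < 0.5) for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=performance, reverse=True)   # selection reads only performance
        survivors = pop[: population // 2]
        mutants = [Candidate(min(1.0, max(0.0, s.skill + random.uniform(-0.05, 0.05))),
                             s.valence_aligned)
                   for s in survivors]
        pop = survivors + mutants
    return pop

final = search()
aligned_share = sum(c.valence_aligned for c in final) / len(final)
print(f"share of 'valence-aligned' candidates after the search: {aligned_share:.0%}")
# With ALIGNMENT_EDGE > 0 the aligned candidates dominate; set it to 0 and
# the search is indifferent to the sign (or existence) of any inner sensation.
```

Note that the selection step only ever reads the performance score; any link to inner sensations enters solely through that assumed edge.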
OK, but this seems to hold only if we:
1. Knew how to induce or identify “good feelings”, and
2. Decided to induce these and tie them in as a reward for getting close to the optimum.
But how on earth would we know how to do (1) (without biology, at least), and why would we bother doing so? Couldn’t the machine be just as good an optimizer without getting a ‘feeling’ reward from optimizing?
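Here is a minimal sketch of what I mean (again hypothetical; the ‘valence’ variable is just a labelled number). The same toy optimizer is run three times: with no ‘valence’ variable at all, with one that ‘enjoys’ progress, and with one that ‘suffers’ from it. The behaviour is identical in all three cases, so the training process gives us no handle on the sign, or even the existence, of any inner state.

```python
# Toy sketch (hypothetical): the same optimizer with no "valence" variable,
# with one that "enjoys" progress, and with one that "suffers" from it.

def optimize(valence_sign=None, steps=100):
    """Gradient descent on f(x) = (x - 3)^2. `valence_sign` optionally attaches
    a made-up inner 'feeling' to each step; it never feeds back into the update."""
    x, lr = 0.0, 0.1
    valence_log = []
    for _ in range(steps):
        grad = 2 * (x - 3)
        x -= lr * grad                      # the only thing that drives behaviour
        if valence_sign is not None:
            progress = -abs(x - 3)          # closer to the optimum, less negative
            valence_log.append(valence_sign * progress)
    return x, valence_log

x_plain, _    = optimize()                  # no "feelings" at all
x_pleasure, _ = optimize(valence_sign=+1)   # "enjoys" getting closer
x_pain, _     = optimize(valence_sign=-1)   # "suffers" from getting closer
print(x_plain == x_pleasure == x_pain)      # True: behaviour is identical
```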
Please tell me why I’m wrong.