My current takes on AI Welfare
This is a post for debate week. Feel especially free to disagree and/or ask for clarifications.
I’m starting off the debate week weakly disagreeing with the debate statement. I’m at a point where I have too many strong uncertainties to think that work on AI Welfare would be positive or negative. I’m not very confident about any of these, and hope to change my mind for good reasons this week! Where possible I am identifying cruxes so that you can tell me why I’m wrong in a way which will change my mind.
Below are some reasons for my current position, in no particular order:
I’m unsure whether we can in principle ascertain whether a digital mind is conscious.
I’m ready to accept the chance of consciousness in animals because of extensive analogies to conscious human behaviour (like anhedonia, avoiding negative stimuli, activating differently given anaesthetics, etc…) plus a shared evolutionary history. Digital minds (or any AI systems that we would have some reason to suspect are conscious) would be developed very differently to human minds, with very different incentives (such as acting in ways that humans prefer). This could lead to behaviour analogous to conscious behaviour in humans, but with a very different mechanism or purpose, which does not actually produce qualia. I do not know how we would be able to tell the difference, even in theory.
A crux here is that philosophy of mind doesn’t really make much progress, and additionally, that we are unlikely to find a convincing science of consciousness.
AI welfare success could mean existential failure
Putting money into AI welfare research and/or activism increases the chance of a future where we respect (at least some) AI systems as having moral value, comparable to humans. If we are wrong about this, and they are not in fact conscious, this could be a disaster:
In the shorter term, because treating the AI systems nicely might cost resources which could otherwise be used to accelerate technological progress, helping conscious humans and animals.
In the longer term, because a world full of professedly happy digital minds which are in fact non-conscious is a world devoid of value.
The worlds where EA involvement in this issue is useful may be very few
The world where EA research and advocacy for AI welfare is most crucial is one where the reasons to think that AI systems are conscious are non-obvious, such that we require research to discover them, and require advocacy to convince the broader public of them. But I think the world where this is true, and the advocacy succeeds, is a pretty unlikely one.
If we are in a world where advocacy for AI Welfare succeeds, then I think it is very likely we are in a world where AI systems which are used by the majority of the population are incentivised to act as if they were conscious, and form close relationships with their users. In this world, the important features of AI systems which advocates for their rights/welfare would mention would be surface-level and very visible. That is, we would not require research or openness to weird ideas in order to convince people to consider AI rights/welfare.
Alternatively, if we are in a world where the signs of true AI consciousness are not visible without research (i.e. they are not isomorphic to features of the AI such as the text it outputs), then 1) research is not likely to change people’s minds if they find AI consciousness very implausible already, and 2) it is also not likely to change their minds if they find it very plausible, and the research argues that AI is not in fact conscious. So whether the public is convinced or unconvinced on AI consciousness and welfare, research won’t be a factor.
A crux that I have here is that research that takes a while to explain is not going to inspire a popular movement. This links to another crux, that AI welfare would have to be popular in order to be enforced.
Okay, what comes to mind for me here is quantum mechanics and how we’ve come up with some pretty good analogies to explain parts of it.
Do we really need to communicate the full intricacies of AI sentience to say that an AI is conscious? I guess that this isn’t the case.
I think this is creating a potential false dichotomy?
Here’s what I believe might happen in AI Sentience without any intervention as an example:
1. Consciousness is IIT (Integrated Information Theory) or GWT (Global Workspace Theory) based in some way or another. In other words, we have some sort of underlying field of sentience like the electromagnetic field and when parts of the field interact in specific ways then “consciousness” appears as a point load in that field.
2. Consciousness is then only verifiable if this field has consequences on the other fields of reality; otherwise, it is non-Popperian (unfalsifiable), like the Multiverse theory.
3. Number 2 is really hard to prove and so we’re left with very correlational evidence. It is also tightly connected to what we think of as metaphysics, meaning that we’re going to be quite confused about it.
4. Therefore, general legislators and researchers leave this up to chance and do not compute any complete metrics, as it is too difficult a problem. They hope that AIs don’t have sentience.
In this world, adding some AI sentience research from the EA Direction could have the consequences of:
1. Making AI labs have consciousness researchers on board so that they don’t torture billions of iterations of the same AI.
2. Making governments create consciousness legislation and think tanks for the rights of AI.
3. Creating technical benchmarks and theories about what is deemed to be conscious (See this initial, really good report for example)
You don’t have to convince the general public; you have to convince the major stakeholders of tests that check for AI consciousness. It honestly seems kind of similar to what we have done for the safety of AI models but instead for the consciousness of them?
I’m quite excited for this week, as it is a topic I’m very interested in but also one that I feel I can’t really talk about much or take seriously because it is a bit fringe, so thank you for having it!
Thanks! I’m also excited about this week. It’s really cool to see how many people have already voted; it goes well beyond my expectations.
I think this is a great point, and might change my mind. However, if these consciousness evals become burdensome for AI companies, I would imagine we would need a public push in support of them in order for them to be enforced, especially through legislation. Then we get back to my dichotomy, where if people think AI is obviously conscious (whether or not it is) we might get legislation, and if they don’t, I can only imagine some companies doing it half-heartedly/ voluntarily until it becomes too costly (as is, arguably, the current state of safety evals).
Yeah, I guess the crux here is to what extent we actually need public support, or at least what type of public support we need, for it to become legislation?
If we can convince 80-90% of the experts, then I believe that this has cascading effects on the population, and it isn’t like AI being conscious is something that is impossible to believe either.
I’m sure millions of students have had discussions about AI sentience for fun, so it isn’t fully outside the Overton window either.
I’m curious to know if you disagree with the above or if there is another reason why you think research won’t cascade to public opinion? Any examples you could point towards?
I don’t have an example in mind exactly, but I’d expect you could find one in animal welfare. Where there are agricultural interests pushing against a decision, you need a public campaign to counter them. We don’t live in technocracies; representatives need to be shown that there is a commensurate interest in favour of the animals. On less important issues, or legislation which can be symbolic but isn’t expected to be used, experts can have more of a role. I’d expect that the former category is the more important one for digital minds. Does that make sense? I’m aware it’s a bit too stark of a dichotomy to be true.
There’s this idea of the truth as an asymmetric weapon; I guess my point isn’t necessarily that the approach vector will be something like:
Expert discussion → Policy change
but rather something like
Expert discussion → Public opinion change → Policy change
You could say something about memetics and that it is the most understandable memes that get passed down rather than the truth, which is, to some extent, fair. I guess I’m a believer that the world can be updated based on expert opinion.
For example, I’ve noticed a trend in the AI Safety debate: the quality seems to get better and more nuanced over time (at least, IMO). I’m not sure what this entails for the general public’s understanding of this topic but it feels like it affects the policy makers.
I think this is a good description of the kind of scepticism I’m attracted to, perhaps to an irrational degree. Thanks for describing it!
I like your point about AI Safety. It seems at least a bit true.
I’ll update my vote on the banner to be a bit less sceptical. I think my scepticism of the potential for us to know whether AI is conscious is a major part of my disagreement with the debate statement. I don’t endorse the level of scepticism I hold. Thanks!
How about: Remove all text of humans discussing their conscious experiences (or even the existence of consciousness) from the AI’s training set. See if it still claims to have internal experiences.
I don’t think this is a perfect method:
If it still talks about internal experiences, maybe it was able to extrapolate the ability to discuss internal experiences from text that wasn’t removed.
If it doesn’t talk about internal experiences, maybe it has them and just lacks the ability to talk about them. Some animals are probably like this.
Finally, in principle I can imagine that ingesting text related to internal experiences is actually what causes an AI to learn to have them.
Could you expand on what you regard as a key difference in our epistemic position with respect to animals vs. even in theory with respect to AI systems? Could this difference be put in terms of a claim you accept when applied to animals but not even in theory when applied to AI systems?
In connection with evaluating animal/AI consciousness, you mention behavior, history, incentives, purpose, and mechanism. Do you regard any of these factors as most directly relevant to consciousness? Are any of these only relevant as proxies for, say, mechanisms?
(My hunch is that more information on these points would make it easier for me or other readers to try to change your mind!)
Re “A crux here is that philosophy of mind doesn’t really make much progress”: for what it’s worth, from the inside of the field, it feels to me like philosophy of mind makes a lot of progress, but (i) the signal-to-noise ratio in the field is bad, (ii) the field is large, sprawling, and uncoordinated, (iii) an impact-focused mindset is rare within the field, and (iv) only a small percentage of the effort in the field has been devoted to producing research that is directly relevant to AI welfare. This suggests to me that even if there isn’t a lot of relevant, discernible-from-the-outside progress in philosophy of mind, relevant progress may be fairly tractable.