Nice little Claude summary of the debate so far, which might help identify the missing points:
The debate centres on whether human-aligned AGI would automatically benefit animals, or whether animal-specific interventions are needed.
The pessimistic case is well-represented. Jim Buhler argues we have no good reason to assume AI safety work helps animals: saving humans preserves factory farming, and the claim that empowered humans would improve wild animal welfare rests on untenable assumptions. Simon Eckerström Liedholm (Wild Animal Initiative) estimates only ~30% probability of good animal outcomes conditional on good human outcomes, largely because the most likely alignment path locks in current human values, which permit enormous animal suffering. Hannah McKay (Rethink Priorities) argues that cultivated meat won’t be automatically solved by AGI: regulatory, political, and consumer barriers form a sequential chain where the combined probability of resolution is low.
The bridge position comes from Aidan Kankyoku, who thinks it probably (~70%) goes well for animals but that this isn’t sufficient certainty to neglect animal-specific alignment. He argues animal welfare is now functionally a subsidiary of the “Make AI Go Well” movement.
MichaelDickens contributed three posts: a taxonomy of alignment research by animal-friendliness, a cost-effectiveness model finding alignment-to-animals only marginally more cost-effective than general alignment, and a meditation on how current alignment paradigms (unlike CEV) give him roughly 50-50 odds on animal outcomes.
The discussion thread (~58 comments) skews disagree, though with real spread. The most common argument for disagreement is historical precedent: technological and economic progress has been bad for animals so far, with factory farming as the central exhibit. Value lock-in is the second recurring worry: that alignment to current human values would freeze in a set of preferences that are largely indifferent to animal suffering (SimonM_, Babel, Dylan Richardson, Tristan Katz). Several voters also flag the risk of spreading wild animal suffering to new planets. On the agree side, the strongest argument is economic: post-scarcity conditions erode factory farming’s viability because alternatives become cheaper (OscarD, Erich_Grunewald, Brad West, JDBauman). A few voters (Ronak Mehta at 100% Agree, Ligeia, Artūrs Kaņepājs) argue that a genuinely superintelligent system would recognise animal sentience as morally relevant. A notable cluster sits at or near 0% Agree not because they’re confident things go badly, but because they think the question is unanswerable given the number of branching futures (NickLaing, Seth Ariel Green, Jim Buhler). Peter Wildeford offers a useful split: on a causal reading (alignment mechanisms also help animals) he’s pessimistic; on an evidential reading (conditional on good human outcomes, what world are we in?) he’s somewhat more optimistic.
For example, I think a crux might be the tractability of animal-specific alignment work. E.g. can we align AI to specific values or (just) make it corrigible to our preferences and commands? I don’t know, but this would massively affect my estimation of the tractability here.
This is definitely a hard debate to disentangle, because I would personally reject the question of alignment as a crux. For now, I strongly believe that the total welfare of animals has been entirely uncorrelated with our moral intentions toward animals. Total welfare has mostly changed because of land use, due to human interests.
I agree that in AGI-transformed futures that go well for humans, human desires may start playing a larger role. However, I expect that whether we mean well for animals (or don’t care much about them) will not be cleanly correlated with outcomes for them.
There are worlds where we mean well for a large part of animals, stop intentionally killing them, and help certain wild animals. But that world could very well end up having a large population of animals living bad lives.
On the other hand, out of apathy and even negative feeling toward wild animals, we may decide to limit their spread and use resources in a way that optimizes for human flourishing, over animal abundance. That world could end up being much better for animal welfare.
Maybe some extreme scenarios tip the scales, for example if we bred incredibly happy genetically modified animals due to positive feelings toward them. But I’m not confident about putting any weight on such utilitarian-leaning scenarios when assessing post-AGI futures, because part of the reason human moral intentions are not correlated with total animal welfare is that humans are not scope-sensitive utilitarians.
What kinds of values will humans have post-AGI, if AGI goes well for us? We don’t need to be scope-sensitive utilitarians to want to adopt even radical preferences like ending animal exploitation and solving WAS, no? (Most humans don’t like factory farming or the idea of cute animals being eaten alive.)
Solving WAS intuitively seems too niche for people to deliberately change their mind on that, but I could be wrong. After all, the Bible says that the Lion will lie down with the lamb and eat straw like the ox, so it could be that human preferences tend to come back to the idea that animal suffering can be bad even when it doesn’t depend on human actions.
I guess the causal mechanism I’m thinking of here is:
Most humans feel at least a little sad when they see a baby gazelle being eaten alive by hyenas
AGI is so powerful that humans can order it to do things like “stop baby gazelles being eaten alive whilst retaining the beauty of nature and the complexity of ecosystems” and then it’ll just go away and do it somehow
Maybe this is foolish and naive on my part! And maybe I’m wrong to think our moral preferences/intuitions will be so robust to the disruption of AGI, even if AGI goes well for us.
Toby, would you be more optimistic for animals if we can align AGI to specific values rather than just making it corrigible to humans’ preferences and commands?
My impression is that pro-animal views are (dramatically?) overrepresented at Anthropic relative to the rest of society. If Anthropic gets to AGI first and instils/locks in pro-animal values in/to that AGI, that seems better for animals than if whoever gets to AGI first just makes it purely corrigible, because most humans who operate the purely corrigible AGI won’t be as pro-animal.
I think in the long-run I’d be more confident that corrigible AI would lead to good futures than AI that is aligned to specific values (besides perhaps some side-constraints). This is mainly because I’m pretty clueless and think our current values are likely to be wrong, and I’d rather we had more time to improve them.
I haven’t thought enough about the relationship between power concentration and corrigibility though; I expect that could change my mind.
Oh yes but I made the above comment more to represent the view that I’ve seen in some AI x Animals work that we should be working on aligning AGI to pro-animal values, through things like AnimalHarmBench etc.
This makes sense. I would worry about the purely corrigible AGI being used by actors in such a way that we never get to instil the correct/good/post-long-reflection values in AGI/ASI down the line.
Yep fair, that’s what I mean by “power concentration and corrigibility”. AGI being constrained by some values makes it at least minimally democratic (values are shaped by everyone who makes up a language, especially for LLMs).
PS- looks like Michael Dickens just posted on this.