Toby Tremlett🔹 comments on AGI & Animals Symposium (Thursday 5-7pm UK)

Toby Tremlett🔹 26 Mar 2026 17:11 UTC
3 points
0 ∶ 0
For example, I think a crux might be the tractability of animal-specific alignment work: can we align AI to specific values, or (just) make it corrigible to our preferences and commands? I don’t know, but the answer would massively affect my estimate of tractability here.
- Jo_🔸 26 Mar 2026 17:30 UTC
  3 points
  0 ∶ 0
  This is definitely a hard debate to disentangle, because I would personally reject the question of alignment as a crux. For now, I strongly believe that the total welfare of animals has been entirely uncorrelated with our moral intentions toward animals. Total welfare has mostly changed because of land use, due to human interests.
  I agree that in AGI-transformed futures that go well for humans, human desires may start playing a larger role. However, I expect that whether we mean well for animals (or don’t care much about them) will not be cleanly correlated with outcomes for them.
  There are worlds where we mean well for a large part of animals, stop intentionally killing them, and help certain wild animals. But that world could very well end up having a large population of animals living bad lives.
  On the other hand, out of apathy or even negative feelings toward wild animals, we may decide to limit their spread and use resources in a way that optimizes for human flourishing over animal abundance. That world could end up being much better for animal welfare.
  Maybe some extreme scenarios tip the scales, for example if we bred incredibly happy genetically modified animals due to positive feelings toward them. But I’m not confident about putting any weight on such utilitarian-leaning scenarios when assessing post-AGI futures, because part of the reason human moral intentions are not correlated with total animal welfare is that humans are not scope-sensitive utilitarians.
  - Alistair Stewart 26 Mar 2026 17:48 UTC
    1 point
    0 ∶ 0
    What kinds of values will humans have post-AGI, if AGI goes well for us? We don’t need to be scope-sensitive utilitarians to want to adopt even radical preferences like ending animal exploitation and solving WAS, no? (Most humans don’t like factory farming or the idea of cute animals being eaten alive.)
    - Jo_🔸 26 Mar 2026 18:10 UTC
      1 point
      0 ∶ 0
      Solving WAS intuitively seems too niche for people to deliberately change their minds about it, but I could be wrong. After all, the Bible says that the lion will lie down with the lamb and eat straw like the ox, so it could be that human preferences tend to come back to the idea that animal suffering can be bad even when it doesn’t depend on human actions.
      - Alistair Stewart 26 Mar 2026 18:20 UTC
        1 point
        0 ∶ 0
        I guess the causal mechanism I’m thinking of here is:
        1. Most humans feel at least a little sad when they see a baby gazelle being eaten alive by hyenas.
        2. AGI is so powerful that humans can order it to do things like “stop baby gazelles being eaten alive whilst retaining the beauty of nature and the complexity of ecosystems”, and then it’ll just go away and do it somehow.
        Maybe this is foolish and naive on my part! And maybe I’m wrong to think our moral preferences/intuitions will be so robust to the disruption of AGI, even if AGI goes well for us.
- Toby Tremlett🔹 26 Mar 2026 17:54 UTC
  2 points
  0 ∶ 0
  PS: it looks like Michael Dickens just posted on this.
- Alistair Stewart 26 Mar 2026 17:29 UTC
  2 points
  0 ∶ 0
  Toby, would you be more optimistic for animals if we can align AGI to specific values rather than just making it corrigible to humans’ preferences and commands?
  My impression is that pro-animal views are (dramatically?) overrepresented at Anthropic relative to the rest of society. If Anthropic gets to AGI first and instils or locks pro-animal values into that AGI, that seems better for animals than if whoever gets to AGI first just makes it purely corrigible, because most humans who operate a purely corrigible AGI won’t be as pro-animal.
  - Toby Tremlett🔹 26 Mar 2026 17:33 UTC
    3 points
    0 ∶ 0
    I think in the long run I’d be more confident that corrigible AI would lead to good futures than AI that is aligned to specific values (besides perhaps some side-constraints). This is mainly because I’m pretty clueless and think our current values are likely to be wrong, and I’d rather we had more time to improve them.
    
    I haven’t thought enough about the relationship between power concentration and corrigibility though—I expect that could change my mind.
    - Toby Tremlett🔹 26 Mar 2026 17:34 UTC
      3 points
      0 ∶ 0
      Oh yes, but I made the above comment more to represent a view I’ve seen in some AI x Animals work: that we should be working on aligning AGI to pro-animal values, through things like AnimalHarmBench etc.
    - Alistair Stewart 26 Mar 2026 17:44 UTC
      1 point
      0 ∶ 0
      This makes sense. I would worry about the purely corrigible AGI being used by actors in such a way that we never get to instil the correct/good/post-long-reflection values in AGI/ASI down the line.
      - Toby Tremlett🔹 26 Mar 2026 17:49 UTC
        3 points
        0 ∶ 0
        Yep, fair, that’s what I mean by “power concentration and corrigibility”. AGI being constrained by some values makes it at least minimally democratic (values are shaped by everyone who makes up a language, especially for LLMs).