Commenting quite late, but better than never:
Firstly, thanks for writing this! I think this work might be important—at the least, it seems like one important avenue for aligning agents.
But I am equally concerned that this might be an easy way to misalign agents. If we let them develop their own moral thinking, there is a high chance that they will develop in different ways from us, and come to very different (and not necessarily better) moral conclusions! Although I agree that we can develop our goals through reasoning, I think that such reasoning always stems from our starting values, which have been developed through evolution (desire for relationships, sex, food, freedom from physical harm, etc.—by ‘values’ here I mean something very primal, just what we instinctually desire). Those instinctual values are already rather complex and hard to input into machines—so if we wanted an AI to develop a good moral understanding, it would need to understand those values that we start out with, and recognize them as valuable. If we make any mistake there, the AI might then develop a fundamentally different understanding of morality, through reasoning based on those mistaken starting values and its interactions with the world.
I suppose that having AI analyse what’s already been written about ethics, and proceed from our modern (and therefore already somewhat developed) morality might be a shortcut, or a way of identifying or correcting an incorrect understanding of our starting values. Yet still, if it doesn’t understand what we value instinctually/prior to reasoning, there is the risk it might reason from our developed moral principles to other ideas that we wouldn’t accept. So, I’m not saying that the alignment of free agents is impossible, but it seems easy to get wrong.
I liked your characterisation of a ‘free’ agent. But I noticed you avoided the term “consciousness”, and I wonder why? What you described as a “sketchpad” I couldn’t help but understand as consciousness—or maybe, the part of consciousness that is independent of our sense of the world. So maybe this is worth defining more precisely, and showing how it overlaps with or differs from consciousness.
But I am equally concerned that this might be an easy way to misalign agents.
I understand your concern, but I think it’s hard to evaluate both whether this is true (because no one has run that kind of experiment yet) and how much of a problem it is: the alternative is other alignment methods, which have their own pros and cons, so I guess the discussion could get very long.
If we let them develop their own moral thinking, there is a high chance that they will develop in different ways to us, and come to very different (and not necessarily better) moral conclusions!
I disagree with this, in particular regarding the high chance. This intuition seems to be based on the belief that morality strongly depends on what you call our starting instinctual values, which are given by evolution; but this belief is questionable. Below I’m quoting the section Moral thinking in Homo sapiens:
In Symbolic Thought and the Evolution of Human Morality [14], Tse analyses this difference extensively and argues that “morality is rooted in both our capacities to symbolize and to generalize to a level of categorical abstraction.” I find the article compelling, and Tse’s thesis is supported also by work in moral psychology — see for example Moral Judgement as Categorization (MJAC) by McHugh et al. [12] — but here I’d like to point out a specific chain of abstract thoughts related to morality.
As we learn more about the world, we also notice patterns in our own behaviour. We form beliefs like “I did this because of that”. Though not all of them are correct, we nonetheless realise that our actions can be steered in different directions, towards different goals, not necessarily ones that satisfy our evolutionary drives. At that point, it comes naturally to ask questions such as “In what directions? Which goals? Is any goal more important than others? Is anything worth doing at all?”
Asking these questions is, I think, what kickstarts moral and ethical thinking.
Let’s also give an example, so that we don’t just think in theoretical terms (and also because Tse’s article is quite long). Today, some people see wild animal suffering as a problem: to put it simply, some of us have come to care about animals such as snakes and insects. But these animals actually trigger aversive, instinctive gut reactions in us, so in this case our moral beliefs go directly against some of our evolutionary instincts. You could reply that our concern for snakes and insects is grounded in empathy, which is itself given by evolution. But then, would you stop caring about snakes and insects if you lost your empathy? Would you stop caring about anyone? I think caring could become harder, but you wouldn’t stop completely. And I think the reason is that our concern for wild animals doesn’t depend only on our evolutionary instincts: it also relies heavily on our capacity to reason and generalise.
If you haven’t read it yet, the section Moral realism and anti-realism in the Appendix contains some info related to this; maybe you’ll find it interesting.
In sum, the high chance you bring up in your comment is very much open to debate, and might not be high at all.
I liked your characterisation of a ‘free’ agent. But I noticed you avoided the term “consciousness”, and I wonder why? What you described as a “sketchpad” I couldn’t help but understand as consciousness—or maybe, the part of consciousness that is independent of our sense of the world. So maybe this is worth defining more precisely, and showing how it overlaps with or differs from consciousness.
Yes, the sketchpad is indeed inspired by how our consciousness works. There are a few reasons why I didn’t use the term “consciousness” instead:
- An AI could have something that does similar things to what our sketchpad does, yet be unconscious.
- As I wrote in the post, “free” refers to freedom of thought / independent thinking, which is the main property the free agent design is about. Maybe any agent that can think independently is also conscious, but this implication doesn’t seem easy to show (see also the previous point).
- The important part of the design is learnt reasoning that changes the agent’s evaluation of different world states. This reasoning happens (at least in part) consciously in humans, but again we don’t know that consciousness is necessary to carry it out. A rough sketch of what I mean is below.
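To make that last point a bit more concrete: below is a minimal, purely illustrative Python example of the kind of loop I have in mind. Everything in it is hypothetical: the class FreeAgentSketch, its evaluate and reason methods, and the _conclusions_from stub are inventions for this comment, not the actual design. The only property the example tries to capture is that reasoning written to the sketchpad can feed back into the agent’s evaluation of world states.

```python
# Purely illustrative sketch, not the actual free agent design:
# an agent whose learnt reasoning writes to an internal "sketchpad"
# and can revise the agent's own evaluation of world states.

from dataclasses import dataclass, field


@dataclass
class FreeAgentSketch:
    # Learnt evaluation of world states: description of a state -> score.
    # In a real system this would be a learnt model, not a plain dict.
    evaluation: dict[str, float] = field(default_factory=dict)
    # The sketchpad: a buffer of intermediate thoughts, decoupled from
    # direct perception of the world.
    sketchpad: list[str] = field(default_factory=list)

    def evaluate(self, state: str) -> float:
        """Current (revisable) evaluation of a world state."""
        return self.evaluation.get(state, 0.0)

    def reason(self, observation: str) -> None:
        """Write an intermediate thought to the sketchpad and let it feed
        back into the evaluation of world states."""
        thought = f"What does {observation!r} imply about what is worth doing?"
        self.sketchpad.append(thought)
        # Key property: reasoning can change the evaluation itself,
        # rather than only selecting actions under a fixed evaluation.
        for state, delta in self._conclusions_from(thought):
            self.evaluation[state] = self.evaluation.get(state, 0.0) + delta

    def _conclusions_from(self, thought: str) -> list[tuple[str, float]]:
        # Stub standing in for learnt reasoning; in this sketch it draws
        # no conclusions, so the evaluation stays unchanged.
        return []


if __name__ == "__main__":
    agent = FreeAgentSketch()
    agent.reason("wild animals can suffer")  # adds a thought to the sketchpad
    print(agent.sketchpad)
    print(agent.evaluate("a world with less wild animal suffering"))
```

Nothing in a loop like this requires the system running it to be conscious, which is part of why I preferred “sketchpad” and “free” over “consciousness” when describing the design.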
Thanks for your comment! I hope that this little discussion we’ve had can help others who read the post.