Michael Townsend🔸 comments on AI character is a big deal

Michael Townsend🔸 25 Mar 2026 15:23 UTC
Thanks for this post!
Curious what you think of the following objection: AI character work has to grapple with the question of whether we want AI systems to be “obedient” or “ethical”. Yet it seems non-obvious which is better for the long-term future, because training them to be ethical might make alignment risk worse, and training them to be obedient makes misuse and other risks worse. So if you’re uncertain about which is better (which I think you probably should be), then in expectation, the impact of affecting AI character is reduced in proportion to how uncertain you are. I expect there are some ways to do AI character work that don’t have this “obedient vs. ethical” structure, but I also suspect that the arguments made in this report about the importance of such work apply less to these interventions.
- Tom_Davidson 26 Mar 2026 8:54 UTC
  This is a real trade-off when thinking about what AI character should look like. I think more work on that trade-off, on where we should sit along it, and on ways to get the best of both (the biggest benefits of being ethical and of being obedient) would be hugely valuable.
- shauryachandravanshi 27 Mar 2026 13:10 UTC
  It may seem non-obvious at first which of these two approaches is better, but that is because we are treating AI character as a set of very broad recommendations about very broad situations. Given substantially more information about specific scenarios, a non-trivial number of them would likely have clearer resolutions.
  I would guess that AI constitutions, much like actual national constitutions, would become more fine-grained and specific in their moral directives for AIs.
  
  Naturally, beyond some level of specificity, rules would resolve fewer cases through added clarity than they would harm by imposing rigid standards that don’t apply. But this is analogous to how actual legal systems strive to resolve this exact issue iteratively. Legal tests, legal fictions, and precedent might find their analogues in the ethical “training” of future AIs.
  
  Still, whether trained, prescribed/directed, or highly fine-grained, AI character would have a non-trivial effect on the decisions made in these cases.
  
  Edit: If I understood her correctly, Amanda Askell says this is quite similar to how Claude was actually trained to follow its constitution.