Regarding your last point: I see. I thought this was an argument for “alignment via moral reasoning as an addition to alignment via control,” not “alignment via moral reasoning instead of alignment via control.” So you would hope that alignment via moral reasoning would eventually displace alignment via control.
In that case, your argument is plausible but… quite hopeful? I’m sure many people will pursue control methods regardless. I suppose you might argue that, if enough people buy your argument, then research on AI that is merely controlled will advance more slowly, while research on AI that does its own moral reasoning, and is therefore harder to misuse, will advance faster or at least in parallel. I would accept that this might reduce the chance of malevolent misuse, but it’s quite a hopeful scenario! In less hopeful scenarios, I’m unsure whether people concerned with malevolent misuse ought to pursue this kind of work, or whether they would be better off simply advocating for a pause or slowdown.
In short, I am not hoping for a specific outcome, and I can’t take into account every single scenario. If someone gives more credence to research on moral reasoning in AI after reading this, that’s already enough, considering that the topic doesn’t seem to be popular within AI alignment and was even more niche when I wrote this post.
Sure! And like I said, I do think this is valuable: it just seems more obviously valuable as a way to ensure the best outcome (aligned AI) than as a way to avoid the worst outcomes.