Elliott Thornley (EJT) comments on “Can AI make advancements in moral philosophy by writing proofs?”

Elliott Thornley (EJT) 19 Apr 2026 13:39 UTC
Nice post! Miscellaneous thoughts:
“if individuals have VNM utility functions, and if the Pareto principle holds over groups, then a version of utilitarianism must be true.”
Harsanyi’s theorem also requires that the social planner’s preferences satisfy the VNM axioms.
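For reference, here is the conclusion of Harsanyi’s aggregation theorem in one standard formulation (stated from general knowledge, not taken from the post). Assume each individual i has a VNM utility function u_i over lotteries, the social planner’s preferences over lotteries are represented by a VNM utility function W, and Pareto indifference holds (if every individual is indifferent between two lotteries, so is the planner). Then:

```latex
W(L) = \sum_{i=1}^{n} a_i \, u_i(L) + b \qquad \text{for every lottery } L,
```

for some constants a_1, …, a_n and b. The weights need not be equal, and Pareto indifference alone does not make them positive; stronger Pareto conditions are used to secure that.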
“Not many philosophical proofs have been written”
I think this all depends on what you mean by ‘many’. I’d guess maybe 10% of analytic philosophy papers include a proof of some kind, so that at least hundreds of proofs are published every year. And in a sense, every valid (spelled-out) argument is a proof.
I agree that the Claude proofs are pretty bad. The Arrhenius point is fairly obvious: what Arrhenius means by ‘theories’ in that paper is weak orders (complete and transitive rankings) on populations, so if, after taking moral uncertainty into account, you still have a weak order, then the impossibility theorem still applies. (And later Arrhenius theorems relax both completeness and transitivity, so even departing from a weak order doesn’t get you off the hook.)
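To make the ‘still a weak order’ step concrete, here is a minimal Python sketch. It assumes one particular way of handling moral uncertainty (a credence-weighted sum of each theory’s values, sometimes called expected choiceworthiness), which the comment does not commit to; the populations, theories, values and credences are all hypothetical. The point is only that any such real-valued aggregation induces a complete and transitive ranking, i.e. a weak order, so a theorem stated for weak orders applies to it.

```python
from itertools import product

# Hypothetical populations, identified only by label (illustrative, not from the post).
POPULATIONS = ["A", "B", "C", "D"]

# Two hypothetical theories, each given as a real-valued value function over populations.
THEORY_VALUES = {
    "total": {"A": 10.0, "B": 12.0, "C": 12.0, "D": 5.0},
    "average": {"A": 4.0, "B": 2.0, "C": 3.0, "D": 5.0},
}

# Assumed credences in each theory (hypothetical numbers).
CREDENCES = {"total": 0.6, "average": 0.4}


def expected_choiceworthiness(pop: str) -> float:
    """Credence-weighted sum of the theories' values for one population."""
    return sum(CREDENCES[t] * THEORY_VALUES[t][pop] for t in CREDENCES)


def at_least_as_good(x: str, y: str) -> bool:
    """The 'at least as good as' relation induced by the aggregated values."""
    return expected_choiceworthiness(x) >= expected_choiceworthiness(y)


def is_weak_order(items, rel) -> bool:
    """Check completeness and transitivity of rel on a finite set."""
    complete = all(rel(x, y) or rel(y, x) for x, y in product(items, repeat=2))
    transitive = all(
        rel(x, z)
        for x, y, z in product(items, repeat=3)
        if rel(x, y) and rel(y, z)
    )
    return complete and transitive


print(is_weak_order(POPULATIONS, at_least_as_good))  # True
```

Because the aggregated ranking comes from a single real number per population, completeness and transitivity hold automatically; that is exactly the structure Arrhenius’s theorem assumes.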
Claude makes this kind of point, but first it introduces an Agreement axiom that the proof never uses. Claude later comes close to admitting this (‘Agreement plays almost no role’), tries to walk it back (‘But Agreement rules out the escape route...’), and then fully admits it (‘the fundamental impossibility holds regardless’).
Which Claude model did you use? Did you use extended thinking? The flip-flopping above makes me think there was no extended thinking, and maybe a model with extended thinking would do better. (Though not much better, I’d guess. I’ve found LLMs to be surprisingly bad at philosophy, even just the ‘understanding the view and its implications’ parts.)
I didn’t bother checking the second population ethics proof, but it looks sloppy:
“Axiom (Sufficient Comparability). For any pair of populations A, B that differ by at most some fixed bounded amount (e.g., adding or removing one person, or changing one person’s welfare level by a small amount), M(μ) must rank A and B (no incomparability for “local” comparisons).”
Don’t all pairs of populations “differ by at most some fixed bounded amount”? What is Claude doing including ‘e.g.’s in its formal statement of axioms?
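The worry is about quantifier order: if the axiom means “for every pair there is some bound they fall within”, it is trivially satisfied; it only says something if one bound is fixed in advance for all pairs. The minimal Python sketch below contrasts the two readings. It assumes, purely for illustration, that a population is a multiset of welfare levels and that the relevant ‘amount’ is the number of people added or removed (ignoring welfare changes); neither representation comes from Claude’s proof, and the helper names are hypothetical.

```python
from collections import Counter


def distance(a: Counter, b: Counter) -> int:
    """Hypothetical measure: number of single-person additions or removals
    needed to turn population a into population b (multiset symmetric difference)."""
    return sum(((a - b) + (b - a)).values())


def locally_comparable(a: Counter, b: Counter, k: int) -> bool:
    """Intended reading: the pair falls within a bound k fixed in advance."""
    return distance(a, b) <= k


small = Counter({5: 1})            # one person at welfare 5
large = Counter({1: 1000, 5: 1})   # a thousand people at welfare 1, plus one at 5

# Reading 1 ("some bound exists for this pair"): always satisfied, because
# finite populations are always a finite distance apart.
bound_for_this_pair = distance(small, large)
print(bound_for_this_pair)  # 1000

# Reading 2 ("one bound k for all pairs"): genuinely restricts the axiom.
print(locally_comparable(small, large, k=1))                    # False
print(locally_comparable(small, large, k=bound_for_this_pair))  # True
```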
“With some additional effort, present-day LLMs might be capable of coming up with a good novel proof. If not, then it will likely be possible soon. Most kinds of moral philosophy might be difficult for AIs, but proofs are one area where AI assistance seems promising.”
Yes, you’d think so, given that they’ve gotten so good at math! But when I’ve tried using LLMs to help with formal philosophy, I’ve found them to be really surprisingly bad, even at parts that seem very math-loaded (e.g. inventing proofs, following arguments, grasping views and their implications, coming up with counterexamples). I’m not sure why this is. I guess part of it is that it’s hard to do RLVR (reinforcement learning with verifiable rewards) on philosophy in the way that you can on math, but naively I’d expect more generalization from math to formal philosophy. Maybe the following is a factor: pretraining data doesn’t contain that much bad mathematical reasoning, but it contains a huge amount of bad philosophical reasoning.