I think something very obvious but extremely important is missing in your “six-pillar Gold Standard of Human Values” if we want to approach morality as a process of behavioral improvement: the control of aggression.
We should view morality as a strategy for fostering efficient human cooperation. Controlling aggression and developing mutual trust amounts to a culture of benevolence. We can observe that today some (“national”) cultures are less aggressive and more benevolent than others, which demonstrates that such patterns of social behavior can be shaped and improved.
Just as Marxists said that “what leads to a classless society is good,” we should also say “what leads to a non-aggressive, benevolent, and enlightened society is good.” I add the word “enlightened” because it seems true that some largely non-aggressive and benevolent societies have already been achieved on the basis of religious traditions; however, their irrationalism entails a general detriment to the common good.
Thanks for this thoughtful critique, idea21. You’ve identified something important.
You’re right that explicit aggression control isn’t front-and-center in the six pillars, though I think it’s implicit in several places:
Pillar II (Empathy) includes “minimize suffering” and “balance outcomes with rights”—which would prohibit aggressive harm
Pillar III (Dignity) emphasizes “boundaries of non-harm” and accountability for those wielding power
Pillar VI (Integrity) focuses on aligning actions with values and moral courage
But you’re pointing to something deeper: cooperation and trust-building as foundational to moral progress, not just constraints against harm.
I’m curious how you’d operationalize “control of aggression” as a distinct pillar or principle. Would it be:
A prohibition (like the inviolable limits in Article VII: “no torture, genocide, slavery”)?
A positive virtue (cultivating non-aggressive communication, de-escalation)?
A systems-level design principle (institutions structured to prevent violent conflict)?
Something else?
Also, your point about “enlightened” (rational + benevolent) vs. just benevolent is interesting. Where do you see the framework falling on that spectrum? I tried to ground it in evidence-based reasoning (Pillar I) while leaving room for diverse meaning-making (spiritual paths, etc.). Does that balance work, or does it risk the irrationalism problem you mention?
This feels like it might connect to the “moral innovation vs. moral drift” distinction in Pillar V—rationality as a guard against drift even when cultural evolution moves toward benevolence.
Would love to hear more about how you’d integrate this.
Thank you very much for the interest shown in your comment, and for the opportunity you’ve given me to explore new perspectives on an issue that, in my opinion, could be extremely important and is not being addressed even in an environment that challenges conventions, such as the EA community.
I’m curious how you’d operationalize “control of aggression” as a distinct pillar or principle. Would it be:
A prohibition (like the inviolable limits in Article VII: “no torture, genocide, slavery”)?
A positive virtue (cultivating non-aggressive communication, de-escalation)?
A systems-level design principle (institutions structured to prevent violent conflict)?
Moral values are the foundation of an “ethics of principles,” but the problem with an “ethics of principles” is that it is unrealistic about its ability to influence human behavior. In theory, all moral principles contemplate the control of aggression, but their effectiveness is limited.
Since the beginning of the Enlightenment, the problem has been raised that moral, political, and educational principles lack the power over moral behavior that religions have. We must admit, for example, that, despite the commendable efforts of educators, scholars, and politicians, whether liberalism’s values of democratic tolerance and respect for the individual can effectively prevail in a given society depends not so much on proposing impeccable moral principles… but on whether that society has the particular sociological foundation that makes the psychological implementation of such benevolent and enlightened principles viable in the minds of its citizens. In the end, it turns out that liberal principles only work well in societies with a tradition of Reformed Christianity.
I believe that the emergence for the first time of a social movement like EA, apolitical, enlightened, and focused on developing an unequivocally benevolent human behavioral tendency such as altruism, represents an opportunity to definitively transform the human community in the direction of aggression control, benevolence, and enlightenment.
The answer, in my view, would have to lie in tentatively developing non-political strategies for social change. Two hundred years ago, many Enlightenment thinkers considered creating “secular religions” (what I would call “behavioral ideologies”), but these attempts always remained superficial (rituals, temples, collectivism). A scholar of religions, Professor Loyal Rue, believes that religion is basically about “educating emotions.” It is about using strategies to internalize “moral values.”
In my view, if EA utilitarians want more altruistic works, what they need to do is create more altruistic people. Altruism isn’t attractive enough today. Religions are attractive.
In my view, there is a multitude of psychological strategies that, through trial and error, could eventually give rise to a non-political social movement for the spread of non-aggressive, benevolent, and enlightened behavior (a “behavioral ideology”). The example I always have at hand is Alcoholics Anonymous, a movement that emerged a hundred years ago through trial and error, driven by highly motivated individuals seeking behavioral change.
A first step for the EA community would be to establish a social network to support donors in facing the inevitable sacrifices that come with practicing altruism. This same forum already contains accounts of emotional problems (“burnout,” for example) among people who practice altruism without the proper psychological support.
But, logically, altruism can be made much more attractive if we frame it within the broader scope of benevolent behavior. The practice of empathy, mutual care, and affection, and the development of social skills in the area of aggression control, can yield results equal to or better than those found in congregations of the well-known “compassionate religions”… and without any of the drawbacks derived from the irrationalism of ancient religious traditions (evolution is “copy plus modification”). An “influential minority” could then emerge, capable of affecting moral evolution at a general level.
Considering the current productivity of human labor, a social movement of this type, even if it reached just 0.1% of the world’s population, would more than achieve the most ambitious goals of the EA movement. But so far, only 10,000 people have signed the GWWC Pledge.
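To make the scale of that claim concrete, here is a rough back-of-envelope comparison. The world-population figure is an assumption for illustration; the 10,000 pledgers figure is the one cited in the comment above.

```python
# Rough, illustrative back-of-envelope: how large would a movement reaching
# 0.1% of the world's population be, compared with current GWWC pledgers?
# The population figure is an assumption chosen only to show orders of magnitude.

world_population = 8_000_000_000   # assumed ~8 billion people
movement_share = 0.001             # 0.1% of the world's population
current_pledgers = 10_000          # figure cited in the comment above

movement_size = world_population * movement_share
print(f"Movement at 0.1% of humanity: {movement_size:,.0f} people")              # ~8,000,000
print(f"Relative to current pledgers: {movement_size / current_pledgers:,.0f}x")  # ~800x
```

Even on these crude assumptions, such a movement would be roughly three orders of magnitude larger than the current pledger base.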
This is a fascinating critique that I think identifies a real distinction I hadn’t made explicit enough.
You’re pointing out that principles don’t automatically change behavior—they need psychological/social infrastructure. That’s absolutely true for humans.
But I think this actually clarifies what my framework is for:
My framework is primarily designed for AI alignment and institutional design—contexts where we can directly encode principles into systems. Constitutional AI doesn’t need emotional motivation or community support to follow its training. Institutions can be structured with explicit rules and incentives.
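To illustrate what “directly encoding principles into systems” can look like in practice, here is a minimal sketch of a constitutional-AI-style critique-and-revision loop. This is not Anthropic’s actual implementation: the `generate()` function is a hypothetical stand-in for any language-model call, and the principles are illustrative paraphrases of the pillars discussed above.

```python
# Minimal sketch of a constitutional-style critique-and-revision loop.
# Hypothetical: generate() stands in for any language-model call; the
# principles below are illustrative paraphrases, not the framework's exact text.

PRINCIPLES = [
    "Minimize suffering and do not endorse or encourage aggressive harm.",
    "Respect boundaries of non-harm and the dignity of every person.",
]

def generate(prompt: str) -> str:
    """Placeholder for a real model call; raises until one is plugged in."""
    raise NotImplementedError("connect a language model here")

def constitutional_reply(user_prompt: str) -> str:
    draft = generate(user_prompt)
    for principle in PRINCIPLES:
        # Ask the model to critique its own draft against the written principle...
        critique = generate(
            f"Principle: {principle}\nReply: {draft}\n"
            "Critique the reply strictly against this principle."
        )
        # ...then revise the draft to address that critique.
        draft = generate(
            f"Original request: {user_prompt}\nReply: {draft}\n"
            f"Critique: {critique}\nRewrite the reply to satisfy the principle."
        )
    return draft
```

The only point of the sketch is that the loop runs on written rules rather than on motivation, ritual, or community, which is exactly the contrast with human moral development drawn in this exchange.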
For human moral development, you’re right that we need something different—what you call “behavioral ideology.” The AA analogy is perfect: the 12 steps alone don’t change behavior; it’s the community, ritual, accountability that make it work.
But here’s an interesting thought: What if solving AI alignment could actually help with human behavioral change?
If we successfully align AI systems with principles like empathy, integrity, and non-aggression—and those AI systems become deeply integrated into daily life—humans will be constantly interacting with entities that model those behaviors. Children growing up with AI tutors that consistently demonstrate benevolent reasoning. Workers collaborating with AI that handles conflicts through de-escalation rather than dominance. Communities using AI mediators that prioritize mutual understanding.
The causality might work both ways:
We need to figure out how to encode human values in AI (my framework’s focus)
But once AI systems consistently embody those values, they might shape human behavior in return
This doesn’t replace the need for what you’re describing—the community support, emotional education, ritual. But it could be complementary. Aligned AI could be part of the cultural infrastructure that makes benevolent behavior more natural.
So I think the scope question is:
Should my framework try to be both (AI alignment + human behavioral change)?
Or should it focus on AI/institutions, acknowledging that human implementation requires different mechanisms (like what you’re describing)?
I lean toward the latter—not because human behavior change isn’t important, but because:
AI alignment is where I have the most to contribute
Creating “secular religions” or behavioral movements requires different expertise
Trying to be both might dilute effectiveness at either
That said, your vision of EA evolving to provide emotional/social infrastructure for altruistic behavior is compelling. And perhaps successfully aligning AI is actually a prerequisite for that vision—because misaligned AI could actively work against benevolent human culture.
My question for you: Do you see frameworks like mine as useful inputs to the kind of movement you’re describing? Even if AI alignment alone isn’t sufficient, could it be necessary? If we get AI right, does that make the human behavioral transformation more achievable?
Do you see frameworks like mine as useful inputs to the kind of movement you’re describing? Even if AI alignment alone isn’t sufficient, could it be necessary? If we get AI right, does that make the human behavioral transformation more achievable?
I’ve done something a bit like what you did and asked an artificial intelligence about the social goals of behavioral psychology. I proposed two options: either using our knowledge of human behavior to adapt the individual to the society in which they can achieve personal success, or using that knowledge to achieve a less aggressive and more cooperative society.
“Within the framework of radical behavioral psychology applied to society, the goal is closer to:
Improving society (through environmental and behavioral design) to expand efficient social cooperation and reduce harmful behaviors like aggression.
The first option, ‘Adapting to the mainstream society in order to get individual success,’ aligns more closely with general concepts of socialization and adaptation found across various fields of psychology (including social psychology and developmental psychology), but is not the distinct, prescriptive social goal proposed by the behaviorist project for an ideal society.” (This is “Gemini”)
Logically, AI, which lacks prejudice and uses only logic, opts for social improvement… because it starts from the knowledge that human behavior can be improved based on fairly logical and objective criteria: controlling aggression and encouraging efficient cooperation.
Would AI favor a “behavioral ideology” as a strategy for social improvement?
The Enlightenment authors two hundred years ago considered that if astrology had given rise to astronomy and alchemy to chemistry… religion could also give rise to more sophisticated moral strategies for social improvement. What I call “behavioral ideology” is probably what the 19th-century scholar Ernest Renan called “pure religion.”
If, starting with an original movement for non-political social change like EA, a broader social movement were launched to design altruistic strategies for improving behavior, it would probably proceed in a similar way to what Alcoholics Anonymous did in its time: through trial and error, once the goals to be achieved (aggression control, benevolence, enlightenment) were firmly established.
If I limit myself to speculating, I find such a diversity of available strategies that it is impossible for me to calculate which ones would ultimately be selected. To give an example: the Anabaptist community of the “Amish” is made up of 400,000 people who manage to organize themselves socially without laws, without government, without physical coercion, without judges, without fines, without prisons or police… (the dream of a Bakunin or a Kropotkin!) How do they do it? Another example is the one Marc Ian Barasch mentions in his book “The Compassionate Life” about the usefulness of a biofeedback program for stimulating benevolent behaviors.
The main contribution I see in AI is that, although you yourself have detected cognitive biases in its various forms, operating on the basis of logical reasoning stripped of prejudice (far from flawed, heuristic-laden human rationality) can facilitate the achievement of effective social goals.
AI isn’t concerned with the future of humanity, but with solving problems. And the human problem is quite simple (as long as we don’t prejudge): we are social mammals. Homo sapiens, like all social mammals, has been genetically programmed to be competitive and aggressive in the dispute over scarce economic resources (hunting territories, availability of females, etc.). The problem changes when, thanks to technological development, we come to have potentially unlimited economic resources… What role do instinctive behaviors like aggression, tribalism, or superstition play now? They are now merely handicaps.
Sigmund Freud made it clear in his book: “Civilization is the control of instinct.”
However, what would probably be perfectly logical for an Artificial Intelligence may be shocking for today’s Westerner: the solution to the human problem will closely resemble the old Christian strategies of “saintliness” (but rationalist). As psychologist Jonathan Haidt has written, “The ancients may not have known much about science, but they were good psychologists.”
This has been a genuinely valuable exchange—thank you for pushing me to think more carefully about the relationship between principles and practice.
You’ve helped me clarify something important: my framework is primarily designed for AI alignment and institutional design—contexts where we CAN directly encode principles into systems. Constitutional AI doesn’t need emotional motivation or community support to follow its training. Institutions can be structured with explicit rules and incentives.
For human moral development, you’re absolutely right that we need something different—the community, ritual, and accountability structures you’re describing as “behavioral ideology.” The AA analogy is perfect: the 12 steps alone don’t change behavior; it’s the ecosystem around them that makes it work.
After reflecting on this conversation (and discussing it with others), I think the key insight is about complementarity rather than competition:
Your focus: Building the human communities and psychological infrastructure for altruistic behavior
My focus: Ensuring AI systems embody the right values as they become integrated into life
The bridge: Successfully aligned AI could actually help humans practice better behavior by consistently modeling benevolent reasoning. But only if we get the alignment right first.
You’ve also highlighted a cultural assumption I’m still wrestling with: whether my “6 pillars” framework reflects universal human values or carries specific Western philosophical commitments. The process of arriving at values (democratic deliberation, wisdom traditions, divine command) might matter as much as the content itself.
I’m going to keep working on the technical alignment side—that’s where I can contribute most directly. But I’ll be watching with genuine interest as behavioral approaches like yours develop. The Amish example (400,000 people organizing without coercion) is exactly the kind of existence proof we need that alternatives to current social organization are possible.
Perhaps we can reconnect once both projects have more empirical results to compare. I suspect we’ll need both approaches—aligned AI providing consistent modeling of good values AND human communities providing the emotional/social infrastructure to actually live those values.
Thanks again for the thought-provoking exchange. You’ve given me useful frames for thinking about where my work fits in the larger project of human flourishing.