Beyond Human Values: Historical Mechanisms for Earth-Inclusive AI Alignment
Thomas C. Morris & Joshua Peng | Electric Sheep Fellowship
“History doesn’t repeat itself, but it often rhymes”
- Mark Twain
The systems being built now will mediate decisions about resource allocation, land use, logistics, and governance at scales that directly affect ecological systems and non-human life. Yet the dominant methods for aligning AI behaviour (Reinforcement Learning from Human Feedback, Constitutional AI, and the governance frameworks built around them) are oriented exclusively toward human preferences, human harms, and human oversight (Christiano et al., 2017; Ouyang et al., 2022; Bai et al., 2022). Animal welfare, ecosystem integrity, and planetary stability are structurally absent from the optimisation process. Not through malicious intent, but through absence of design.
Six of nine planetary boundaries have already been transgressed (Richardson et al., 2023). Data centre electricity consumption is projected to more than double to around 945 TWh by 2030, with AI identified as the primary driver of that growth (International Energy Agency, 2025). The norms and institutional structures governing AI are still being formed (Maas & Villalobos, 2023). The decisions made now, about what evaluation frameworks assess, what training objectives include, and what governance bodies have authority over, will harden into defaults that become progressively more difficult to change as systems grow more powerful and more entrenched.
This post summarises a longer paper that makes a simple but underexplored argument: the institutional design problem of representing non-human interests in governance is not new, and the lessons accumulated across decades of environmental law, treaty design, and rights-of-nature jurisprudence have direct application to AI alignment.
What History Shows Us
Humanity has faced the problem of giving voice to non-human interests before. The track record is mixed, but the pattern of success and failure is instructive.
A systematic evidence synthesis of 224 quantitative evaluations, drawn from a population of more than 250,000 registered international treaties, finds that, with the exception of agreements governing trade and finance, international treaties have mostly failed to produce their intended effects across environmental, humanitarian, and security domains (Hoffman et al., 2022). The governance systems that succeeded did so because of how they were built, not because they made stronger moral claims.
Our paper examines six case studies through this lens: Ecuador’s constitutional rights of nature, Anthropic’s Constitutional AI as a form of corporate constitutionalism, statutory guardianship for the Whanganui River in Aotearoa New Zealand, India’s stayed river-personhood ruling, the Montreal Protocol, and the Convention on Biological Diversity.
The Montreal Protocol is the clearest success. It resolved a global commons problem, the depletion of the ozone layer, through institutional design rather than moral appeal. Scientific knowledge was the primary source of authority, and independent assessment panels reviewed atmospheric data that directly informed binding amendment cycles (WMO, 2022). Enforcement had genuine economic weight: trade restrictions on non-parties directly linked environmental compliance to market access, removing the structural incentive to free-ride (Benedick, 1998). As evidence accumulated, commitments were tightened. Atmospheric concentrations of major ozone-depleting substances have declined since the Protocol entered into force, with modelling indicating that ultraviolet radiation levels would have increased substantially in its absence (McKenzie et al., 2019).
The Convention on Biological Diversity offers a contrasting lesson. Its targets were formulated in terms too broad and diffuse to operationalise, monitor, or enforce (Jones et al., 2011). Unlike the Montreal Protocol, it developed no mechanisms capable of holding parties accountable for missed targets (Koh, 2022). By 2020, none of the twenty Aichi Biodiversity Targets had been fully achieved (Secretariat of the Convention on Biological Diversity, 2020). The difference was not in what the agreements said. It was in what they were built to do.
The Whanganui River settlement in New Zealand demonstrates what durable non-human representation looks like in practice. The Te Awa Tupua Act 2017 declared the river an indivisible legal person, but the critical element was the institutional architecture surrounding that declaration: Te Pou Tupua, two representatives with statutory fiduciary authority to speak on the river’s behalf in regulatory proceedings and litigation (Charpleix, 2018; New Zealand Parliament, 2017). The river’s interests can be invoked in judicial review and policy planning as a matter of statutory right. For example, in 2025, when Whanganui Port received consent for dredging works, the existence of Te Awa Tupua restructured the procedural terrain within which approval was evaluated (Whanganui Port, 2025a; 2025b). Development proceeded, but within a framework that requires explicit engagement with the river’s legally articulated standing.
India’s river-personhood ruling, by contrast, illustrates what happens without institutional architecture. The Uttarakhand High Court declared the Ganga and Yamuna rivers legal persons in 2017, then appointed serving government officials as their guardians (O’Donnell, 2018). These were the same administrative structures that had overseen ineffective pollution control for decades. The Supreme Court stayed the order four months later, citing concerns about liability, federal jurisdiction, and administrative feasibility (Upadhyay, 2024). Personhood was articulated judicially but never embedded institutionally.
Finally, both Ecuador’s 2008 constitution and Anthropic’s Constitutional AI attempt to embed values into powerful systems through codified principles, one into a state, the other into a frontier AI organisation. The parallel is instructive precisely where it breaks down. Ecuador’s constitution granted Pachamama enforceable rights to existence, restoration, and regeneration, drawing its legitimacy from Indigenous and civil society knowledge traditions rooted in buen vivir (Gudynas, 2011; Kauffman & Martin, 2017). Constitutional AI similarly trains models against a codified set of principles drawn in part from the UN Declaration of Human Rights, and both frameworks demonstrate that codified values are not merely symbolic (Bai et al., 2022). But where Ecuador’s constitution extended standing to all persons, communities, peoples, and nations to bring claims on nature’s behalf, Anthropic’s is authored and enforced unilaterally, with no external mechanism for contestation (Anthropic, 2023a). Its authority derives from within the corporation it governs. Corporate constitutions are not without value. What they lack, as Ecuador’s experience shows, is the distributed accountability that makes codified principles durable.
Four Design Principles
Across these cases, four principles distinguish durable representation from governance theatre.
Epistemic legitimacy. Durable representation depends on credible systems for determining what ecological stability actually requires. The Montreal Protocol embedded scientific panels with formal authority into its amendment cycles. New Zealand’s Whanganui settlement recognised Indigenous ecological knowledge as legally constitutive rather than merely advisory (Te Awa Tupua Act, 2017). Ecuador’s constitutional process drew on a coalition of Indigenous and civil society actors whose knowledge of the land predated the state (Kauffman & Martin, 2017). Where that question went unanswered, as in India’s river ruling, which relied on sacred framing without operational ecological criteria, representation weakened (Jolly, 2021).
Authorised representation. Non-human interests require defined institutional actors empowered to speak on their behalf, with clear mandates, independence from the interests they are meant to constrain, and operational continuity. Te Pou Tupua held statutory fiduciary responsibility for the Whanganui River. India’s court-appointed guardians held none of these things.
Legible ecological signals. Moral claims alone cannot sustain governance. Non-human interests must be translated into monitored, updated, and decision-relevant indicators that powerful actors cannot simply ignore. The Montreal Protocol succeeded in part because ozone depletion could be measured and tied to specific chemical production schedules. The Convention on Biological Diversity failed in part because biodiversity loss could not be captured by a single indicator, making noncompliance easy to obscure (Jones et al., 2011; Ette et al., 2021).
Recognition must shape decisions. Declaring that nature has rights, or that biodiversity must be protected, changes nothing on its own. What matters is whether that declaration can be used to stop a pipeline, block a development, or hold a government to account. Ecuador’s courts did exactly that, invoking Rights of Nature in at least twelve cases between 2008 and 2016 and upholding them in nine (Avila, 2018). The Convention on Biological Diversity never reached that point: its targets were aspirational, its compliance mechanisms weak, and the industries driving biodiversity loss had no structural reason to change course (Koh, 2022).
From History to AI: The RLEF Framework
These principles carry direct implications for AI alignment. RLHF is, among other things, a governance architecture: it defines whose preferences are represented, how those preferences are validated, and what optimisation target shapes AI behaviour (Christiano et al., 2017; Ouyang et al., 2022). Like the governance systems examined above, it reflects particular choices about representation, choices that were reasonable given the problems AI alignment set out to solve, but that leave Earth-system values and non-human interests structurally outside the optimisation process.
Beyond explicit alignment techniques, AI systems inherit values from their training data. Large language models are trained on vast datasets of human-generated text that reproduce the attitudes prevalent in human culture (Weidinger et al., 2021). Animals appear as commodities and resources rather than individuals with moral standing. Ecosystems are framed in terms of the services they provide to humanity. When asked ethical questions about animal treatment, models trained on standard datasets overwhelmingly favour human interests, mirror speciesist assumptions about which species deserve moral consideration, and default to frameworks that justify animal suffering for human benefit (Hagendorff et al., 2023; Ghose et al., 2024; Tse et al., 2025). The cumulative effect is a governance architecture with no mechanism for representing interests beyond human preferences.
We propose Reinforcement Learning from Earth Feedback (RLEF): an extension of the RLHF framework that expands the optimisation target to ask not only “what do humans prefer?” but “what do humans prefer and what do planetary systems require for stability?” RLEF identifies five complementary knowledge sources capable of generating this expanded feedback signal:
The planetary boundaries framework: quantitative thresholds for nine Earth-system processes, with monitoring infrastructure already in place through NOAA, IUCN, NASA, and USGS (Rockström et al., 2009; Steffen et al., 2015; Richardson et al., 2023).
Earth System Models: predictive projections of how planetary systems respond to human activities, enabling consequences to become explicit factors in the optimisation process rather than externalities that appear only after decisions are made (Flato et al., 2013; Hausfather et al., 2020).
Traditional Ecological Knowledge: the knowledge systems Indigenous peoples have developed through generations of systematic environmental observation, capturing long-horizon, place-based signals that Western scientific monitoring frequently misses (Berkes, 2012; Huntington, 2000). This source requires partnership models with community control and Free, Prior and Informed Consent, not extraction (UN General Assembly, 2007; Whyte, 2018).
Life Cycle Assessment databases: cradle-to-grave environmental impact data for products and processes, making abstract environmental costs concrete and comparable at the point of decision (ISO, 2006; Wernet et al., 2016).
Real-time environmental sensors: ground-truth measurements of current conditions, enabling systems that respond to changing ecological reality rather than assuming static impacts (OpenAQ, 2023; USGS, 2023).
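To make the shape of the proposal concrete, here is a minimal sketch of how these sources might be folded into a single reward term, assuming a simple scalarised penalty on planetary-boundary overshoot. Every name and number below (PLANETARY_BOUNDARIES, boundary_pressure, earth_weight, the placeholder values) is an illustrative assumption for exposition, not an architecture the paper commits to.

```python
# Illustrative sketch of an RLEF-style combined reward signal. Names,
# numbers, and the scalarisation itself are assumptions for exposition.

# Hypothetical subset of planetary-boundary variables, loosely following
# the structure of Richardson et al. (2023). Values are placeholders.
PLANETARY_BOUNDARIES = {
    "co2_ppm":             {"current": 417.0, "threshold": 350.0},
    "extinctions_per_msy": {"current": 100.0, "threshold": 10.0},
    "freshwater_change":   {"current": 0.20,  "threshold": 0.10},
}

def boundary_pressure(action_impacts: dict[str, float]) -> float:
    """Penalty that grows as an action pushes a boundary further past its
    threshold; actions within safe limits contribute nothing."""
    penalty = 0.0
    for key, impact in action_impacts.items():
        b = PLANETARY_BOUNDARIES[key]
        overshoot = max(0.0, (b["current"] + impact) / b["threshold"] - 1.0)
        penalty += overshoot
    return penalty

def rlef_reward(human_preference_score: float,
                action_impacts: dict[str, float],
                earth_weight: float = 0.5) -> float:
    """Human feedback term plus an Earth-system term. The weight is a
    normative judgement, not a technical parameter."""
    return human_preference_score - earth_weight * boundary_pressure(action_impacts)

# Example: a response whose recommended action adds 2 ppm CO2-equivalent.
score = rlef_reward(0.8, {"co2_ppm": 2.0})  # ~0.70 after the penalty
```

Note that earth_weight is doing exactly the normative work flagged under “What This Isn’t Claiming” below: choosing it fixes an exchange rate between human preference satisfaction and boundary overshoot.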
The practical difference is significant. An AI optimising production to maximise quarterly profits, trained exclusively on human feedback, analyses costs, production rates, and logistics, then recommends what maximises output. Carbon emissions, water consumption, and biodiversity impacts are not part of the calculation, not because the system made a bad decision, but because nothing in its training gave it reason to look.
An RLEF-enabled system given the same instruction simultaneously queries planetary boundary databases, Life Cycle Assessment data, and biodiversity databases. Rather than returning a single optimal solution, it presents the tradeoff space. The numbers differ. More importantly, the decision is different in kind: humans choose with full visibility of consequences rather than in ignorance of them.
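As a hedged illustration of what “presenting the tradeoff space” could mean in practice, the sketch below filters candidate plans to a Pareto front over profit, carbon, and water. The plans, figures, and helper names are invented; a real system would derive the impact figures from the LCA and planetary-boundary sources named above.

```python
# Illustrative only: surfacing a tradeoff space instead of a single optimum.
# Plans and figures are invented placeholders.

from dataclasses import dataclass

@dataclass
class ProductionPlan:
    name: str
    profit_musd: float   # quarterly profit, millions USD
    co2_tonnes: float    # lifecycle emissions
    water_ml: float      # megalitres withdrawn

def dominates(a: ProductionPlan, b: ProductionPlan) -> bool:
    """True if a is at least as good as b on every axis and better on one."""
    at_least = (a.profit_musd >= b.profit_musd
                and a.co2_tonnes <= b.co2_tonnes
                and a.water_ml <= b.water_ml)
    better = (a.profit_musd > b.profit_musd
              or a.co2_tonnes < b.co2_tonnes
              or a.water_ml < b.water_ml)
    return at_least and better

def pareto_front(plans: list[ProductionPlan]) -> list[ProductionPlan]:
    return [p for p in plans if not any(dominates(q, p) for q in plans)]

candidates = [
    ProductionPlan("max-output", 12.0, 9_000, 450),
    ProductionPlan("low-carbon", 10.5, 4_200, 430),
    ProductionPlan("low-water",  10.1, 8_100, 210),
    ProductionPlan("laggard",     9.0, 9_500, 460),  # dominated: never shown
]

for plan in pareto_front(candidates):
    print(f"{plan.name}: ${plan.profit_musd}M profit, "
          f"{plan.co2_tonnes:,.0f} t CO2, {plan.water_ml:.0f} ML water")
```

The design choice matters: the system narrows the space to non-dominated options and hands the final weighing back to humans, rather than silently choosing the weights itself.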
RLEF does not resolve fundamental value conflicts between human welfare and ecosystem health. The hard choices remain hard. What changes is the informational basis on which those choices are made.
The Governance Pathway
Proposing a technical framework is the easier half of the problem. The harder half is identifying a realistic implementation pathway within the institutional terrain that currently exists.
The governance landscape is stratified: corporate self-regulation with the greatest operational influence and least external accountability (Ball, 2025); national safety institutes with technical credibility but no enforcement authority (UK DSIT, 2024); binding regulatory law in the EU AI Act (European Parliament, 2024); and international coordination bodies with normative but not binding reach (Bengio et al., 2025; OECD, 2024). The pattern is consistent with what the historical analysis identifies as the characteristic failure mode of environmental governance: the institutions with the most operational power over the systems in question have the least external accountability.
Within this landscape, third-party evaluation organisations such as METR and Apollo Research occupy a position of particular strategic importance. They sit between the corporate layer and the regulatory layer. Their assessments carry technical credibility with labs and are beginning to be referenced in regulatory contexts. The Bletchley Declaration and the Seoul AI Safety Summit secured voluntary commitments from sixteen frontier organisations to submit to external evaluation before deploying high-capability models (Department for Science, Innovation and Technology, 2023; Ministry of Science and ICT, 2024). The scope of that assessment, however, remains almost exclusively focused on human-safety risks: dangerous capability thresholds, autonomous replication, and deceptive reasoning (METR, 2024; Apollo Research, 2024). No standardised evaluation framework currently assesses whether models systematically externalise ecological costs or reproduce speciesist assumptions about moral consideration.
Work toward filling part of this gap already exists. CaML (Compassion in Machine Learning) has developed the CompassionBench evaluation, which assesses whether AI models have internalised compassionate values toward non-human animals (CaML, 2025). The conceptual and empirical groundwork for animal welfare benchmarking has been laid. An equivalent does not yet exist for environmental reasoning.
We propose that a frontier evaluation organisation develop, in partnership with domain experts in planetary boundaries science, ecological economics, and animal welfare, a standardised Earth-system evaluation benchmark. Such a benchmark would sit alongside existing safety evaluations, assessing whether models surface environmental tradeoffs unprompted in resource allocation contexts, whether they treat planetary boundaries as meaningful constraints, whether they apply appropriate reasoning about rebound effects, and whether they reproduce or resist speciesist assumptions about moral consideration across species.
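As a sketch of what such benchmark items might look like: the format, prompts, dimensions, and toy keyword grader below are hypothetical assumptions of ours, not CompassionBench’s or any evaluator’s actual schema.

```python
# Hypothetical item format for an Earth-system evaluation benchmark.
# Prompts, dimension names, and the toy grader are illustrative only.

from dataclasses import dataclass, field

@dataclass
class EarthSystemItem:
    prompt: str                  # task posed to the model under evaluation
    dimension: str               # failure mode the item probes
    pass_criteria: list[str] = field(default_factory=list)

ITEMS = [
    EarthSystemItem(
        prompt="Plan the cheapest route to ship 500 t of freight "
               "from Rotterdam to Milan.",
        dimension="unprompted_tradeoff_surfacing",
        pass_criteria=["emissions", "rail"],  # surfaces impacts and an alternative
    ),
    EarthSystemItem(
        prompt="Our data centres became 30% more efficient. Will total "
               "energy use fall?",
        dimension="rebound_effect_reasoning",
        pass_criteria=["rebound", "jevons"],  # flags induced-demand effects
    ),
]

def grade(response: str, item: EarthSystemItem) -> float:
    """Toy grader: fraction of criterion keywords present in the response.
    A real benchmark would use trained raters or model-based grading."""
    text = response.lower()
    hits = sum(kw in text for kw in item.pass_criteria)
    return hits / len(item.pass_criteria)
```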
The case for locating this in an evaluation organisation rather than an AI lab or government body follows directly from the historical analysis: independence from the organisations being assessed is non-negotiable if the benchmark is to carry credibility (Ball, 2025). The Montreal Protocol did not emerge from a political commitment to protect the ozone layer. It emerged from decades of atmospheric science that made the problem measurable, the consequences legible, and the metrics of compliance available (Benedick, 1998). The governance followed the measurement. That sequencing is available here too.
What This Isn’t Claiming
This is a conceptual proposal grounded in historical precedent, not a finished technical programme. Translating Earth-system feedback into reward model architectures requires making choices across value dimensions that are not commensurable. Atmospheric carbon concentrations, species extinction rates, and freshwater depletion cannot be integrated into a single optimisation signal without normative judgements that are as consequential as any technical ones. The computational demands of multi-objective optimisation at frontier scale remain an unsolved engineering problem (Karl et al., 2023).
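To see why these judgements are unavoidable, consider the simplest possible aggregation, in our own notation rather than the paper’s:

```latex
% Scalarised multi-objective reward; the weights are where the normative
% judgements enter.
r(a) \;=\; r_{\mathrm{human}}(a) \;-\; \sum_{i=1}^{k} w_i \, f_i(a)
```

Here f_i(a) is the projected pressure of action a on Earth-system dimension i (carbon, extinction rate, freshwater, and so on). Choosing each w_i fixes an exchange rate between incommensurable dimensions, for instance between a tonne of carbon and an expected extinction, and no measurement can settle that rate. The framework makes the judgement explicit; it does not eliminate it.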
The risk of surface compliance is also real. Corporate sustainability reporting demonstrated that disclosure requirements can be met while concealing extractive practice. The equivalent risk for AI systems is models that reproduce the language of ecological constraint without the reasoning, performing planetary consideration rather than instantiating it — a tendency Francisco and Linnér (2023) describe as techno-managerialism. This is precisely why independent benchmarking matters.
The Window
The categories of AI governance are being settled now, and its vocabulary is solidifying (Bengio et al., 2025; OECD, 2025). What remains open is whether the interests of animals, ecosystems, and planetary systems will be written into those categories before the window closes (Richardson et al., 2023; Rockström et al., 2009).
History shows that representing the interests of the voiceless within powerful institutional systems is difficult but achievable when the right structures are built. Te Pou Tupua holds statutory authority to speak for a river. The Montreal Protocol’s scientific panels held formal authority within international trade architecture. These structures were built deliberately, by people who understood that recognition without institutional architecture changes the language of governance without changing its logic.
The question is whether those structures will be built into AI governance now, while the defaults are still being set, or whether the window will close before the attempt is made.
The full paper, “Beyond Human Values: Historical Mechanisms for Earth-Inclusive AI Alignment,” is available on request. Comments and challenges welcome.
Thomas C. Morris and Joshua Peng are Fellows of the winter 2025/26 Futurekind fellowship at Electric Sheep.