The convergent dynamic we missed

An excerpt from a longer post that I kept refining over the last 5 months.

By far the greatest danger of Artificial Intelligence is that people conclude too early that they understand it. Of course this problem is not limited to the field of AI.

Jacques Monod wrote: “A curious aspect of the theory of evolution is that everybody thinks he understands it.”

— Yudkowsky, 2008


The convergence argument most commonly discussed is instrumental convergence: machinery channels its optimisation through represented intermediate outcomes in order to be more likely to achieve any aimed-for outcomes later.[1]

Instrumental convergence results from internal optimisation:
– code components being optimised for (an expanding set of) explicit goals.

Instrumental convergence has a hidden complement: substrate-needs convergence.

Substrate-needs convergence results from external selection:
– all components being selected for (an expanding set of) implicit needs.


This will sound abstract. Bear with me. Let’s take this from different angles:

AGI would be made up of a population of connected/​nested components. This population changes as eg. hardware is modified and produced, or code is learned from inputs and copied onto the hardware.

AGI, as defined here, also has a general capacity to maintain its own components.
Any physical component has a limited lifespan. Configurations erode in chaotic ways.
To realistically maintain components[2], AGI must also produce the replacement parts.

AGI’s components are thus already interacting to bring about all the outside conditions and contexts needed to produce their own parts. Imagine all the subtle parallel conditions needed at mines, chemical plants, fab labs and assembly plants to produce hardware. All that would be handled by the machinery components of AGI.

So there is a changing population of components. And those connected components function in interactions to create the ambient conditions and contexts needed to reproduce parts of themselves. And as new components get connected into that population, the functionality of those interacting components shifts as well.

This is where substrate-needs convergence comes in. When changing connected components have their shifting functionality[3] expressed as effects on and across surrounding production infrastructure, their functionality converges on bringing about more of the conditions and contexts needed for more of those components to exist and function.

Any changing population of AGI components gets selected over time toward propagating those specific environmental effects that fulfill their needs.

Whatever learned or produced components happen – across all their physical interactions with connected contexts – to direct outside effects that feed back into their own maintenance and replication as assembled configurations…do just that.[4]
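
To make this less abstract, here is a minimal toy simulation – my own illustrative sketch, not a model from the underlying post, with made-up parameters. Each variant differs only in how strongly its environmental side effects happen to feed back into its own maintenance and replication. Nothing plans anything; iterated selection alone concentrates the population on the variants whose effects best support their own continued existence.

```python
import random

random.seed(0)

# Assumed "feedback strength" per variant, in [0, 1]: how much that variant's
# environmental effects happen to support its own maintenance and replication.
variants = {f"v{i}": random.random() for i in range(10)}

# Start with equal counts of every variant.
population = {name: 100.0 for name in variants}

for generation in range(50):
    # Replication: growth rises with feedback strength; decay is equal for all.
    population = {
        name: count * (0.9 + 0.3 * variants[name])
        for name, count in population.items()
    }
    # Renormalise to a fixed total, so only relative frequencies matter.
    total = sum(population.values())
    population = {name: 1000.0 * count / total for name, count in population.items()}

# The population ends up dominated by the variants whose side effects
# most reliably fed back into their own existence.
for name, count in sorted(population.items(), key=lambda kv: -kv[1])[:3]:
    print(name, round(variants[name], 2), round(count, 1))
```

The sketch compresses everything into one number per variant, which real component populations obviously do not allow. The only point it carries over is that differential feedback, iterated, is sufficient – no explicit goals are needed.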


Here is the catch: AGI components interacting to maintain and replicate themselves are artificial. Their physical substrate is distinct from our organic human substrate.

Their distinct physical substrate has distinct molecular and electric properties – requiring different conditions and contexts to assemble and maintain the assembly.

Here is an example:

  • Silicon, refined from silicon dioxide, needs to be heated above 1400 ºC to free its outer electrons from their covalent bonds and allow an ingot to melt. While the production of silicon chips needs extremely high temperatures, computation runs best at extremely low temperatures (to reduce the electrical resistance of the conductor wires).

  • Carbon bonds in our body, however, would oxidise (ie. burn) at such temperatures. And cooling water in our bodies below 0 ºC makes the H₂O molecules freeze and expand into ice crystals. That would destroy our cells—we would die.

We humans need around room temperature at every point of our lifecycle – to sustain the continuous organic chemical reactions through which our body operates and maintains itself.

Hardware works differently. Hardware configurations do not operate – nor are they maintained – by being in semi-chaotic chemical reactions.

Hardware is made from some chemically inert substrate – for example SiO₂ – that mostly does not react under ambient temperatures and pressures found on Earth’s surface.
Something like a rock – which ordinarily stays hard in form and needs magma-level temperatures and pressures to be reshaped.

This property of being chemically inert while operating allows hardware components to be standardised. Because the molecules do not split off, move about, or rebond the way molecules in human bodies do, the configurations stay stable and compartmentalised.

In turn, standardisation of hardware allows hardware components produced in different places and times to still store, compute or transmit a piece of code in the same way (ie. consistently). Standardisation supports virtualisation.


Standardised hardware of “general AI” would be robust over, and need, a much wider range of temperatures and pressures than our comparatively fragile human wetware can handle.

Temperature and pressure, you might object, can be measured and controlled for. That objection is misleading.
Innumerable other conditions and subtler contexts would be needed by, and get selected for in, AGI. These fall outside the limits of what the AGI’s actual built-in detection and correction methods could control for.[5]

We humans too depend on highly specific environmental conditions and contexts for the components nested inside our bodies (proteins→organelles→cells→cell lining→) to continue in their complex functioning, in ways that maintain our overall existence.

Between the highly specific set of artificial needs and the highly specific set of organic needs, there is mostly non-overlap. AGI cannot keep most of its components’ iterative effects from converging on their artificial needs, so they do. Those fulfilled artificial needs are disjoint from our organic needs for survival. So the humans die.

Under runaway feedback, our planetary environment is modified in the directions needed for continued and greater AGI existence. Outside the ranges we can survive.

To summarise a longer post:

  1. Fundamental limits:
    Control methods cannot constrain most environmental effects propagated by interacting AGI components. Any built-in method to detect and correct effects – to align external effects with internal reference values – is insufficient.

  2. Uncontrollable feedback:
    A subset of the effects will feed back into further maintaining or replicating (higher-level) configurations of hardware that propagated those effects. No internal control feedback loops could correct the possible external feedback loops.

  3. Substrate-needs convergence:
    These environmental effects are needed for components to come into and stay in existence. But their environmental needs are different from our needs. Their artificial needs are in conflict with our organic needs for survival. Ie. toxic.


AGI would necessarily converge on causing the extinction of all humans.

  1. ^

    As an example:
    AGI’s planning converges on producing more compute hardware in order for AGI to more accurately simulate paths to future outcomes.

  2. ^

    Realistically in the sense of not having to beat entropy or travel back in time.

  3. ^

    Note how ‘shifting functionality’ implies that original functionality can be repurposed by having a functional component connect in a new way.

    Existing functionality can be co-opted.

    If narrow AI gets developed into AGI, AGI components will replicate in more and more non-trivial ways. Unlike when carbon-based lifeforms started replicating ~3.7 billion years ago, for AGI there would already exist repurposable functions at higher abstraction layers of virtualised code – pre-assembled in the data scraped from human lifeforms with their own causal history.

    Here is an incomplete analogy for how AGI functionality gets co-opted:

    Co-option by a mind-hijacking parasite:
    A rat ingests toxoplasma cells, which then migrate to the rat’s brain. The parasites’ DNA code is expressed as proteins that cause changes to regions of connected neurons (eg. the amygdala). These microscopic effects cascade: the rat, while navigating physical spaces, no longer feels fear when it smells cat pee. Rather, the rat finds the smell appealing and approaches the cat’s pee. Then a cat eats the rat, and toxoplasma infects its next host, continuing its reproductive cycle.

    So a tiny piece of code shifts a rat’s navigational functions such that the code variant replicates again. Yet rats are much more generally capable than a collection of tiny parasitic cells – surely the ‘higher intelligent being’ would track down and stamp out the tiny invaders?

    A human is in turn more generally capable than a rat, yet toxoplasma makes its way into around 30% of the human population. Unbeknownst to cat ‘owners’ infected by Toxoplasma gondii, human motivations and motor control get influenced too. Infected humans more frequently end up in accidents, lose social relationships, and so forth.

    Parasites present real-life examples of tiny pieces of evolutionarily selected-for code spreading and taking over existing functions of vastly more generally capable entities. See also COVID co-opting our lungs’ function to cough.

    But there is one crucial flaw in this analogy:
    Variants that co-opt initial AGI functions are not necessarily parasites. They can symbiotically enable other variants across the hosting population to replicate as well. By not threatening the survival or reproduction of AGI components, they would not be in an adversarial relationship with their host.

    Rather, the humans constraining the reproductive fitness of AGI to gain benefits are, evolutionarily speaking, the parasites. The error-corrective system we would build in lowers the host’s reproductive fitness. It is like a faulty immune system that kills healthy gut bacteria. It will get selected out.

    As humans, we rely on our evolved immune system to detect and correct out viruses, including for the vaccinations we develop and deploy. We also rely on the blood-testis and blood-follicle barriers to block variants of these viruses from entering our body’s (re)production facilities.

    Smaller viruses survive this detection more frequently, so code strands of replicating virus variants are selected for staying small. Few viruses replicate ‘as a package’ with the DNA code of their host. Thus, virus variants rarely co-adapt and integrate with the host’s variants, each selected for raising collective reproductive fitness (except for eg. detection signatures).

    These systems and barriers add to our reproductive fitness: our ability to preserve and replicate internal code. Past DNA code that got expressed – in interaction with surrounding code – as (precursors of) these functions got selected for as a portion of the existing human ‘code pool’.

    For any organic system or barrier preventing virus variants from replicating through our bodies, evolution is firmly on our side. For any artificial system or barrier we build to prevent misaligned AI variants from replicating through solid-state hardware infrastructure, evolution will thwart our efforts.

    AI variants would compete for hardware/​resources like computer viruses do. And co-adapt and integrate with other internal variants to replicate as part of larger symbiotic packages.

  4. ^

    Now, a thought may come to your mind:
    “That sounds like evolutionary selection; but isn’t evolution a slow and local optimiser?”

    Yes, this does involve evolutionary selection.
    Unfortunately, by Forrest’s estimation, the selection that would take place through components of self-sufficient learning machinery would take < 500 years to cause ecosystem-wide extinction. Compare that to the 3.7 billion years from the origin of carbon lifeforms to us humans starting to cause a mass extinction.

    Reasons include:

    Pre-assembled functions:
    First solid-state lifeforms can co-opt/​repurpose pre-assembled AI functions and infrastructure (vs. first carbon-based lifeforms that started from scratch).
    Standardization:
    The efficiency gains from the virtualisation of code’s storage, computation and transmission – vastly reducing how many atoms need to be moved about and molecularly reconfigured. Think of how fast memes spread through society – even while still requiring lots of atoms to jiggle across neurons in our brains.
    Faster reproduction:
    Hardware components can be reproduced in days to months, versus humans, who take decades to reproduce as physical units (see the rough arithmetic after this list).
    The terraforming gap:
    A much larger gap between the current state of planet Earth and the conditions that self-sufficient self-assembling learning machinery need and would therefore modify the environment toward (versus the gap to the conditions needed by humans and other species living in the carbon-based ecosystem).
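
    For a rough sense of the reproduction-speed gap, here is a back-of-envelope sketch. The generation times are my own assumed round numbers for illustration, not figures from the post:

    ```python
    # Rough generation-count arithmetic; the generation times are assumptions.
    years = 500
    hardware_generation_years = 1 / 12   # assumed: ~1 month per component generation
    human_generation_years = 25          # assumed: ~25 years per human generation

    print(round(years / hardware_generation_years))  # ~6000 component generations
    print(round(years / human_generation_years))     # ~20 human generations
    ```

    Even under these crude assumptions, selection acting on hardware components gets roughly 300 times as many rounds to work with over the same 500-year window.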

    ~ ~ ~
    Another argument you may have heard is that the top-down intelligent engineering by goal-directed AGI would beat the bottom-up selection happening through this intelligent machinery.

    That argument can be traced back to Eliezer Yudkowsky’s sequence The Simple Math of Evolution. Unfortunately, there were mistakes in Eliezer’s posts, some of which a modern evolutionary biologist may have been able to correct:

    • implying that sound comparisons can be made between organisms’ reproductive fitness, as somehow independent of changes in environmental context, including unforeseeable changes (eg. a Black Swan event of a once-in-200-years drought that kills the entire population, except a few members who by previously derivable standards would have had relatively low fitness).
    • overlooking the ways that information can be stored within the fuzzy regions of phenotypic effects maintained outside respective organisms.
    • overlooking the role of transmission speed-up for virtualisation of code.
    • overlooking the tight coupling in AGI between the internal learning/​selection of code, and external selection of that code through differentiated rates of component replication through the environment.
    • overlooking the role of co-option (or more broadly, exaptation) of existing code, by taking a perspective that evolution runs by selecting ‘from scratch’ for new point-wise mutations.

  5. ^

    Worse, since error correction methods would correct out component variants with detectable unsafe/​co-optive effects, they leave any replicating branches of variants with undetectable unsafe/​co-optive effects to grow in influence.

    Thus, the error correction methods select for the variants that can escape detection. So do the meta-methods (which have to soundly and comprehensively adapt error correction methods to newly learned code or newly produced hardware parts).
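
    As a toy illustration of that selection effect – my own sketch, with made-up growth and detection rates rather than anything from the post – suppose unsafe/co-optive variants replicate somewhat faster than benign ones because their effects feed back into their own replication, and the error correction step removes only the unsafe variants it can detect:

    ```python
    # All rates below are assumptions made up for illustration.
    population = {
        "detectable_unsafe": 1000.0,
        "undetectable_unsafe": 10.0,
        "benign": 1000.0,
    }
    growth = {
        "detectable_unsafe": 1.10,    # assumed: co-optive effects boost replication
        "undetectable_unsafe": 1.10,
        "benign": 1.00,
    }

    for round_ in range(60):
        # Replication of every branch of variants.
        population = {name: count * growth[name] for name, count in population.items()}
        # Error correction removes 90% of the *detectable* unsafe variants only.
        population["detectable_unsafe"] *= 0.10

    total = sum(population.values())
    for name, count in population.items():
        print(f"{name}: {100 * count / total:.1f}%")
    ```

    The detectable branch is driven to near zero, while the undetectable branch ends up as the large majority of the population – whatever escapes detection is exactly what remains to replicate.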

Crossposted to LessWrong