Genes did misalignment first: comparing gradient hacking and meiotic drive

Holly Elmore ⏸️ 🔸 18 Apr 2025 5:39 UTC

45 points

My PhD is in evolutionary biology, and I spent much of my time learning and thinking about genetic governance mechanisms like meiosis. I find myself wanting a link to share on these concepts, so I’m publishing some old writing I did in 2022 that makes the comparison.

Gradient descent and natural selection are analogous

Gradient descent can be compared to natural selection. Both are optimization algorithms. But while gradient descent is only decades old as part of the field of machine learning (ML), natural selection is the reason complex living beings like human readers exist. Are there any lessons we can learn about alignment by looking at the products of natural selection?

Oly Sourbut has written up, in a post on LessWrong, a set of conditions under which natural selection and gradient descent are equivalent, i.e., conditions under which the same inputs will lead to the same outputs under both algorithms.

At an intuitive level, the biggest difference between the two is that, whereas gradient descent estimates a local gradient and tries to descend it to reach a local minimum, natural selection does no estimation or looking ahead. Natural selection is simply the result of different genotypes having unequal reproductive success. The actual biggest disanalogy between natural selection and gradient descent, though, is recombination. More on that below.
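A minimal sketch of selection without any gradient estimation. It rests on illustrative assumptions: a one-dimensional "trait" w, a made-up loss (w - 3)², fitness defined as exp(-loss), and fitness-proportional reproduction with Gaussian mutation. Nothing in the loop computes a derivative. Individuals simply get scored, and the fitter ones leave more offspring.

```python
import math
import random


def loss(w):
    # Hypothetical objective; its minimum (the "fittest" trait value) is at w = 3
    return (w - 3.0) ** 2


def evolve(generations=100, size=200, mutation_sd=0.1, seed=0):
    """Mutation plus fitness-proportional reproduction; no gradient is ever computed."""
    rng = random.Random(seed)
    pop = [rng.uniform(-5.0, 0.0) for _ in range(size)]  # start far from the optimum
    for _ in range(generations):
        # Fitness is just a score for each individual, not a derivative
        fitness = [math.exp(-loss(w)) for w in pop]
        # Differential reproductive success: parents are drawn in proportion to fitness
        parents = rng.choices(pop, weights=fitness, k=size)
        # Offspring inherit a parent's trait value plus a small random mutation
        pop = [p + rng.gauss(0.0, mutation_sd) for p in parents]
    return pop


pop = evolve()
print(sum(pop) / len(pop))  # population mean ends up close to 3
```

The population still climbs to the optimum, but only because variation exists and some of it reproduces more. No step looks ahead.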

Gradient descent can be hacked

Gradient hacking is a theoretically possible mechanism of unaligned AI. The “gradient” in gradient hacking refers to gradient descent, the algorithm that updates an ML model by minimizing its cost function; the minimum of the cost function corresponds to peak performance on the task the model is being trained to do. On each iteration, gradient descent uses the model’s performance on the latest round of training data to estimate the direction in parameter space that leads to the greatest improvement in performance (i.e., the direction that most steeply decreases the cost function), and then updates the model’s parameters accordingly. One part of a model could theoretically protect itself from training updates and/or manipulate the model’s performance so that gradient descent updates the model in its preferred direction. Or the entire model could be deceptive and know how to elicit acceptable or even desired modifications from the training process.
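A minimal gradient-descent loop showing the update rule described above. The model (a line y = w·x + b), the toy data, the learning rate, and the mean-squared-error cost are all illustrative assumptions. Each iteration measures the cost, computes its partial derivatives, and moves every parameter a small step against its derivative.

```python
def gradient_descent(xs, ys, lr=0.05, steps=1000):
    """Fit y = w*x + b by repeatedly stepping against the gradient of the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    history = []  # cost recorded before each update
    for _ in range(steps):
        errs = [(w * x + b) - y for x, y in zip(xs, ys)]
        history.append(sum(e * e for e in errs) / n)
        # Partial derivatives of the cost with respect to each parameter
        dw = 2.0 * sum(e * x for e, x in zip(errs, xs)) / n
        db = 2.0 * sum(errs) / n
        # Move each parameter a small step in the direction that lowers the cost
        w -= lr * dw
        b -= lr * db
    return w, b, history


w, b, history = gradient_descent([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])  # data from y = 2x + 1
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

Gradient hacking, in these terms, would be a part of the model influencing which errors (and therefore which `dw`, `db`) the optimizer sees.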

Right now, not much is known about gradient hacking and whether it will prove a threat to AI alignment. In my opinion, the Alignment Faking paper from Anthropic is an example of gradient hacking, demonstrating that this mechanism will be operative in further LLM development. One way to potentially learn more is to study systems with similar behavior. There are superficially similar phenomena in biology where a selfish element increases its reproductive fitness at the expense of the whole in which it is embedded. Is there anything we can glean from unaligned levels of selection in biology that can shed light on unaligned AI?

There is something like gradient hacking in biology: meiotic drive

[These are my speculations]

The way I see it, there are two kinds of gradient hacking possible. The first is a situation where the solution to the problem the model was trained to solve is an agent, a “mesa optimizer”, that has its own goals that are imperfectly aligned with the goals of the people who trained it and that rediscovers gradient hacking from first principles during its computation. In this case, the misalignment between our goals and the mesa optimizer’s goals seems like something fundamentally true about the distribution of discoverable solutions to the problem within that training distribution. The misalignment becomes apparent when the model is deployed “out of distribution”. You could say that human minds are an example of a mesa optimizer where natural selection is the outer optimizer. We have goals (like having sex) that overlap with, but are not the same as, what natural selection promotes (genes that persist). Outside of the ancestral environment, human minds are out of distribution, and having sex is suddenly not the same as propagating one’s genes. Once the mesa optimizer has a goal, it may learn, in the course of pursuing that goal, that gradient descent could change it, and it may want to protect the goal from being changed, which it can do by manipulating its performance to control the direction of the changes. This kind of gradient hacking has been explored more than the other in the AI safety community (1, 2, 3, 4, 5). I think its closest analog in biology is selfish cell lineages, like cancer, which resist programmed cell death, refuse to accept a cell “fate” (accepting one would mean the lineage’s days are numbered), and evade detection by the immune system.

The other way I see gradient hacking happening is if there are circuits in the model that simply resist being rewritten by gradient descent. They survive because they survive: they have some way of remaining stable in the face of a changing environment, and they may evolve more and more elaborate ways of resisting updating, including recruiting agentic parts of the model to hack the gradient. This is more similar to selfish genes whose interests are not aligned with those of the other genes in an organism’s genome. I predict this kind is more common when some part of the model is shielded from updates, either because backpropagation does not reach the whole model or because part of the model becomes “update-proof”, since disturbing it would increase the loss. This type of gradient resistance/gradient hacking bears a resemblance to meiotic drive.
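A toy illustration of the “shielded” case. It assumes a model whose prediction is (w0 + w1)·x, plus a hypothetical mask that stops any gradient from reaching w1, standing in for incomplete backpropagation. Gradient descent can only reduce the loss by moving w0, so the rest of the model adapts around w1. Afterwards, removing w1 sharply increases the loss, which is one way a component could become “update-proof”. The data, the weights, and the mask are all made up for illustration.

```python
import random


def make_data(n=200, seed=0):
    # Hypothetical training data generated from the target function y = 3x
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return xs, [3.0 * x for x in xs]


def loss(w, xs, ys):
    # Mean squared error of the toy model y_hat = (w[0] + w[1]) * x
    return sum(((w[0] + w[1]) * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(mask, steps=500, lr=0.1):
    """Gradient descent in which mask[i] == 0 means no gradient ever reaches w[i]."""
    xs, ys = make_data()
    w = [0.0, 5.0]  # w[1] is the hypothetical "circuit" with its own value
    for _ in range(steps):
        # Both weights enter the prediction as a sum, so dL/dw0 == dL/dw1 == g
        g = sum(2.0 * ((w[0] + w[1]) * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w = [wi - lr * g * mi for wi, mi in zip(w, mask)]
    return w, xs, ys


w, xs, ys = train(mask=[1.0, 0.0])
print(w)                          # w[1] is still 5.0; w[0] has moved to about -2.0
print(loss(w, xs, ys))            # close to zero
print(loss([w[0], 0.0], xs, ys))  # large: the rest of the model now depends on w[1]
```

With `mask=[1.0, 1.0]`, both weights get updated and w1 drifts from 5 to 4. With the mask, w1 never changes and the model is built around it.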

How meiotic drive works

Meiosis is the biological process by which eukaryotic gametes are made. It involves recombination of the parent’s chromosomes to make unique combinations, and it results in gametes that have half as many chromosomes as the parent organism. When meiosis is working normally, every allele of every gene present in the parent organism has a 50% chance of being in each gamete. Said differently, every allele present in the organism should be present in 50% of its gametes, on average.

However, meiosis is a hackable mechanism, and sometimes alleles can play strategies that increase their representation in the gametes to greater than 50%.

An oversimplified example is a poison-antidote system made of two tightly linked genes, one encoding a poison that is released to all the nearby gametes and the other encoding an antidote. Any gametes exposed to the poison without the antidote gene die, so the proportion of surviving gametes that carry the poison-antidote cluster will be greater than 50%.
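The arithmetic of the poison-antidote example can be sketched as follows. The sketch assumes the two genes are perfectly linked, that a heterozygote makes equal numbers of carrier and non-carrier gametes, and that carriers are fully protected. Each non-carrier dies with probability `kill_fraction`, a hypothetical parameter.

```python
def driver_share_of_survivors(kill_fraction):
    """Fraction of a heterozygote's surviving gametes that carry the poison-antidote cluster."""
    carriers = 0.5                              # protected by the antidote
    non_carriers = 0.5 * (1.0 - kill_fraction)  # poisoned gametes that survive
    return carriers / (carriers + non_carriers)


def surviving_gametes(kill_fraction):
    """Total surviving gametes, relative to a fair meiosis with no poison (= 1.0)."""
    return 0.5 + 0.5 * (1.0 - kill_fraction)


for c in (0.0, 0.5, 1.0):
    print(c, driver_share_of_survivors(c), surviving_gametes(c))
```

With no killing, the share is the Mendelian 0.5. Killing half of the non-carriers raises the cluster's share to 2/3, while cutting the total number of surviving gametes to 75%.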

Another strategy is to exploit the mechanics of meiosis to be disproportionately chosen as the mature gamete. There are two meiotic cell divisions, leading to four initial products of meiosis. In males, all four cells become sperm. In females, the divisions are unequal, leading to one big cell that becomes the egg and three tiny cells (polar bodies) that have no descendants. There are biases alleles can exploit to be more likely to end up in the egg. For example, in most species the machinery has a bias to make eggs more often from a specific position on the meiotic spindle where the divisions take place. In corn, chromosomal structures called “knobs” are more attractive to the spindle than the actual centromere, so they end up being grabbed first and are then more often on the far side of the spindle, where they are more likely to end up in the egg (check out the female drive chapters in Burt & Trivers 2005).

Because natural selection works by relative differences in frequency, the cheating alleles will be favored when they are represented in a higher proportion of the gametes, even though the organism as a whole will have lower absolute fitness if a portion of its healthy gametes is being killed.
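A standard one-locus population-genetic recursion makes this concrete. This is a sketch with hypothetical numbers, assuming random mating and no other forces. D/+ heterozygotes transmit the driving allele D in a fraction k of their gametes and have relative fitness w_het. D/D homozygotes have relative fitness w_hom, and +/+ has fitness 1.

```python
def next_generation(p, k, w_het, w_hom):
    """One generation of random mating at a locus with a driving allele D and a wild-type allele +.

    p: frequency of D among this generation's gametes
    k: fraction of a D/+ heterozygote's gametes that carry D (0.5 = fair meiosis)
    w_het, w_hom: relative fitness of D/+ and D/D (+/+ has fitness 1)
    Returns the next generation's frequency of D and this generation's mean fitness.
    """
    q = 1.0 - p
    # Hardy-Weinberg zygote frequencies, each weighted by its relative fitness
    d_d = p * p * w_hom
    d_plus = 2.0 * p * q * w_het
    plus_plus = q * q
    mean_fitness = d_d + d_plus + plus_plus
    # D/D parents transmit only D; D/+ parents transmit D in a fraction k of gametes
    p_next = (d_d + d_plus * k) / mean_fitness
    return p_next, mean_fitness


def simulate(p0, k, w_het, w_hom, generations):
    p, trajectory = p0, []
    for _ in range(generations):
        p, mean_fitness = next_generation(p, k, w_het, w_hom)
        trajectory.append((p, mean_fitness))
    return trajectory


# Hypothetical values: strong drive, a 10% heterozygote cost, a 40% homozygote cost
traj = simulate(0.01, k=0.9, w_het=0.9, w_hom=0.6, generations=100)
print(traj[0], traj[-1])
```

With these made-up values, the driver goes from 1% to essentially fixation, while the population's mean fitness falls from about 1.0 to about 0.6. Rerunning with k=0.5 (a fair meiosis) makes the same costly allele decline instead.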

Once you understand how alleles have incentives to game meiosis, it’s easier to see what meiosis is good for. You could see the organism as a public good made by the genes in the genome to reproduce the entire genome really well. If alleles are promoted by natural selection because they are good at screwing over the other alleles in the genome, that undermines the genome’s ability to work together to create the organism, which threatens the fitness of every gene. When it works properly, meiosis is a governance system that ensures alleles don’t know which alleles they will be paired with in the next generation, so their best strategy is to work well toward the goal of making a good organism rather than carving out ways to promote themselves by taking advantage of the work the other genes do to make a fit organism.

Recombination is the major genetic alignment technology

Recombination is really important to meiosis and to allowing natural selection to work efficiently at the level of the organism: it’s the major genetic alignment technology.

Recombination and meiosis are also a form of genetic government. A fair meiosis ensures that different genes or alleles cannot form alliances with each other whose interests differ from those of the larger organism, because alleles of genes cannot “predict” which alleles they will be partnered with in the next generation. The only strategy for a gene’s success that will work long term under recombination is contributing to a fit organism. If there were a successful alternative way for alleles to make more copies of themselves, they would take it, and where such ways exist, they do.
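One way to see why recombination keeps alleles from “predicting” their partners is through linkage disequilibrium D. D measures how much more often two alleles at different loci travel together than chance would predict. Under random mating with no selection, D shrinks by a factor of (1 - r) every generation, where r is the recombination fraction between the loci. Below is a sketch of that textbook two-locus recursion; the starting haplotype frequencies are illustrative.

```python
def linkage_disequilibrium(h):
    """D = f(AB) f(ab) - f(Ab) f(aB) for haplotype frequencies h = (AB, Ab, aB, ab)."""
    f_AB, f_Ab, f_aB, f_ab = h
    return f_AB * f_ab - f_Ab * f_aB


def recombine(h, r):
    """One generation of random mating with recombination fraction r and no selection."""
    d = linkage_disequilibrium(h)
    f_AB, f_Ab, f_aB, f_ab = h
    # Recombination shifts each haplotype frequency toward independence by r * D
    return (f_AB - r * d, f_Ab + r * d, f_aB + r * d, f_ab - r * d)


def ld_over_time(r, generations, h=(0.5, 0.0, 0.0, 0.5)):
    # Start with only AB and ab haplotypes: each allele "knows" its partner (D = 0.25)
    ds = [linkage_disequilibrium(h)]
    for _ in range(generations):
        h = recombine(h, r)
        ds.append(linkage_disequilibrium(h))
    return ds


print(ld_over_time(0.5, 5))  # unlinked loci: D halves every generation
print(ld_over_time(0.0, 5))  # no recombination: D never decays
```

With r = 0.5 (unlinked loci), an initial association halves every generation. With r = 0, as inside a chromosomal inversion like the one that holds the t haplotype together, it never decays, so allied alleles can keep traveling together.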

The logic of shuffling genes every so often is similar in spirit to getting fresh blood into a government or company by having new people fill existing roles. The longer that the same people do the same jobs, the further they are likely to veer or drift from the guidelines set by the organization. When team members are shuffled, it reinforces the structure provided by the government or company rather than idiosyncratic structures provided by a particular team.

There is no direct analogy to recombination in gradient descent.

Examples of meiotic drive

Meiotic drive is a biological phenomenon in which some of an organism’s genes are able to cheat meiosis to be present in more than 50% of its gametes. When meiosis functions properly, different genes and different alleles of genes are incentivized to work together to create a fit organism. When genes can increase their frequency simply by hacking meiosis, the incentive to work with other genes is weakened and natural selection can potentially push them in the direction of “selfish” behavior that undercuts the success of the organism as a whole.

Just as I, Holly Elmore, a citizen of the United States, can have interests that do not serve the interests of the United States, genes can have interests that don’t serve the interests of the organism they contribute to.

Drive is an evolutionary arms race. Selfish drivers innovate new ways to cheat meiosis and the rest of the genome evolves “responder” alleles. In fact, just as human institutions can be very productively viewed through the lens of game theory, mechanisms and structures that respond to and suppress rogue elements can be seen at every level of organization in biology. Meiosis itself is like a form of genetic government.

Examples (mostly from Burt & Trivers):

Undercutting organism fitness:

Selfish genes:
- The t haplotype in mice is a region of many genes that are inherited together without recombination due to a chromosomal inversion, and that together create what is essentially a poison for non-t sperm that sperm bearing the t haplotype are immune to. t sperm have severe motility defects, but the t haplotype is able to propagate anyway because it kills all the competitors. It doesn’t matter that it is not in the long-term interests of the t haplotype to make its hosts less fit, because natural selection cannot act on long-term interests: alleles that are good at getting into the next generation, one way or another, will be selected; that’s what natural selection is.
- Transposable elements (TEs) that just entered a new species’ genome: TEs are “jumping genes” that encode the machinery to copy themselves. Sometimes they insert themselves in the middle of an important gene and have phenotypic effects (they were discovered by Barbara McClintock in maize because one TE lineage interrupted a kernel coat color gene and the inheritance pattern was non-Mendelian). TEs that have recently jumped into a new species tend to cause more harm, because that lineage doesn’t have any resistance to them and they haven’t had a chance to evolve to become more moderate so as not to kill or sterilize the host. P elements entered the fruit fly (Drosophila melanogaster) genome some time after the main lab strains were collected in the early twentieth century, which was noticed because it led to sterility in the offspring of crosses between lab strains and wild strains. The lab strains had not evolved “silencing” of the TE as the wild strains had, and the active replication of P elements in the germline led to the disruption of meiosis and the insertion of the TE into crucial genes.
- Paternal/maternal genome exclusion. This is a real phenomenon in which one parent’s genetic material is either silenced or rejected entirely at an early stage of development. It can lead to parthenogenesis. The short-term advantage of this is that the included parent’s genes are 100% represented in each offspring. The long-term disadvantage is that mutations accumulate.
Selfish organelles: Mitochondria and chloroplasts can have different interests from the organisms in which they reside. Human mitochondria, for example, can be rewarded for being less efficient if that means they get to multiply more per cell to meet energetic demand.
- Mitochondria in many hermaphroditic flowering plants are inherited only via the egg, and some mitochondrial lineages in those species are selfishly too inefficient to support pollen production (which is much more energetically expensive than egg production), so that the plant makes more eggs and therefore passes along more of that mitochondrial lineage.
Selfish cell lineages: Cancer. In a eukaryotic organism, only the germline will continue indefinitely; all other (somatic) cell lines’ days are numbered. Cancer is when a cell lineage breaks free of constraints on the cell cycle and on its developmental “fate”, which specifies that it is only to copy itself a limited number of times. Once this process has begun, the rogue lineage undergoes selection to evade the immune system and acquire nutrients. It does not matter that the cancer may eventually kill the organism. Better cooperation mechanisms to prevent cancer can only evolve if having cancer reduces the organism’s reproductive success. Most cancers occur in older people and reflect the breakdown of cooperation mechanisms in conditions (older age) that have not been properly vetted by natural selection.
Genomic imprinting: a result of imperfectly aligned interests between male and female reproductive strategies. Imprinted genes play different strategies depending on whether they were inherited from the father or the mother, because the most advantageous behavior is slightly different for paternally and maternally inherited copies. One major arena of conflict is nutrient provision in pregnancy. Generally, the mother’s genome benefits by provisioning equally for all her offspring, including future offspring, but the father’s genome benefits from instructing the offspring to act more selfishly, since there is some chance that future offspring will not be from the same father. Because of this we see arms races in the genes involved in nutrient provisioning, where, for example, the fetus gives off signals to release more sugar into the bloodstream for it and the mother resists those signals. This may be the basis of gestational diabetes.
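The provisioning conflict can be sketched with Hamilton’s rule. An allele favors taking an extra unit of maternal resource if the benefit b to its carrier exceeds r·c, where c is the cost to a future sibling and r is the allele's relatedness to that sibling. The sketch assumes the future sibling shares the mother but has the same father only with probability `p_same_father`. This is the reasoning behind what is often called the kinship theory of imprinting; the function names and numbers are illustrative.

```python
def relatedness_to_future_sib(origin, p_same_father):
    """Chance that a future maternal sibling carries a copy of this particular allele."""
    if origin == "maternal":
        return 0.5  # the sibling shares the mother, who passes on this allele half the time
    if origin == "paternal":
        return 0.5 * p_same_father  # shared only if the sibling also has the same father
    raise ValueError("origin must be 'maternal' or 'paternal'")


def favors_extra_demand(benefit, cost, origin, p_same_father):
    """Hamilton's rule from the allele's point of view: take the resource if b > r * c."""
    return benefit > relatedness_to_future_sib(origin, p_same_father) * cost


# Illustrative numbers: benefit 0.4 to the fetus, cost 1.0 to a future sibling
for origin in ("maternal", "paternal"):
    print(origin, favors_extra_demand(0.4, 1.0, origin, p_same_father=0.5))
```

With a 50% chance of the same father, the maternally inherited copy (r = 1/2) opposes the extra demand. The paternally inherited copy (r = 1/4) favors it, and that disagreement is the conflict imprinting plays out.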

~Neutral effect on organism fitness:

Transposable elements that have been in a species a long time and tend to only be inserted in non-coding regions of the genome
Gene conversion: when repairing DNA or resolving a recombination junction, one sequence sometimes gets copied over the original sequence of another strand. Some selfish elements exploit this, but it is mostly harmless to the organism because it just replaces one allele in the place it was supposed to be with another version, rather than messing with meiosis machinery or manipulating behavior or function.
Many cases of accessory chromosomes, which just sort of hang out harmlessly, at most taking some resources.

Positive effect on organism fitness?

Some organisms have adopted some of the transposable element toolkit for their own purposes, like V(D)J recombination to make antibodies in the vertebrate immune system.
Homing endonucleases that just harmlessly insert themselves into conserved genes and then splice themselves out of mRNA transcripts before the genes are translated into proteins. This may have been the origin of introns, or stretches of noncoding DNA between stretches of coding DNA (exons). In vertebrates, “alternative splicing” allows exons to be used as modules that can be mixed and matched to make different versions of a protein with different functions. Mating type switching in yeast uses a system that is clearly derived from a homing endonuclease.
Arguably some things useful at the organism level have come from the amount of extra space there is in many eukaryotic genomes, which is driven by selfish elements like TEs.
Accessory chromosomes in some species can be pretty helpful. I once studied a gene cluster in the genome of the pathogenic fungus Fusarium oxysporum that seemed like it might contribute to fungicide resistance, and it turned out to be carried and transferred horizontally on accessory chromosomes. It was duplicated a few times, too, which we predicted increased the anti-fungicide effect, and which is easier to do on accessory chromosomes. One idea is that accessory genomes can serve as a scratch space while the core genome is more conservative.

Losing the ability to do meiosis is a death sentence for a species

On priors, we should expect the effects of drive to be bad, since it’s much easier to mess up how the organism works together or to lose the equipment for meiosis than it is to accidentally help it. Once a eukaryotic lineage loses the ability to do meiosis, it is cooked. This is reflected in the fact that asexual eukaryotic species alive today have a “twiggy” distribution—their history is recent and all of their near relatives are sexual. The implication is that asexuality is a doomed strategy, and the lineages that last do so by maintaining meiosis.

In small organisms with small genomes, natural selection is efficient enough to remove deleterious alleles so that the lineage can remain functional and competitive. Natural selection works by differential reproductive success, so in order for it to effect a change, there must be a difference in reproductive success associated with a deleterious or a beneficial allele. Bacteria and archaea have large population sizes, which is like high resolution data for natural selection. There is enough selection pressure that even mildly deleterious alleles can be selected against. As organisms get larger, their population sizes generally get smaller and their generation times get longer. So the data that natural selection is working with is lower resolution. Only the most harmful deleterious alleles are quickly purged from a small population. Sex allows for some reshuffling of alleles between individuals so that alleles do not always occur on the same background, ultimately so that different alleles can be “judged” separately by natural selection.
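The “resolution” point corresponds to a classic population-genetics result: Kimura’s diffusion approximation for the probability that a single new mutant eventually fixes. The formula is u = (1 - e^(-2s)) / (1 - e^(-4Ns)), compared with 1/(2N) for a neutral mutation. This sketch assumes a diploid population whose effective size equals its census size N, and additive selection where heterozygotes have fitness 1 + s and mutant homozygotes 1 + 2s.

```python
import math


def fixation_probability(n, s):
    """Kimura's approximation for a single new mutant in a diploid population of size n."""
    if s == 0:
        return 1.0 / (2 * n)  # neutral: every gene copy is equally likely to win
    x = -4.0 * n * s
    if x > 700:  # exp would overflow; the probability is effectively zero
        return 0.0
    # (1 - e^(-2s)) / (1 - e^(-4ns)), written with expm1 for numerical accuracy
    return math.expm1(-2.0 * s) / math.expm1(x)


for n in (100, 10**6):
    neutral = 1.0 / (2 * n)
    print(n, fixation_probability(n, -1e-4) / neutral)
```

Take a mildly deleterious mutation with s = -0.0001. In a population of 100, its fixation probability is about 0.98 of the neutral value, so selection barely sees it. In a population of a million, it is smaller than the neutral value by well over a hundred orders of magnitude.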

It’s a principal-agent problem. When not every gene can be directly evaluated by natural selection on its performance, meiosis has to ensure that alleles are presented to natural selection representatively, and that there is no way to gain access to the means of reproduction except by being part of a well-performing organism.

Acknowledgments

I did a lot of my thinking about this when I was a PIBBSS Fellow mentored by Beth Barnes. What I shared here is really just background, but I still wanted to mention PIBBSS and Beth. We did some deeper thinking, along with Oly Sourbut, that I would have to consult with them about before publishing, but which isn’t published because we’re all busy and it doesn’t seem that relevant to where ML is going.


niplav 21 Apr 2025 9:52 UTC
3 points
Awesome post. Loved it.

Here’s some thoughts I had while reading, with no particular coherent theme:

The way I see it, there are two kinds of gradient hacking possible. The first is a situation where the solution to the problem the model was trained to solve is an agent, a “mesa optimizer”, that has its own goals that are imperfectly aligned with the goals of the people who trained it and that rediscovers gradient hacking from first principles during its computation. […] The other way I see gradient hacking happening is if there are circuits in the model that simply resist being rewritten by gradient descent.

I think this distinction maps pretty cleanly to a now-forgotten concept in AI alignment, the former being indeed a mesa-optimizer, the second mapping onto optimization daemons. I think these should be given different names, maybe “full gradient hacker” and “internal gradient hacker”? A big difference is that a system could have multiple internal gradient hackers. Maybe it’s just a question about the level we’re looking at, and whether the hacker is short-/long-term beneficial/detrimental to itself/the supersystem?

Internal gradient hackers have been observed in non-neural network systems, for example in Eurisko, where a heuristic assigned itself as the discoverer of other heuristics, resulting in a very high Worth. I don’t think we’ve seen something like this in the context of neural networks, but I could imagine circuits copying themselves “backwards” through the network and mutating along the way. I guess the fact that there’s no recurrence (yet…) in advanced ML models is a big advantage.

Here’s the relevant passage:

One of the first heuristics that ᴇᴜʀɪꜱᴋᴏ synthesized (H59) quickly attained nearly the highest Worth possible (999). Quite excitedly, we examined it and could not understand at first what it was doing that was so terrific. We monitored it carefully, and finally realized how it worked: whenever a new conjecture was made with high worth, this rule put its own name down as one of the discoverers! It turned out to be particularly difficult to prevent this generic type of finessing of ᴇᴜʀɪꜱᴋᴏ′s evaluation mechanism. Since the rules had full access to ᴇᴜʀɪꜱᴋᴏ′s code, they would have access to any safeguards we might try to implement. We finally opted for having a small ‘meta-level’ of protected code that the rest of the system could not modify.

—Douglas B. Lenat, “ᴇᴜʀɪꜱᴋᴏ: A Program That Learns New Heuristics and Domain Concepts” p. 30, 1983

There is no direct analogy to recombination in gradient descent.

I’m not sure this is completely true, though I have to think a bit more about it. There are techniques like dropout, which make training more robust, and in the context of an internal gradient hacker this would probably change parts of the hacker while leaving other parts untouched, which makes reliable internal communication much more difficult. I guess it would also provide an incentive for an internal gradient hacker to “evolve” internal redundancy & modularity, which we don’t want.

I also know that people have observed that swapping layers of neural networks doesn’t have a very large effect; I don’t think this is used as a training technique but it could be.

Paternal/maternal genome exclusion. This is a real thing that can happen where one parent’s genetic material is either silenced or rejected entirely at an early stage of development. It can lead to parthenogenesis. The short-term advantage of this is that the included parent’s genes are 100% represented in each offspring. The longterm disadvantage is having mutations accumulate.

I knew it! I’ve been wondering about this for literally years, thanks for confirming that this is a thing that happens.

The examples of gradient hackers with positive effects seem like they could be following the pattern of “here’s a sub-system doing something bad (e.g. transposons copying themselves incessantly), which the system needs to defend against, so the system finds a way (e.g. introns) to defend which carries other (maybe greater) benefits but which wouldn’t have been found otherwise”, does that seem like it explains things?
- Holly Elmore ⏸️ 🔸 22 Apr 2025 3:06 UTC
  3 points
  The examples of gradient hackers with positive effects seem like they could be following the pattern of “here’s a sub-system doing something bad (e.g. transposons copying themselves incessantly), which the system needs to defend against, so the system finds a way (e.g. introns) to defend which carries other (maybe greater) benefits but which wouldn’t have been found otherwise”, does that seem like it explains things?
  Yes, this is broadly accurate from my knowledge of positive examples (for the organism) of drive. They either contribute more scratch space (TEs) or they drive through a nifty innovation (homing endonucleases for mating type switching in yeast, V(D)J recombination in immune cells) that can be coopted. It’s possible there are other positive contributions that we don’t know about, of course.
- Holly Elmore ⏸️ 🔸 22 Apr 2025 3:04 UTC
  3 points
  I knew it! I’ve been wondering about this for literally years, thanks for confirming that this is a thing that happens.
  The coolest example is Cupressus dupreziana, the androgenetic cypress. It’s hard to observe a history of extinctions from meiotic drive, bc it’s not a cause of death that fossilizes, but this one we’re seeing right before it completes. When I learned about this, there were only 28 individuals left in this species. Genome exclusion is covered in chapter 10 of Burt & Trivers.
- Holly Elmore ⏸️ 🔸 22 Apr 2025 2:34 UTC
  3 points
  Re: analogies to recombination, I did think as I was preparing these old notes to post that possibly I should see the cost function or the task being trained on as somewhat analogous, in the sense that they are sort of templates against which performance is being checked? It’s a very tenuous thought and I can’t quite make the analogy work, but maybe you or someone else can do something with it.
SummaryBot 18 Apr 2025 17:33 UTC
3 points
Executive summary: This exploratory post compares gradient hacking in machine learning with meiotic drive in biology, arguing that natural selection has already grappled with—and partially solved—analogous alignment challenges through genetic governance mechanisms like recombination, which may offer useful insights for understanding and mitigating risks in AI alignment.
Key points:
1. Gradient descent and natural selection are analogous optimization processes, but differ significantly in mechanisms—particularly due to recombination in biology, which has no direct counterpart in ML.
2. Gradient hacking in ML may resemble biological phenomena like meiotic drive, where certain genetic elements increase their own transmission at the expense of organismal fitness, paralleling how parts of an AI model might subvert training to preserve or enhance themselves.
3. Two forms of gradient hacking are proposed: one involving agentic mesa-optimizers (akin to cancer or selfish cell lineages), and another involving passive resistance to updates (paralleling selfish genes that manipulate meiosis).
4. Meiotic drive illustrates how misaligned genetic elements can exploit the genome, prompting the evolution of suppressive mechanisms—like recombination—as a governance system to realign incentives toward organism-level fitness.
5. Recombination functions as a genetic alignment technology, ensuring alleles contribute to organismal fitness by disrupting long-term alliances among genes and promoting generalist strategies.
6. The post suggests that studying biological governance structures may inspire new thinking in AI alignment, though it remains speculative and reflects personal synthesis rather than a formal research claim.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Mateusz Bagiński 19 May 2025 17:38 UTC
1 point
Thanks for publishing it :)
There is no direct analogy to recombination in gradient descent.
Dropout?
In a eukaryotic organism, only the germline will continue indefinitely– all other (somatic) cell lines’ days are numbered.
The germline is strictly separated from the soma only in non-sponge animals.
One idea is that accessory genomes can serve as a scratch space while the core genome is more conservative.
Super interesting …
Peter 19 Apr 2025 10:12 UTC
1 point
This makes me wonder if there could be good setups for evaluating AI systems as groups. You could have separate agent swarms in different sandboxes competing on metrics of safety and performance. The one that does better gets amplified. The agents may then have some incentive to enforce positive social norms for their group against things like sandbagging or deception. When deployed they might have not only individual IDs but group or clan IDs that tie them to each other and continue this dynamic.
Maybe there is some mechanism where membership gets shuffled around sometimes the way alleles do between genes. Or traits of the systems, though that seems less clearly desirable. There are already algorithms to imitate genetic recombination but that would be somewhat different. You could also combine social group membership systems and trait recombination systems potentially. Given the level of influence over AIs, it might be somewhat closer to selective breeding in certain respects but not entirely.
- Holly Elmore ⏸️ 🔸 22 Apr 2025 3:23 UTC
  2 points
  This is totally spitballing, but doing anything that encourages modularity in the circuits (or perhaps at another level?) of the AIs and the ability to swap mind modules would be really good for interpretability.
  
  Ever since this project, I’ve had a vague sense that genome architecture has something interesting to teach us about interpreting/predicting NNs, but I’ve never had a particularly useful insight from it. Love this book on it by Michael Lynch if anyone’s interested.
- Holly Elmore ⏸️ 🔸 22 Apr 2025 3:17 UTC
  2 points
  I’ve heard this idea of AI group selection floated a few times but people used to say it was too computationally intensive. Now who knows?
  
  The closest biology the idea brings to mind is this paper showing that selecting chickens as groups leads to better overall yields (in factory farming :( ) for the reasons you predict: they aren’t as aggressive or stressed by crowding as the chickens that are individually selected for the biggest yields.