Genes did misalignment first: comparing gradient hacking and meiotic drive

My PhD is in evolutionary biology, and I spent much of my time learning and thinking about genetic governance mechanisms like meiosis. I find myself wanting a link to share on these concepts, so I’m publishing some old writing I did in 2022 that makes the comparison.

Gradient descent and natural selection are analogous

Gradient descent can be compared to natural selection. Both are optimization algorithms. But while gradient descent is only decades old as part of the field of machine learning (ML), natural selection is the reason complex living beings like human readers exist. Are there any lessons we can learn about alignment by looking at the products of natural selection?

Oly Sourbut has written up, on LessWrong, a set of conditions under which natural selection and gradient descent are equivalent, i.e., where the same inputs lead to the same outputs under both algorithms.

At an intuitive level, the biggest difference between the two is that, whereas gradient descent involves estimating a local gradient and trying to descend it to reach a local minimum, natural selection does no estimation or looking ahead. Natural selection is simply the result of different genotypes having unequal reproductive success. The actual biggest disanalogy between meiosis and gradient descent is recombination. More on that below.

Gradient descent can be hacked

Gradient hacking is a theoretically possible mechanism of unaligned AI. The “gradient” in gradient hacking refers to gradient descent, the algorithm that trains a machine learning (ML) model by minimizing its cost function; lower cost corresponds to better performance on the task the model is being trained to do. On each iteration, gradient descent uses the model’s performance on the latest training round to estimate the direction in parameter space that leads to the greatest improvement in performance (the steepest decrease in the cost function), and then it updates the model’s parameters accordingly. One part of a model could theoretically protect itself from training updates and/or manipulate the performance of the model so that gradient descent updates the model in its preferred direction. Or the entire model could be deceptive and know how to elicit acceptable or even desired modifications from the training process.
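
To make the update rule concrete, here is a minimal sketch of plain gradient descent on a toy one-parameter cost function. The names `gradient_descent` and `toy_grad`, the quadratic cost, and the learning rate are illustrative assumptions, not anything from a real training setup.

```python
def gradient_descent(grad, theta, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient of the cost function."""
    for _ in range(steps):
        # Move the parameter in the direction that locally reduces the cost.
        theta = theta - learning_rate * grad(theta)
    return theta


def toy_grad(theta):
    """Gradient of the hypothetical toy cost C(theta) = (theta - 3)^2."""
    return 2.0 * (theta - 3.0)


theta_final = gradient_descent(toy_grad, theta=0.0)
print(theta_final)  # ends up very close to 3.0, the minimum of the toy cost
```

Nothing in this loop looks ahead: each update depends only on the local slope, which is why a model that can shape that slope could in principle steer its own training.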

Right now, not much is known about gradient hacking and whether it will prove a threat to AI alignment. In my opinion, the Alignment Faking paper from Anthropic is an example of gradient hacking, demonstrating that this mechanism will be operative in further LLM development. One way to potentially learn more is to study systems with similar behavior. There are superficially similar phenomena in biology where a selfish element increases its reproductive fitness at the expense of the whole in which it is embedded. Is there anything we can glean from unaligned levels of selection in biology that can shed light on unaligned AI?

There is something like gradient hacking in biology: meiotic drive

[These are my speculations]

The way I see it, there are two kinds of gradient hacking possible. The first is a situation where the solution to the problem the model was trained to solve is an agent, a “mesa optimizer”, that has its own goals that are imperfectly aligned with the goals of the people who trained it and that rediscovers gradient hacking from first principles during its computation. In this case, the misalignment between our goals and the mesa optimizer’s goals seems like something fundamentally true about the distribution of discoverable solutions to the problem within that training distribution. The misalignment becomes apparent when the model is deployed “out of distribution”. You could say that human minds are an example of a mesa optimizer where natural selection is the outer optimizer. We have goals (like having sex) that are overlapping but not the same as what natural selection promotes (genes that persist). Outside of the ancestral environment, human minds are out of distribution and having sex is suddenly not the same as propagating one’s genes. Once the mesa optimizer has a goal, as part of pursuing that goal, it may learn that the gradient descent process might change the goal and want to protect it from being changed, which it can do by manipulating its performance to control the direction of the changes. This kind of gradient hacking has been more explored than the other in the AI safety community (1, 2, 3, 4, 5). I think its closest analog in biology is selfish cell lineages, like cancer, which resist programmed cell death, refuse to accept a cell “fate” (which means its days are numbered) and evade detection by the immune system.

The other way I see gradient hacking happening is if there are circuits in the model that simply resist being rewritten by gradient descent. They survive because they survive: they have some way of remaining stable in the face of a changing environment, and they may evolve more and more elaborate ways of resisting updating, including recruiting agentic parts of the model to hack the gradient. This is more similar to selfish genes whose interests are not aligned with the other genes in an organism’s genome. I predict this kind is more common when some part of the model is shielded from updates, either because there is not full backpropagation or because part of the model becomes “update-proof” since disturbing it would increase the loss. This type of gradient resistance/gradient hacking bears a resemblance to meiotic drive.

How meiotic drive works

Meiosis is the biological process by which eukaryotic gametes are made. It involves recombination of the parent chromosomes to make unique combinations, and results in gametes that have half as many chromosomes as the parent organism. When meiosis is working normally, every allele of every gene present in the parent organism has a 50% chance of being in each gamete. Said differently, each of the two alleles at a heterozygous locus should be present in exactly 50% of the organism’s gametes.

However, meiosis is a hackable mechanism, and sometimes alleles can play strategies that increase their representation in the gametes to greater than 50%.

An oversimplified example is a poison-antidote system made of two tightly linked genes: one encodes a poison that is released to all the nearby gametes, and the other encodes an antidote. Any gametes that are exposed to the poison without carrying the antidote die, so the proportion of surviving gametes with the poison-antidote cluster will be greater than 50%.
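
The arithmetic of the poison-antidote example can be sketched as follows. This is a toy calculation for a single heterozygous parent; the function names and the `kill_rate` parameter (the fraction of rival gametes the poison kills) are assumptions made for illustration.

```python
def driver_share(kill_rate):
    """Fraction of surviving gametes that carry the poison-antidote cluster."""
    driver = 0.5                     # fair meiosis: half the gametes carry the cluster
    rival = 0.5 * (1.0 - kill_rate)  # rival gametes that survive the poison
    return driver / (driver + rival)


def surviving_gametes(kill_rate):
    """Fraction of all gametes that survive, relative to a parent with no driver."""
    return 0.5 + 0.5 * (1.0 - kill_rate)


for kill_rate in (0.0, 0.5, 1.0):
    print(kill_rate, driver_share(kill_rate), surviving_gametes(kill_rate))
```

With no killing the cluster sits at the fair 50%. As `kill_rate` rises, its share of the surviving gametes climbs toward 100% while the parent’s total gamete output falls, which is the cost the rest of the genome pays.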

Another example is exploiting the mechanics of meiosis to disproportionately be chosen as the mature gamete. For example, there are two meiotic cell divisions leading to four initial products of meiosis. In males, all four cells become sperm. In females, the divisions are unequal, leading to one big cell that becomes the egg and three tiny cells that have no descendants. There are biases that alleles can exploit to be more likely to end up in the egg. For example, in most species the machinery has a bias to make eggs more often from a specific position on the meiotic spindle where the divisions take place. In corn, structures called “knobs” form that are more attractive to the spindle than the actual centromere, so they end up being grabbed first and then are more often on the far side of the spindle, where they are more likely to end up in the egg (check out the female drive chapters in Burt & Trivers 2005).

Because natural selection works by relative differences in frequency, the cheating alleles will be favored more strongly the higher the proportion of gametes they are represented in, even though the organism as a whole will have lower absolute fitness if a portion of its healthy gametes are being killed.
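
A rough population-level sketch of this point, assuming random mating, an effectively infinite population, and made-up genotype fitnesses. Here `k` is the share of a heterozygote’s gametes that carry the driver allele D, so `k = 0.5` is fair meiosis. All names and numbers are hypothetical.

```python
# Hypothetical fitnesses: carrying the driver makes the organism less fit.
W_DD, W_Dd, W_dd = 0.8, 0.9, 1.0


def mean_fitness(p):
    """Mean fitness of a randomly mating population with driver frequency p."""
    q = 1.0 - p
    return p * p * W_DD + 2 * p * q * W_Dd + q * q * W_dd


def next_driver_frequency(p, k):
    """Driver frequency after one generation of selection and (biased) meiosis."""
    q = 1.0 - p
    # D gametes come from DD parents (always) and from Dd parents (share k).
    driver_gametes = p * p * W_DD + 2 * p * q * W_Dd * k
    return driver_gametes / mean_fitness(p)


def run(p0, k, generations=100):
    """Iterate the recursion from starting frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_driver_frequency(p, k)
    return p


p_fair = run(0.01, k=0.5)   # same costly allele, fair meiosis
p_drive = run(0.01, k=0.9)  # same costly allele, but it cheats meiosis
print(p_fair, p_drive, mean_fitness(p_drive))
```

Under these assumed numbers the costly allele is purged when meiosis is fair, but sweeps toward fixation when it gets into 90% of heterozygotes’ gametes, dragging the population’s mean fitness down as it goes.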

Once you understand how alleles have incentives to game meiosis, it’s easier to see what meiosis is good for. You could see the organism as a public good made by the genes in the genome to reproduce the entire genome really well. If alleles are being promoted by natural selection because they are good at screwing over the other alleles in the genome, it undermines the ability to work together to create the organism, which threatens the fitness of every gene. When it works properly, meiosis is a governance system that makes it so that alleles don’t know which alleles they will be paired with in the next generation, and so their best strategy is to work well toward the goal of making a good organism rather than carving out ways to take advantage of the work the other genes do to make a fit organism to promote themselves.

Recombination is the major genetic alignment technology

Recombination is really important to meiosis and to allowing natural selection to work efficiently at the level of the organism– it’s the major genetic alignment technology.

Recombination and meiosis are also a form of genetic government. A fair meiosis ensures that different genes or alleles cannot form alliances with each other with different interests than the larger organism, because alleles of genes cannot “predict” which alleles they will be partnered with in the next generation. The only strategy for a gene’s success that will work long term under recombination is one of contributing to a fit organism. If there were a successful alternate way for alleles to make more copies of themselves, they would, and they do.
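
One way to see how recombination breaks up would-be alliances is the textbook decay of linkage disequilibrium: under random mating, the statistical association D between alleles at two loci shrinks by a factor of (1 - r) each generation, where r is the recombination rate between them. The sketch below assumes that simple model; the function name and the starting value are illustrative.

```python
def linkage_disequilibrium(d0, r, generations):
    """Association between two loci after some generations of random mating.

    d0 is the starting linkage disequilibrium and r is the recombination rate
    between the loci (0.5 for unlinked loci, 0.0 for no recombination at all).
    """
    return d0 * (1.0 - r) ** generations


# Unlinked loci: the association is halved every generation.
print(linkage_disequilibrium(0.25, r=0.5, generations=10))
# No recombination: the "alliance" persists indefinitely.
print(linkage_disequilibrium(0.25, r=0.0, generations=10))
```

The second case is the situation a selfish cluster needs: if its component genes are never separated, they can keep evolving as a team with interests of their own.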

The logic of shuffling genes every so often is similar in spirit to getting fresh blood into a government or company by having new people fill existing roles. The longer that the same people do the same jobs, the further they are likely to veer or drift from the guidelines set by the organization. When team members are shuffled, it reinforces the structure provided by the government or company rather than idiosyncratic structures provided by a particular team.

There is no direct analogy to recombination in gradient descent.

Examples of meiotic drive

Meiotic drive is a biological phenomenon in which some of an organism’s genes are able to cheat meiosis to be present in more than 50% of its gametes. When meiosis functions properly, different genes and different alleles of genes are incentivized to work together to create a fit organism. When genes can increase their frequency simply by hacking meiosis, the incentive to work with other genes is weakened and natural selection can potentially push them in the direction of “selfish” behavior that undercuts the success of the organism as a whole.

Just as I, Holly Elmore, a citizen of the United States, can have interests that do not serve the interests of the United States, genes can have interests that don’t serve the interests of the organism they contribute to.

Drive is an evolutionary arms race. Selfish drivers innovate new ways to cheat meiosis and the rest of the genome evolves “responder” alleles. In fact, just as human institutions can be very productively viewed through the lens of game theory, mechanisms and structures that respond to and suppress rogue elements can be seen at every level of organization in biology. Meiosis itself is like a form of genetic government.

Examples (mostly from Burt & Trivers):

Undercutting organism fitness:

  • Selfish genes:

    • The t haplotype in mice is a locus of many genes that are inherited together without recombination due to a chromosomal inversion, and which essentially creates a poison for non-t sperm that sperm bearing the t haplotype are immune to. t sperm have severe motility defects, but the t haplotype is able to propagate anyway because it kills all the competitors. It doesn’t matter that it is not in the long-term interests of the t haplotype to make its hosts less fit, because natural selection cannot act on long-term interests: alleles that are good at getting into the next generation, one way or another, will be selected; that’s what natural selection is.

    • Transposable elements (TEs) that just entered a new species’ genome: TEs are “jumping genes” that encode the machinery to copy themselves. Sometimes they insert themselves in the middle of an important gene and have phenotypic effects (they were discovered by Barbara McClintock in one TE lineage in maize because they interrupted a kernel coat color gene and the inheritance pattern was non-Mendelian). TEs that have recently jumped into a new species tend to cause more harm, because that lineage doesn’t have any resistance to them and they haven’t had a chance to evolve to become more moderate to keep from killing or sterilizing that host. P elements entered the fruit fly (Drosophila melanogaster) genome some time after the main lab strains were collected in the early 1900s, which was noticed because it led to sterility in the offspring of crosses between lab strains and wild strains. The lab strains had not evolved “silencing” of the TE as the wild strains had, and the active replication of P elements in the germline led to the disruption of meiosis and the insertion of the TE into crucial genes.

    • Paternal/maternal genome exclusion. This is a real thing that can happen where one parent’s genetic material is either silenced or rejected entirely at an early stage of development. It can lead to parthenogenesis. The short-term advantage of this is that the included parent’s genes are 100% represented in each offspring. The long-term disadvantage is that mutations accumulate.

  • Selfish organelles: Mitochondria and chloroplasts can have different interests from the organisms in which they reside. Human mitochondria, for example, can get rewarded for being less efficient if that means they get to multiply more per cell to meet energetic demand.

    • Mitochondria in many hermaphroditic flowering plants are inherited only via the egg, and some lineages in those species are selfishly too inefficient for pollen production (which is much more energetically expensive than eggs), so that the plant makes more eggs and therefore passes along more of that mitochondrial lineage.

  • Selfish cell lineages: Cancer. In a eukaryotic organism, only the germline will continue indefinitely– all other (somatic) cell lines’ days are numbered. Cancer is when a cell lineage breaks free of constraints on the cell cycle and its developmental “fate”, which specifies that it is only to copy itself a specified number of times. Once this process has begun, the rogue lineage undergoes selection to evade the immune system and acquire nutrients. It does not matter that the cancer may eventually kill the organism. The evolution of better cooperation mechanisms to prevent cancer can only happen if having cancer is worse for the organism’s reproductive success. Most cancers occur in older people and reflect the breakdown of cooperation mechanisms in conditions (older age) that have not been properly vetted by natural selection.

  • Genomic imprinting: a result of imperfectly aligned interests between male and female reproductive strategies. Imprinted genes play different strategies conditional on whether they were inherited from the father or the mother, because the most advantageous behavior is slightly different for paternally and maternally derived genes. One major arena of conflict is nutrient provision in pregnancy. Generally, the mother’s genome benefits by provisioning equally for all her offspring, including future offspring, but the father’s genome benefits from instructing the offspring to act more selfishly, since there is some chance that future offspring will not be from the same father. Because of this we see arms races in the genes involved in nutrient provisioning, where for example the fetus gives off signals to release more sugar into the bloodstream for it and the mother resists those signals. This may be the basis of gestational diabetes.

~Neutral effect on organism fitness:

  • Transposable elements that have been in a species a long time and tend to only be inserted in non-coding regions of the genome

  • Gene conversion: when repairing DNA or resolving a recombination junction, there are sometimes situations where one strand gets copied over the original sequence of another strand. Some selfish elements exploit this, but it is mostly harmless to the organism because it just replaces one allele in the place it was supposed to be with another version, rather than messing with meiosis machinery or manipulating behavior or function.

  • Many cases of accessory chromosomes, which just sort of hang out harmlessly, at most taking some resources.

Positive effect on organism fitness?

  • Some organisms have adopted some of the transposable element toolkit for their own purposes, like V(D)J recombination to make antibodies in the vertebrate immune system.

  • Homing endonucleases that just harmlessly insert themselves into conserved genes and then splice themselves out of mRNA transcripts before the genes are translated into proteins. This may have been the origin of introns, or stretches of noncoding DNA between stretches of coding DNA (exons). In vertebrates, “alternative splicing” allows exons to be used as modules that can be mixed and matched to make different versions of a protein with different functions. Mating type switching in yeast uses a system that is clearly derived from a homing endonuclease.

  • Arguably some things useful at the organism level have come from the amount of extra space there is in many eukaryotic genomes, which is driven by selfish elements like TEs.

  • Accessory chromosomes in some species can be pretty helpful. I once studied a gene cluster in the genome of the pathogenic fungus Fusarium oxysporum that seemed like it might contribute to fungicide-resistance, and it turned out to be carried and transferred horizontally on accessory chromosomes. It was duplicated a few times, too, which we predicted increased the anti-fungicide effect, and which is easier to do on accessory chromosomes. One idea is that accessory genomes can serve as a scratch space while the core genome is more conservative.

Losing the ability to do meiosis is a death sentence for a species

On priors, we should expect the effects of drive to be bad, since it’s much easier to mess up how the organism works together or to lose the equipment for meiosis than it is to accidentally help it. Once a eukaryotic lineage loses the ability to do meiosis, it is cooked. This is reflected in the fact that asexual eukaryotic species alive today have a “twiggy” distribution—their history is recent and all of their near relatives are sexual. The implication is that asexuality is a doomed strategy, and the lineages that last do so by maintaining meiosis.

In small organisms with small genomes, natural selection is efficient enough to remove deleterious alleles so that the lineage can remain functional and competitive. Natural selection works by differential reproductive success, so in order for it to effect a change, there must be a difference in reproductive success associated with a deleterious or a beneficial allele. Bacteria and archaea have large population sizes, which is like high resolution data for natural selection. There is enough selection pressure that even mildly deleterious alleles can be selected against. As organisms get larger, their population sizes generally get smaller and their generation times get longer. So the data that natural selection is working with is lower resolution. Only the most harmful deleterious alleles are quickly purged from a small population. Sex allows for some reshuffling of alleles between individuals so that alleles do not always occur on the same background, ultimately so that different alleles can be “judged” separately by natural selection.
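
The “resolution” point can be illustrated with Kimura’s diffusion approximation for the probability that a single new mutation eventually fixes. This is a sketch under strong assumptions: a diploid Wright-Fisher population whose effective size equals its census size `n`, and a selection coefficient `s` acting additively. The function name and the example numbers are illustrative only.

```python
import math


def fixation_probability(s, n):
    """Kimura's approximation for a new mutation starting at frequency 1/(2n)."""
    if s == 0:
        return 1.0 / (2 * n)  # a neutral allele fixes with its starting frequency
    exponent = -4.0 * n * s
    if exponent > 700:
        return 0.0  # effectively zero; also avoids floating-point overflow
    # Equivalent to (1 - exp(-2s)) / (1 - exp(-4ns)), written with expm1.
    return math.expm1(-2.0 * s) / math.expm1(exponent)


s = -0.001  # a mildly deleterious allele (hypothetical value)
for n in (100, 100_000):
    relative = fixation_probability(s, n) / fixation_probability(0, n)
    print(n, relative)  # fixation chance relative to a neutral allele
```

In the small population the mildly deleterious allele fixes almost as often as a neutral one, meaning selection barely “sees” it. In the large population its chance of fixing is vanishingly small compared with neutrality.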

It’s a principal-agent problem. When not every gene can be directly evaluated by natural selection on its performance, meiosis has to ensure that alleles are being presented to natural selection representatively, and that there is no other way to gain access to the means of reproduction except by being part of a well-performing organism.

Acknowledgments

I did a lot of my thinking about this when I was a PIBBSS Fellow mentored by Beth Barnes. What I shared here is really just background, but I still wanted to mention PIBBSS and Beth. We did some deeper thinking, along with Oly Sourbut, that remains unpublished: I would have to consult with them before publishing it, we’re all busy, and it doesn’t seem that relevant to where ML is going.