Project Proposal: Gears and Aging
Crossposted from LessWrong.
Imagine building an airplane the way we do biology research.
Hundreds of separate research groups each pick different airplane parts to work on. There’s no particular coordination in this; people choose based first on what they know how to build, second on what seems fun/interesting/flashy/fashionable, and finally on what they expect to be useful (based on limited local knowledge). Funds are allocated to anything which sounds vaguely related to airplanes. There might be fifty different groups all building propellers, one guy toiling away at gyros, and nobody at all working on fuel lines. One group is building an autopilot system which could be handy but isn’t really necessary; others are building things which won’t be useful at all, though they don’t realize it.
There’s obviously room to generate value by assembling all the parts together, but there’s more to it than that. It’s not just that nobody is assembling the parts; there isn’t even a plan to assemble them. Nobody’s really sure what parts are even being produced, and nobody has a comprehensive list of parts needed in order to build an airplane. If things are missing, nobody knows it. If some work is extraneous, nobody knows that either; nobody knows what the minimum viable path to an airplane looks like. There is no airplane blueprint.
This is what the large majority of aging research looks like. There are hundreds of different groups, each studying specific subsystems. There’s little coordination on what is studied, few if any people assembling the parts on a large scale, and nothing like a blueprint.
The eventual vision of the Gears of Aging project is to create a blueprint.
What Does That Look Like?
A blueprint does not have to include all the internals of every single subsystem in comprehensive detail. The idea is to include enough detail that we can calculate whether the airplane will fly under various conditions.
Likewise, a blueprint-analogue for aging should include enough detail that we can calculate whether a given treatment/intervention will cure various age-related diseases, and independently verify each of the model’s assumptions.
Such a calculation doesn’t necessarily involve a lot of numerical precision. If we know the root cause of some age-related disease with high confidence, then we can say that reversing the root cause would cure the disease, without doing much math. On the other hand, we probably do need at least some quantitative precision in order to be highly confident that we’ve identified the root cause, and haven’t missed anything important.
Like an airplane blueprint, the goal is to show how all the components connect—a system-level point of view. Much research has already been published on individual components and their local connections—anything from the elastin → wrinkles connection to the thymic involution → T-cell ratio connection to the stress → sirtuins → heterochromatin → genomic instability pathway. A blueprint should summarize the key parameters of each local component and its connections to other components, in a manner suitable for tracing whole chains of cause-and-effect from one end to the other.
Most importantly, a blueprint needs some degree of comprehensiveness. We don’t want to model the entirety of human physiology, but we need a complete end-to-end model of at least some age-related diseases. The more diseases we can fully model, from root cause all the way to observed pathology, the more useful the blueprint will be.
Summary: a blueprint-analogue for aging would walk through every causal link, from root cause to pathology, for one or a few age-related diseases, in enough detail to calculate whether a given intervention would actually cure the disease.
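To make the shape of such a blueprint a bit more concrete, here is a minimal sketch, in Python, of what an explicit, checkable causal chain might look like: each link records how strongly its cause drives its effect and what evidence backs that estimate, so an intervention can be traced from root cause to pathology. None of the entities or numbers below are real biology; they are placeholders for illustration only.

    # Minimal sketch of a "blueprint" as an explicit causal chain.
    # All names and numbers are hypothetical placeholders, not real biology.
    from dataclasses import dataclass

    @dataclass
    class CausalLink:
        cause: str
        effect: str
        effect_per_unit_cause: float  # estimated change in effect per unit change in cause
        evidence: str                 # citation(s) supporting this particular link

    # A toy chain from a hypothetical root cause to an observed pathology.
    chain = [
        CausalLink("root_cause_X", "intermediate_Y", 2.0, "placeholder reference"),
        CausalLink("intermediate_Y", "pathology_Z", 0.5, "placeholder reference"),
    ]

    def predicted_change(chain, delta_in_root_cause):
        """Trace a change in the root cause through every link of the chain."""
        delta = delta_in_root_cause
        for link in chain:
            delta *= link.effect_per_unit_cause
        return delta

    # If an intervention reduced the root cause by one unit, how much should the
    # pathology change? A prediction like this is what the blueprint must support.
    print(predicted_change(chain, -1.0))  # -1.0 in this toy example

The point of the structure is not the arithmetic; it is that every link carries its own evidence, so every link can be verified on its own.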
What’s the Value-Add?
Why would a blueprint be useful?
I’d phrase the key feature as “vertical comprehensiveness”, in analogy to vertical integration in economics. It’s mapping out every step of the causal chain from root cause to pathology—the whole “production chain” of one or a few pathologies.
To see why this is useful, let’s compare it to the dual feature: horizontal comprehensiveness. A good example here is the SENS project: a program to prevent aging by cataloguing every potential root cause, and regularly repairing each of them. This is a purely-horizontal approach: it does not require any understanding at all of the causal pathways from root causes to pathologies, but it does require a comprehensive catalogue of every root cause.
[Figure: SENS requires finding root causes, while a blueprint requires full pathways. The diagram is a loose analogy; actual biological systems do not look like this.]
The relative disadvantage of a SENS-style horizontal approach is that there’s no way to check it locally. If it turns out that we missed a root cause, SENS has no built-in way to notice that until the whole project is done and we notice some pathology which hasn’t been fixed. Conversely, if we mistakenly included a root cause which doesn’t matter, we have no built-in way to notice that at all; we waste resources fixing some extraneous problem. For example, here’s the original list of low-level damage types for SENS to address (from the Wikipedia page):
Accumulation of lysosomal aggregates
Accumulation of senescent cells
Age-related tumors
Mitochondrial DNA mutations
Immune system damage
Buildup of advanced glycation end-products
Accumulation of extracellular protein aggregates
Cell loss
Hormonal muscle damage
Note the inclusion of senescent cells. Today, it is clear that senescent cells are not a root cause of aging, since they turn over on a timescale of days to weeks. Senescent cells are an extraneous target. Furthermore, since senescent cell counts do increase with age, there must also be some root cause upstream of that increase—and it seems unlikely to be any of the other items on the original SENS list. Some root cause is missing. If we attempted to address aging by removing senescent cells (via senolytics), whatever root cause induces the increase in senescent cells in the first place would presumably continue to accumulate, requiring ever-larger doses of senolytics until the senolytic dosage itself approached toxicity—along with whatever other problems the root cause induced.
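To spell out the turnover argument with a toy model (all numbers invented for illustration): when a population of cells is cleared on a timescale of a couple of weeks, its count settles to roughly the production rate times the clearance time, so any age-related increase must be driven by an upstream increase in production rather than by slow accumulation of the cells themselves.

    # Toy turnover model: dS/dt = production - S / tau.
    # With a clearance timescale tau of days to weeks, the count S quickly
    # settles near S ≈ production * tau, so S tracks the upstream production
    # rate rather than accumulating on its own. All numbers are placeholders.
    tau_days = 14.0           # assumed clearance timescale
    production_young = 10.0   # senescent cells produced per day (made up)
    production_old = 30.0     # higher upstream production in old age (made up)

    steady_state_young = production_young * tau_days   # ~140 cells
    steady_state_old = production_old * tau_days       # ~420 cells
    print(steady_state_young, steady_state_old)

    # Clearing S directly (senolytics) without touching the upstream cause
    # leaves production unchanged, so S climbs back to the same steady state
    # within a few multiples of tau.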
This isn’t to bash the SENS program; I’m personally a fan of it. The point is that the SENS program lacks a built-in way to cheaply verify its plan. It needs to rely on other kinds of research in order to make sure that its list of targets is complete and minimal.
Conversely, built-in verification is exactly where vertical comprehensiveness shines.
When we have a full causal pathway, we can ask at each step:
Does this causal relationship actually hold?
Do the immediate causes actually change with age by the right amount to explain the observed effects?
Because we can do this locally, at each step of the chain, we can verify our model as we go. Much like a mathematical proof, we can check each step of our model along the way; we don’t need to finish the entire project in order to check our work.
In particular, this gives us a natural mechanism to notice missing or extraneous pieces. Checking whether senescent cells are actually a root cause is automatically part of the approach. So is figuring out what’s upstream of their age-related increase in count. If there’s more than one factor upstream of some pathology, we can automatically detect any we missed by quantitatively checking whether the observed change in causes accounts for the observed change in effects.
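As a toy version of that quantitative check (all numbers invented): suppose two measured causes are hypothesized to explain an age-related change in some effect. Multiplying each cause’s observed change by its estimated effect size and comparing the total against the observed change in the effect flags both missing and extraneous causes.

    # Toy bookkeeping check: do the observed changes in the hypothesized causes
    # account for the observed change in the effect? All values are invented.
    observed_change_in_effect = 10.0

    # (observed change in each cause with age, estimated effect per unit cause)
    hypothesized_causes = {
        "cause_A": (4.0, 1.5),   # explains 6.0
        "cause_B": (2.0, 1.0),   # explains 2.0
    }

    explained = sum(delta * slope for delta, slope in hypothesized_causes.values())
    unexplained = observed_change_in_effect - explained
    print(f"explained: {explained}, unexplained: {unexplained}")

    # A large unexplained residual (here 2.0 out of 10.0) points to a missing
    # upstream cause; a negative residual suggests an extraneous or overcounted one.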
Summary: the main value-add of a blueprint-style end-to-end model is that we can locally verify each link in the causal chain, usually using already-existing data.
Is It Tractable?
I think that the data required to figure out the gears of most major human age-related diseases is probably already available, online, today. And I don’t mean that in the sense of “a superintelligent AI could figure it out”; I mean that humans could probably figure it out without any more data than we currently have.
That belief stems mainly from having dug into the problem a fair bit already. Everywhere I look, there’s plenty of data. Someone has experimentally tested, if not the exact thing I want to know, at least something close enough to provide evidence.
The hard part is not lack of data; the hard part is too much data. There’s more than a human could ever hope to work through for each major subsystem. It’s all about figuring out which questions to ask, guessing which experiments could provide evidence on those questions and are likely to have already been done, and then tracking down the results of those experiments.
So I think the data is there, although my reasons for that belief are not easy for someone to check without studying the literature in some depth.
The other piece of tractability is whether the system is simple enough, on some level, that a human can hope to understand all the key pieces. Based on having seen a fair bit, I definitely expect that it is simple enough—not simple (there are a lot of moving pieces, and figuring them all out takes a fair bit of work), but still well within human capacity. We could also make some outside-view arguments supporting this view—for instance, since the vast majority of molecules/structures/cells in a human turn over on much faster timescales than aging, there are unlikely to be more than a handful of independent root causes.
Outside-View Tractability
If a project like this is both useful and tractable, why hasn’t it already been done?
The usual academic outlet for a blueprint-style, vertically-comprehensive work would be a textbook. And there are textbooks on aging, as well as monographs and books on various subtopics. Unfortunately, the field is still relatively young, and textbook-writing tends to be under-incentivized in the sciences; most academic hiring and tenure committees prefer original research. Even those textbooks which do exist tend to be either a broad-but-shallow summary of existing research (for single-author books) or a collection of standalone essays on particular components (for multi-author monographs). They are parts catalogues, not blueprints.
But the biggest shortcoming of typical textbooks, compared to the blueprint-style picture, is that typical textbooks do not actually perform the local verification of model components.
This is exactly the sort of problem where we’d expect a rationalist skillset—statistics, causality, noticing confusion, mysterious answers, etc.—to be more of a limiting factor than biological know-how. Add that to the lack of incentive for this sort of work, and it’s not surprising that it hasn’t been done.
A handful of examples, to illustrate the sort of reasoning which is lacking in most books on aging:
Many review articles and textbooks claim that the increased stiffness of blood vessels in old age results (at least partially) from an increase in the amount of collagen relative to elastin in vessel walls. But if we go look for studies which directly measure the collagen:elastin ratio in the blood vessels, we mostly find no significant change with age (rat, human, rat).
Many reviews and textbooks mention that the bulk of reactive oxygen species (ROS) are produced by mitochondria. Attempts at direct measurement instead suggest that mitochondria account for about 15% (PhysAging, table 5.3).
In 1991, a small-sample genetic study suggested that amyloid protein aggregates in the brain cause Alzheimer’s. Notably, they “confirmed diagnoses via autopsy”—which usually means checking the brain for amyloid deposits. At least as early as 2003, it was known that amyloid deposits turn over on a timescale of hours. Yet, according to Wikipedia, over 200 clinical trials attempted to cure Alzheimer’s by clearing plaques between 2002 and 2012; only a single trial ended in FDA approval, and we still don’t have a full cure.
A great deal of effort has gone into imaging neuromuscular junctions in aging organisms. As far as I can tell, there was never any strong evidence that the observed changes play a significant causal role in any age-related disease. They did produce really cool pictures, though.
These are the sorts of things which jump out when we ask, for every link in a hypothesized causal chain:
Does this causal relationship actually hold?
Do the immediate causes actually change with age by the right amount to explain the observed effects?
Summary
I think that the data required to figure out the gears of most major human age-related diseases is probably already available, online, today. The parts to build an airplane are already on the market. We lack a blueprint: an end-to-end model of age-related pathologies, containing enough detail for each causal link in the chain to be independently validated by experimental and observational data, and sufficient to calculate whether a given intervention will actually cure the disease.
I think this is a good point. I wonder if there are examples where writing a textbook led to key insights.
I noticed that I have a vague sense that this is also true for AGI based on human cognition. I wonder if you think that polling a research community on questions like “Do we already know enough to derive X with a lot of smart effort?” would give a good sense of tractability.