Excellent post.
I want to highlight something that I missed on the first read but nagged me on the second read.
You define transformative AGI as:
1. Gross world product (GWP) exceeds 130% of its previous yearly peak value
2. World primary energy consumption exceeds 130% of its previous yearly peak value
3. Fewer than one billion biological humans remain alive on Earth
You predict when transformative AGI will arrive by building a model that predicts when we’ll have enough compute to train an AGI.
But I feel like there’s a giant missing link—what are the odds that training an AGI causes 1, 2, or 3?
It feels not only plausible but quite likely to me that the first AGI will be very expensive and very uneven (superhuman in some respects and subhuman in others). An expensive, uneven AGI may take years or decades to self-improve to the point where GWP or energy consumption rises by 30% in a single year.
It feels like you are implicitly ascribing 100% probability to this key step.
This is one reason (among others) that I think your probabilities are wildly high. Looking forward to setting up our bet. :)
Thanks for the comment.
I think you’re right that my post neglected to discuss these considerations. On the other hand, my bottom-line probability distribution at the end of the post deliberately has a long tail to take into account delays such as high cost, regulation, fine-tuning, safety evaluation, and so on. For these reasons, I don’t think I’m being too aggressive.
Regarding the point about high cost in particular: it seems unlikely to me that TAI will have a prohibitively high inference cost. As you know, Joseph Carlsmith estimated the compute required to match the human brain, with a central estimate of 10^15 FLOP/s. That is orders of magnitude more compute than today’s LLMs use for inference, and yet, at current hardware prices, running that much compute would still cost roughly as much as prevailing human wages.
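To make that concrete, here is a rough back-of-envelope sketch. The GPU throughput, rental price, and wage figures below are my own ballpark assumptions (not numbers from the post or from Carlsmith’s report), so treat the output as an order-of-magnitude check only:

```python
# Rough cost check: what does it cost to rent ~1e15 FLOP/s of GPU compute,
# and how does that compare to a human wage? All figures are assumptions.

BRAIN_FLOPS = 1e15           # Carlsmith's central estimate, FLOP/s
GPU_FLOPS = 1e15             # assumed dense throughput of one high-end GPU, FLOP/s
GPU_RENTAL_PER_HOUR = 3.0    # assumed cloud rental price, $/hour
WAGE_PER_HOUR = 30.0         # assumed wage for comparison, $/hour

gpus_needed = BRAIN_FLOPS / GPU_FLOPS
compute_cost_per_hour = gpus_needed * GPU_RENTAL_PER_HOUR

print(f"GPUs needed for brain-scale compute: {gpus_needed:.1f}")
print(f"Compute cost: ~${compute_cost_per_hour:.0f}/hour vs wage of ~${WAGE_PER_HOUR:.0f}/hour")
```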
In addition, there are more considerations that push me towards TAI being cheap. One is that a large fraction of our economy can be automated without physical robots, and the relevant brain anchor for intellectual tasks is arguably the cerebral cortex rather than the full human brain. According to Wikipedia, “There are between 14 and 16 billion neurons in the human cerebral cortex,” out of roughly 86 billion in the whole brain. It’s not clear to me how many synapses there are in the cerebral cortex, but if the synapse-to-neuron ratio is consistent throughout the brain, then the inference cost of the cerebral cortex is plausibly about 1/5th the inference cost of the whole brain.
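Spelling out the arithmetic behind that 1/5th figure (the ~86 billion total is the commonly cited whole-brain estimate, and the constant synapse-to-neuron ratio is the assumption stated above):

```python
# Cortex share of the brain's neurons, assuming a constant synapse-to-neuron
# ratio so that inference cost scales roughly with neuron count.
CORTEX_NEURONS = 16e9   # upper end of the 14-16 billion range
TOTAL_NEURONS = 86e9    # commonly cited whole-brain estimate
print(f"Cortex share: {CORTEX_NEURONS / TOTAL_NEURONS:.2f}")  # ~0.19, i.e. about 1/5
```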
The human brain is plausibly undertrained relative to its size, due to evolutionary constraints that push hard against delaying maturity in animals. As a consequence, ML models with brain-level efficiency can probably match human performance at a much smaller size (and thus a much lower inference cost). I currently expect this consideration to mean that the human brain is 2-10x larger than “necessary”.
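As a loose illustration of why undertraining matters, here is the parametric loss fit from the Chinchilla paper (Hoffmann et al. 2022) applied to two made-up models. The constants are the paper’s fitted values for language models; the brain analogy and the specific parameter/data counts are my own, so this only shows the qualitative effect:

```python
# Fitted Chinchilla loss L(N, D) = E + A/N^alpha + B/D^beta (Hoffmann et al. 2022).
# Illustration: an undertrained large model is beaten by a 10x smaller model
# trained on 100x more data.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(n_params, n_tokens):
    return E + A / n_params**alpha + B / n_tokens**beta

print(f"Large, undertrained (N=1e12, D=1e10):      {loss(1e12, 1e10):.2f}")  # ~2.37
print(f"10x smaller, well trained (N=1e11, D=1e12): {loss(1e11, 1e12):.2f}")  # ~1.94
```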
The Chinchilla scaling laws suggest that inference costs should grow at about half the rate of training costs. This is essentially the dual of the argument I gave in the post about data not being a major bottleneck.
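To spell out the “half the rate” claim with the usual rules of thumb (training compute C ≈ 6ND and compute-optimal data D ≈ 20N; both are approximations rather than exact fits):

```python
# Under C = 6*N*D and D = 20*N, the compute-optimal model size is
# N = sqrt(C / 120), so inference FLOP per token (~2*N) scales as sqrt(C).
def optimal_params(train_compute):
    return (train_compute / 120) ** 0.5

for c in (1e24, 1e26):  # 100x jump in training compute
    n = optimal_params(c)
    print(f"C = {c:.0e}: N_opt ~ {n:.1e}, inference ~ {2 * n:.1e} FLOP/token")
```

In this approximation, a 100x increase in training compute raises inference cost per token by only about 10x, which is what growing at “half the rate” means on a log scale.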
We seem to have a wider range of strategies available for cutting down the cost of inference compared to evolution. I’m not sure about this consideration though.
I’m aware that you have some counterarguments to these points in your own paper, but I haven’t finished reading it yet.