Tom Davidson’s model is often referred to in the Community, but it is entirely reliant on the current paradigm + scale reaching AGI.
This seems wrong.
It does use constants estimated from the history of deep learning to provide guesses for its parameters, and it assumes that compute is an important driver of AI progress.
These are much weaker assumptions than you seem to be implying.
Note also that this work is based on earlier work like bio-anchors, which was done just as the current paradigm and scaling were being established. (It was published in the same year as Kaplan et al.)
I don’t recall the details of Tom Davidson’s model, but I’m pretty familiar with Ajeya’s bio-anchors report, and I definitely think that if you make an assumption “algorithmic breakthroughs are needed to get TAI”, then there really isn’t much left of the bio-anchors report at all. (…although there are still some interesting ideas and calculations that can be salvaged from the rubble.)
I went through how the bio-anchors report looks if you hold a strong algorithmic-breakthrough-centric perspective in my 2021 post Brain-inspired AGI and the “lifetime anchor”.
See also here (search for “breakthrough”) where Ajeya is very clear in an interview that she views algorithmic breakthroughs as unnecessary for TAI, and that she deliberately did not include the possibility of algorithmic breakthroughs in her bio-anchors model (…and therefore she views the possibility of breakthroughs as a pro tanto reason to think that her report’s timelines are too long).
OK, well, I actually agree with Ajeya that algorithmic breakthroughs are not strictly required for TAI, in the narrow sense that her Evolution Anchor (i.e., recapitulating the process of animal evolution in a computer simulation) really would work given infinite compute and infinite runtime and no additional algorithmic insights. (In other words, if you do a giant outer-loop search over the space of all possible algorithms, then you’ll find TAI eventually.) But I think that’s really leaning hard on the assumption of truly astronomical quantities of compute [or equivalent via incremental improvements in algorithmic efficiency] being available in like 2100 or whatever, as nostalgebraist points out. I think that assumption is dubious, or at least it’s moot—I think we’ll get the algorithmic breakthroughs far earlier than anyone would or could do that kind of insane brute force approach.
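To make the “giant outer-loop search” idea a bit more concrete, here is a minimal toy sketch in Python (purely illustrative, with made-up names and a trivial stand-in task, not anything from the bio-anchors report or Tom Davidson’s model): the outer loop blindly samples candidate “algorithms” and keeps whichever scores best, so essentially all of the work is done by brute-force search plus compute rather than by any algorithmic insight.

```python
import random

# Toy stand-in for "a candidate algorithm": a 1-D threshold classifier whose
# two parameters are what the outer loop searches over. (Made-up illustration
# only; the real argument is about searching over learning algorithms, which
# is astronomically more expensive.)
def sample_candidate(rng):
    return {"w": rng.uniform(-1.0, 1.0), "b": rng.uniform(-1.0, 1.0)}

def evaluate(candidate, data):
    # "Inner loop": measure how well this candidate does on the task.
    correct = 0
    for x, label in data:
        prediction = 1 if candidate["w"] * x + candidate["b"] > 0 else 0
        correct += int(prediction == label)
    return correct / len(data)

def outer_loop_search(data, budget, seed=0):
    # "Outer loop": blindly sample candidates and keep the best one seen.
    # Given an unbounded budget this eventually finds a good candidate, but
    # essentially all of the cost is paid in brute-force search.
    rng = random.Random(seed)
    best, best_score = None, -1.0
    for _ in range(budget):
        candidate = sample_candidate(rng)
        score = evaluate(candidate, data)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

if __name__ == "__main__":
    # Trivial task: label whether x > 0.3.
    rng = random.Random(1)
    data = []
    for _ in range(200):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, int(x > 0.3)))
    best, score = outer_loop_search(data, budget=10_000)
    print(f"best candidate {best} scored {score:.2f}")
```

The point of the toy is just the shape of the argument: more outer-loop samples (i.e., more compute) substitute for insight, which is why the Evolution Anchor “works” in principle but leans so hard on astronomical quantities of compute.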
I agree that these models assume something like “large discontinuous algorithmic breakthroughs aren’t needed to reach AGI”.
(But incremental advances which are ultimately quite large in aggregate, and which broadly follow long-running trends, are consistent with these models.)
However, I interpreted “current paradigm + scale” in the original post as “the current paradigm of scaling up LLMs and semi-supervised pretraining”. (E.g., not accounting for totally new RL schemes or wildly different architectures trained with different learning algorithms, which I think are accounted for in this model.)
From the summary page on Open Phil:
“In this framework, AGI is developed by improving and scaling up approaches within the current ML paradigm, not by discovering new algorithmic paradigms.”
From this presentation about it to GovAI (from April 2023) at 05:10:
“So the kinda zoomed out idea behind the Compute-centric framework is that I’m assuming something like the current paradigm is going to lead to human-level AI and further, and I’m assuming that we get there by scaling up and improving the current algorithmic approaches. So it’s going to look like better versions of transformers that are more efficient and that allow for larger context windows...”
Both of these seem pretty scaling-maximalist, so the quote doesn’t seem wrong, at least to me? It’d be pretty hard to make a model which includes the possibility of the paradigm not getting us to AGI and then needing a period of exploration across the field to find the other breakthroughs needed.