Thanks. From an x-risk perspective, I don’t think deployment or >30% GDP growth is especially important or necessary to be concerned with. A 10^29 FLOP training run is an x-risk in itself, in terms of takeover risk from inner misalignment during training, fine-tuning, or evals (lab leak risk).
new hardware would need to be built or repurposed
In terms of physical takeover, repurposing via hacking is one vector a rogue AI may use.
The additional considerations of regulation and the lack of general-purpose robotics push my credence in very short timelines to very low levels
Regulation—hopefully this will slow things down, but for the sake of argument (i.e. in order to argue for regulation) it’s best not to incorporate it into this analysis. The worry is what could happen if we don’t have urgent global regulation!

General-purpose robotics—what do you make of Google’s recent progress (e.g. robots learning all the classic soccer tactics and skills from scratch)? And the possibility of Gemini being able to control robots? [mentioned in my footnote 4 above]
I think the 10^22 FLOP/s figure is really conservative.
Yes, although as mentioned there are private sources of compute too (e.g. Google’s TPUs). But I guess those don’t amount to 10^22 FLOP/s.
maybe 10^21 FLOP/s for the US government
What does the NSA have access to already?
A 10^29 FLOP training run within 4 years doesn’t seem plausible to me even with US-government levels of spending, simply because we need to greatly expand GPU production before that becomes a realistic option.
Epoch says on your trends page: “4.2x/year growth rate in the training compute of milestone systems”. Do you expect this trend to break in less than 4 years? AMD are bringing out new AI GPUs (MI300X) to compete with Nvidia later this year. There might be a temporary shortage now, but is it really likely to last for years?
Following the Epoch trends, given 2x10^25 FLOP in 2022 (GPT-4), that’s 2.6x10^28 in 2027, a factor of ~4 short of 10^29. Factor in algorithmic improvements and that’s 10^29 or even 10^30. To me it seems that 10^29 in 4 years (2027) is well within reach even for business as usual.
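Spelling out that extrapolation (a rough sketch, taking the 4.2x/year trend and the ~2x10^25 FLOP GPT-4 estimate as givens):

```python
base_flop = 2e25  # estimated GPT-4 training compute, 2022
growth = 4.2      # Epoch's reported growth rate for milestone systems, per year

flop_2027 = base_flop * growth ** (2027 - 2022)
print(f"2027 projection: {flop_2027:.1e} FLOP")          # ~2.6e28
print(f"Factor short of 1e29: {1e29 / flop_2027:.1f}x")  # ~3.8x, i.e. ~4
```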
If [algorithmic] progress were caused by algorithmic experimentation, then there’s no mechanism by which it would scale with training runs independent of a simultaneous scale-up of experiments.
As I say in my footnote 6 above: algorithmic progress would be more or less immediately unlocked by (use of) the powerful new AI trained with 10^28 FLOP.
it seems more likely to me that actors will gradually scale up their compute budgets over the next few years to avoid wasting a lot of money on a single botched training run (though by “gradual” I mean something closer to 1 OOM per year than 0.2 OOMs per year).
1 OOM per year (as per Epoch trends, inc. algorithmic improvement) is 10^29 in 2027. I think another factor to consider is that, given we are within grasping distance of AGI, people (who are naive to the x-risk) will be throwing everything they’ve got at crossing the line as fast as they can, so we should maybe expect an acceleration on top of these trends.
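To make the 1 OOM/year arithmetic explicit (a rough sketch; the ~0.4 OOM/year algorithmic-progress term is an illustrative assumption on my part, not a figure quoted in this thread):

```python
import math

hw_oom = math.log10(4.2)     # ~0.62 OOM/year from the compute trend
algo_oom = 0.4               # assumed algorithmic progress, OOMs/year (illustrative)
base_oom = math.log10(2e25)  # GPT-4, 2022

for year in (2025, 2026, 2027):
    total = base_oom + (hw_oom + algo_oom) * (year - 2022)
    print(year, f"~10^{total:.1f} effective FLOP")
# 2027: ~10^30.4 effective FLOP, i.e. past 10^29 even with some slack
```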
A 10^29 FLOP training run is an x-risk in itself, in terms of takeover risk from inner misalignment during training, fine-tuning, or evals (lab leak risk).
I’m not convinced that AI lab leaks are a significant source of x-risk, but I can understand your frustration with my predictions if you disagree with that. In the post, I mentioned that I disagree with hard takeoff models of AI, which might explain our disagreement.
Regulation—hopefully this will slow things down, but for the sake of argument (i.e. in order to argue for regulation) it’s best not to incorporate it into this analysis.
I’m not sure about that. It seems like you might still want to factor the effects of regulation into your analysis even if you’re arguing for regulation. But even so, I’m not trying to make an argument for regulation in this post. I’m just trying to predict the future.
I think this result is very interesting, but my impression is that the result is generally in line with the slow progress we’ve seen over the last 10 years. I’ll be more impressed when I start seeing results that work well in a diverse array of settings, across multiple different types of tasks, with no guarantees about the environment, at speeds comparable to human workers, and with high reliability. I currently don’t expect results like that until the end of the 2020s.
Epoch says on your trends page: “4.2x/year growth rate in the training compute of milestone systems”. Do you expect this trend to break in less than 4 years?
I wouldn’t be very surprised if the 4.2x/year trend continued for another 4 years, although I expect it to slow down some time before 2030, especially for the largest training run. If it became obvious that the trend was not slowing down, I would very likely update towards shorter timelines. Indeed, I believe the trend from 2010-2015 was a bit faster than from 2015-2022, and from 2017-2023 the largest training run (according to Epoch data) went from ~3x10^23 FLOP to ~2x10^25 FLOP, which was only an increase of about 0.33 OOMs per year.
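For reference, the arithmetic behind that rate (a quick check; the exact figure depends on which endpoint dates you use):

```python
import math

# Largest training run per Epoch data: ~3x10^23 FLOP (2017) to ~2x10^25 FLOP (2023)
ooms = math.log10(2e25) - math.log10(3e23)      # ~1.8 OOMs
print(f"{ooms / (2023 - 2017):.2f} OOMs/year")  # ~0.30, roughly the ~0.33 cited
```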
1 OOM per year (as per Epoch trends, inc. algorithmic improvement) is 10^29 in 2027.
But I was talking about physical FLOP in the comment above. My median for the amount of FLOP required to train TAI is closer to 10^32 FLOP using 2023 algorithms, which was defined in the post in a specific way. Given this median, I agree there is a small (perhaps 15%) chance that TAI can be trained with 10^29 2023 FLOP, which means I think there’s a non-negligible chance that TAI could be trained in 2027. However, I expect the actual explosive growth part to happen at least a year later, for reasons I outlined above.
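One way to sanity-check how a 10^32 median coheres with ~15% at 10^29 (an illustrative lognormal model, not the exact method from the post): if the training requirement is lognormal over FLOP with median 10^32, a spread of roughly 3 OOMs gives ~15% below 10^29.

```python
from statistics import NormalDist

median_oom = 32.0  # median TAI training requirement: 10^32 FLOP (2023 algorithms)
sigma_oom = 2.9    # assumed standard deviation in OOMs (chosen to illustrate)

p = NormalDist().cdf((29.0 - median_oom) / sigma_oom)
print(f"P(requirement <= 10^29 FLOP) ~ {p:.0%}")  # ~15%
```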
I’ll be more impressed when I start seeing results that work well in a diverse array of settings, across multiple different types of tasks, with no guarantees about the environment, at speeds comparable to human workers, and with high reliability. I currently don’t expect results like that until the end of the 2020s.
Let’s see what happens with the Gemini release.

Overall, I think Epoch should be taking more of a security mindset approach with regard to x-risk in its projections, i.e. not just being conservative in a classic academic prestige-seeking sense, but looking at ways that uncontrollable AGI could be built soon (or at least sooner than expected), breaking past trends and safe assumptions, and raising the alarm where appropriate.
I can only speak for myself here, not Epoch, but I don’t believe in using the security mindset when making predictions. I also dispute the suggestion that I’m trying to be conservative in a classic academic prestige-seeking sense. My predictions are simply based on what I think is actually true, to the best of my abilities.
Fair enough. I do say “projection” rather than prediction though—meaning that I think a range of possible futures should be looked at (keeping in mind a goal of working to minimise existential risk), rather than just trying to predict what’s most likely to happen or “on the mainline”.