For what it’s worth, I think pre-training alone is probably enough to get us to about 1-3 month time horizons based on a 7-month doubling time, but pre-training data will start to run out in the early 2030s, meaning that (in the absence of other benchmarks) you no longer have very good general proxies for capability improvements.
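As a rough back-of-the-envelope check on that extrapolation, here's a minimal sketch. The ~2-hour current 50% time horizon and the ~167-hour work month are my own illustrative assumptions; only the 7-month doubling time comes from the sentence above.

```python
import math

# Extrapolate a METR-style 50% time horizon under a fixed doubling time.
# Assumed (not from the comment): current horizon ~2 hours, work month ~167 hours.
current_horizon_hours = 2.0
doubling_time_months = 7.0

def months_to_reach(target_hours: float) -> float:
    """Months for the horizon to grow from the current value to target_hours."""
    doublings = math.log2(target_hours / current_horizon_hours)
    return doublings * doubling_time_months

for label, hours in [("1 work month (~167h)", 167.0), ("3 work months (~500h)", 500.0)]:
    m = months_to_reach(hours)
    print(f"{label}: ~{m:.0f} months out (~{m / 12:.1f} years)")
```

Under those assumptions, 1-3 month horizons arrive roughly four to five years from now, which is why the early-2030s data wall, rather than the extrapolation itself, is the binding constraint.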
The real issue isn’t the difference between hours-long and months-long tasks, but the difference between months-long tasks and century-long tasks, which Steve Newman describes well here.
I largely agree that RL scaling is basically just inference scaling, but I strongly disagree with the claim below, and this gives me different expectations for AI progress over the next 4-6 years (though I agree in the longer term: absent new paradigms, inference scaling will become the more important lever, and AI progress will slow back down to the prior compute trend of ~1.55x efficiency gains per year, rather than getting 3-4x more compute every year):
> In the last year or two, the most important trend in modern AI came to an end. The scaling-up of computational resources used to train ever-larger AI models through next-token prediction (pre-training) stalled out.
Vladimir Nesov explains this in more detail here, but the core issue is that the scaling laws were already fairly weak (probably closer to logarithmic returns than linear returns), and the compute increase from GPT-4 to GPT-4.5 was much closer to 10x than 100x. So it’s not surprising that people were disappointed in AI progress: GPT-3-to-GPT-4-sized progress requires the kind of 100x compute increase that will only come online this year and around 2028 and 2030. We therefore have little evidence that returns have recently gotten worse, especially in a way that suggests pre-training has stalled.
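To make the "closer to logarithmic returns" point concrete, here's a minimal sketch. The log10 capability model is purely a toy assumption of mine, and the 10x/100x multipliers are just the rough figures from the paragraph above:

```python
import math

# Toy model: capability gain proportional to log10 of the training-compute multiplier.
def capability_gain(compute_multiplier: float) -> float:
    return math.log10(compute_multiplier)

gpt3_to_gpt4 = capability_gain(100)    # ~100x compute: the jump people remember
gpt4_to_gpt45 = capability_gain(10)    # ~10x compute: GPT-4 to GPT-4.5

print(f"GPT-4 -> GPT-4.5 is ~{gpt4_to_gpt45 / gpt3_to_gpt4:.0%} "
      "of a GPT-3 -> GPT-4 sized jump under log returns")
```

On this toy model, a 10x scale-up only buys about half of a GPT-3-to-GPT-4-sized jump, which is enough to explain the muted reaction to GPT-4.5 without any claim that the scaling laws themselves broke.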
I think this post is much better viewed as evidence that pre-training isn’t dead, it’s just resting; that RL will account for far less near-term AI progress than pre-training; and that the big scale-up of RLVR in 2025-2027 is much more of a one-time boost than a second trend that can progress independently of pre-training.
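Going back to the parenthetical above about the two growth rates, here's a minimal sketch of how far apart they end up over a 4-6 year window. Treating both numbers as flat annual multipliers on effective training compute is my simplification, and 3.5x is just the midpoint of the 3-4x range:

```python
# Compare cumulative growth under the two rates quoted earlier.
slow_trend = 1.55   # prior efficiency trend, per year
fast_ramp = 3.5     # midpoint of the 3-4x/year compute ramp during the RL build-out

for years in (4, 5, 6):
    slow = slow_trend ** years
    fast = fast_ramp ** years
    print(f"{years} years: ~{slow:.0f}x vs ~{fast:.0f}x (a ~{fast / slow:.0f}x gap)")
```

That gap is most of why my near-term expectations differ, while the longer-term picture converges: once the compute ramp can no longer be sustained, progress falls back to the slower trend, and the RLVR scale-up looks like a one-time boost in hindsight.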