I disagree with the claim that pre-training returns have declined much
Could you elaborate, or link to somewhere someone makes this argument? I’m curious whether a strong defense can be made that self-supervised pre-training of LLMs will continue to scale and deliver worthwhile, significant benefits.
I currently can’t find a source, but to elaborate a bit: my reason for thinking this is that the GPT-4 to GPT-4.5 scale-up used roughly 15x the compute rather than 100x. As I recall, a 10x compute increase is only about enough to keep pace with current algorithmic improvements that don’t involve scaling up models, whereas 100x increases are what produced the wow moments we associate with the jump from GPT-3 to GPT-4. And the GPT-5 release was not a compute scale-up at all, but rather a productionization of GPT-4.5.
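To make that arithmetic concrete, here is a rough back-of-envelope sketch of the reasoning above. It assumes, as the numbers in my comment do, that ~100x compute corresponds to a full GPT-3 to GPT-4 sized generational jump, that ~10x is roughly matched by ongoing non-scaling algorithmic progress, and that returns are logarithmic in compute; all of these are my rough estimates, not measured values.

```python
import math

# Rough estimates from the comment above, not measured values.
FULL_GENERATION_MULTIPLIER = 100   # ~100x compute ~ a GPT-3 -> GPT-4 sized "wow" jump
ALGORITHMIC_EQUIVALENT = 10        # ~10x compute is roughly matched by non-scaling algorithmic progress
GPT4_TO_GPT45_MULTIPLIER = 15      # rough compute scale-up from GPT-4 to GPT-4.5

def generation_fraction(compute_multiplier: float) -> float:
    """Fraction of a full generational jump, assuming log-scale returns to compute."""
    return math.log10(compute_multiplier) / math.log10(FULL_GENERATION_MULTIPLIER)

# Gross jump from the 15x scale-up: about 0.59 of a full generation.
print(f"gross: {generation_fraction(GPT4_TO_GPT45_MULTIPLIER):.2f} generations")

# Net of algorithmic progress that would have happened anyway (15x / 10x = 1.5x),
# only about 0.09 of a generation is attributable to pre-training scale specifically.
print(f"net:   {generation_fraction(GPT4_TO_GPT45_MULTIPLIER / ALGORITHMIC_EQUIVALENT):.2f} generations")
```

Under these assumptions, the 15x scale-up buys far less than a GPT-3 to GPT-4 style jump once you subtract the gains that algorithmic progress alone would have delivered.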
I’m more in the camp of “I find little reason to believe that pre-training returns have declined” here.
I’ll just mention, for what it’s worth, that AI researcher and former OpenAI Chief Scientist Ilya Sutskever thinks the scaling of LLM pre-training has run out of steam. Dario Amodei, the CEO of Anthropic, has also said things suggesting that pre-training scaling no longer has the importance it once did.
Other evidence comes from reporters talking to anonymous engineers inside OpenAI and Meta who have expressed disappointment with the results of scaling pre-training. Toby mentioned this in another blog post, and I quoted the relevant paragraph in a comment here.