Beware of the new scaling paradigm

This post investigates some potential implications of the new scaling paradigm that accompanies OpenAI's recently released o1 series of models.

The paradigm: Scaling inference-time compute can compete with scaling training compute

In short, it might be possible that the next generation of open-weight models unlocks a new level of capabilities that, if used by rogue actors, could enable a major misuse incident. Time is not on our side.

I will first share some passages from OpenAI itself and one of its employees, as well as a recent LessWrong post by Vladimir Nesov and a comment on it. Then, I will share my own thoughts.

General context—OpenAI website

We trained these models to spend more time thinking through problems before they respond, much like a person would. Through training, they learn to refine their thinking process, try different strategies, and recognize their mistakes.

Noam Brown (OpenAI employee)—thread

o1 is trained with RL to “think” before responding via a private chain of thought. The longer it thinks, the better it does on reasoning tasks. This opens up a new dimension for scaling. We’re no longer bottlenecked by pretraining. We can now scale inference compute too.

...

@OpenAI’s o1 thinks for seconds, but we aim for future versions to think for hours, days, even weeks. Inference costs will be higher, but what cost would you pay for a new cancer drug? For breakthrough batteries? For a proof of the Riemann Hypothesis? AI can be more than chatbots

Vladimir Nesov—LessWrong post

OpenAI o1 demonstrates that a GPT-4 level model can be post-trained into producing useful long horizon reasoning traces.

...

The problem right now is that a new level of pretraining scale is approaching in the coming months, while ability to cheaply apply long horizon reasoning post-training might follow shortly thereafter, possibly unlocked by these very same models at the new level of pretraining scale.

Bogdan Ionut Cirstea—Comment on same LessWrong post

Depending on how far inference scaling laws go, the situation might be worse still. Picture LLama-4-o1 scaffolds that anyone can run for indefinite amounts of time (as long as they have the money/compute) to autonomously do ML research on various ways to improve Llama-4-o1 and open-weights descendants, to potentially be again appliable to autonomous ML research. Fortunately, lack of access to enough quantities of compute for pretraining the next-gen model is probably a barrier for most actors, but this still seems like a pretty (increasingly, with every open-weights improvement) scary situation to be in.

I encourage everyone to read the original tweets and posts I referenced in full to avoid misunderstandings.

Now, there are several implications that I think are worth paying attention to:

Implications for closed-source models

  • If frontier labs like OpenAI can run an o1-like model for multiple days (which they likely already can), timelines seem to shrink significantly, because such long-horizon reasoning would likely let them greatly increase both the quality and quantity of their research. In fact, AGI (e.g., GPT-5-level models combined with chain-of-thought reasoning) could already be close to public release.

    • “a fine-tuned version of o1 scored at the 49th percentile in the IOI under competition conditions! and got gold with 10k submissions per problem.”

  • With the new scaling paradigm that o1-like models introduce (where scaling inference-time compute can compete with scaling training compute), it appears that new algorithmic breakthroughs may become (entirely?) irrelevant. Significantly scaling capabilities may simply require throwing more compute and time at a given problem (a toy illustration of this trade-off follows this list).
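
To make the "more inference compute, better results" trade concrete, here is a minimal, illustrative Python sketch of one well-known inference-time technique: self-consistency, i.e. sampling several reasoning traces and majority-voting over the final answers. To be clear, this is not OpenAI's (unpublished) method; sample_answer and the toy noisy_sampler below are placeholders I introduce purely for illustration.

```python
import collections
import random
from typing import Callable


def self_consistency_answer(
    sample_answer: Callable[[str], str],
    question: str,
    n_samples: int = 16,
) -> str:
    """Sample several independent reasoning traces and majority-vote over
    the final answers. More samples means more inference compute, which
    typically buys higher accuracy without any further training."""
    answers = [sample_answer(question) for _ in range(n_samples)]
    return collections.Counter(answers).most_common(1)[0][0]


# Toy stand-in sampler: each individual call is "right" only 60% of the
# time, yet the majority vote over 16 samples is right far more often.
def noisy_sampler(question: str) -> str:
    return "42" if random.random() < 0.6 else str(random.randint(0, 99))


if __name__ == "__main__":
    print(self_consistency_answer(noisy_sampler, "What is 6 * 7?"))
```

The only point is that, once a model is good enough, extra accuracy can often be bought with extra samples at inference time rather than with further training.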

Implications for open-source models

But it’s not just the closed-source models we need to think about. The open-source community might be on the brink of accessing these powerful capabilities too. With the next generation of pre-trained models set to be released, there’s a real possibility that open-source models could be significantly enhanced using techniques similar to OpenAI’s o1.

What does this mean? Essentially, anyone with enough computing power could take an open-source model and fine-tune it to perform complex, long-horizon reasoning tasks—what some call “System 2 thinking.” This could unlock a level of AI capability that was previously out of reach for most people; a minimal sketch of what such inference-time scaffolding looks like follows below.
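
As a rough picture of the kind of scaffold Cirstea alludes to, here is a minimal, hypothetical sketch of a draft, critique, and revise loop that can spend an arbitrary amount of inference compute on a single task. llm_call stands in for whatever text-generation interface an open-weight model exposes; nothing here reflects any real o1-style pipeline.

```python
from typing import Callable


def refine_until_budget(
    llm_call: Callable[[str], str],  # placeholder for any open-weight model's text interface
    task: str,
    max_rounds: int = 8,
) -> str:
    """Spend more inference compute on a single task by looping
    draft -> self-critique -> revise until the critique passes or the
    compute budget (max_rounds) runs out. No retraining is involved."""
    draft = llm_call(f"Task: {task}\nWrite a first attempt.")
    for _ in range(max_rounds):
        critique = llm_call(
            f"Task: {task}\nAttempt:\n{draft}\nList concrete flaws, or say 'no flaws'."
        )
        if "no flaws" in critique.lower():
            break
        draft = llm_call(
            f"Task: {task}\nAttempt:\n{draft}\nCritique:\n{critique}\nRevise the attempt."
        )
    return draft
```

The longer the loop runs, the more compute (and money) it burns, which is exactly the "run it for as long as you can pay" dynamic described in the quoted comment.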

Imagine if malicious actors got their hands on these enhanced models. They could use them to develop harmful technologies or coordinate dangerous activities on an unprecedented scale.

The barriers to accessing powerful AI are falling, and we are not prepared for the consequences.

What do you make of this?