I think if you surveyed experts on LLMs and asked them “Which was the greater jump in capabilities, GPT-2 to GPT-3 or GPT-3 to GPT-4?”, the vast majority would say the former, and I would agree with them. This graph doesn’t capture that, which makes me cautious about over-relying on it.
That’s a really broad question, though. If you asked something more specific, like which system unlocked the most real-world value in coding, people would probably point to the jump to a more recent model like o3-mini or Gemini 2.5.
You could similarly argue that the jump from infant to toddler is much more profound in terms of general capabilities than the jump from college student to PhD, but the latter is more relevant to unlocking new research tasks.