Pretty sure o1 and Gemini have access to the internet.
The main way it’s potentially misleading is that it’s not a log plot (most benchmark results will look like exponentials on a linear scale); however, I expect Deep Research would still look above trend even on a log scale. I also think it’s helpful for new readers to see some of the charts on linear scales, since in some ways they’re more intuitive.
While you can use o1 and Gemini with internet access, I think they were almost certainly evaluated without it (see the original paper here).
I really, really do not think you should put the plot there. It’s like comparing two students’ performance when only one of them has access to the internet. I think it’s extremely misleading. If you want to illustrate progress, you could just use the FrontierMath/GPQA results or even ARC-AGI.
Thanks, this is helpful.
(Just adding the FrontierMath/GPQA and ARC-AGI charts you mentioned for my own benefit, and others)