Cool website!
One question: why are most of the SOTA models Claude? Is it because Anthropic is the company that releases the most data about their models? I thought that by most measures, Gemini would be the SOTA model today.
Thanks!
As far as possible, we collected data from public leaderboards or system cards, and Gemini models do seem to be a bit underrepresented. I am not sure why that is, but Anthropic releasing more data is definitely part of it. For example, the most recent CyBench data points come from the Claude (and Grok) system cards, and for the virology test there are data points in the Opus 4.5 system card but not for Gemini 3.0 Pro.