I think this is pretty cool. Good to see some relevant benchmarks collected in the same place, and I can see how this is handy as a communication tool.
From a quick skim I wasn’t really sure how to interpret the main graph, and there didn’t seem to be an explanation. In particular, the Y axis is a percentage, but a percentage of what? Some of the benchmarks are projected to reach 100% within a year; does that mean you’re projecting AI takeover within a year, etc.?
(Sharing less as ‘please answer my question’ and more as ‘user feedback’—if I’m confused by this, I imagine lots of people who know (even) less than me about AI (safety) will also be confused; though maybe they’re not your target audience)
Hi Jamie, thanks for your comment, glad you like it!
It’s hard to respond to this without partly answering your question anyway, but we appreciate the user feedback too.
We got some quick data on the project yesterday (n=15, tech audience but not xrisk, data here). Among other questions, we asked: “In your own words, what is this website tracking or measuring?” Almost everyone gave a correct answer. Judging from the other answers as well, the main points seem to get across pretty well, so we’re not planning to change too much.
The percentage you’re asking about (‘Score’) is the percentage of questions the AI model answers correctly on a benchmark (with 1-2 exceptions, which we explain under ‘Benchmarks’). I agree that’s not super clear; I’ve added an issue on GitHub to explain this a bit better.
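To make that concrete, here is a minimal sketch of how such a score is typically computed; this is illustrative only (it’s the standard accuracy definition, not our actual pipeline, and the function name is made up):

```python
def benchmark_score(model_answers: list[str], reference_answers: list[str]) -> float:
    """Return the percentage of benchmark questions answered correctly.

    Illustrative sketch only: assumes a simple exact-match grading scheme,
    which is not how every benchmark on the site is scored.
    """
    assert len(model_answers) == len(reference_answers)
    n_correct = sum(m == r for m, r in zip(model_answers, reference_answers))
    return 100.0 * n_correct / len(reference_answers)

# Example: 3 of 4 answers match the reference, so the score is 75.0.
print(benchmark_score(["A", "C", "B", "D"], ["A", "C", "B", "A"]))  # 75.0
```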
Does 100% mean a takeover? Not really. The issue is that no one knows exactly at which capability threshold a takeover could occur. We have no data on takeovers, since they haven’t happened yet, and the world is complex. ‘Human expert level’ is definitely a relevant boundary to cross, and we have included it in the benchmark plots wherever meaningful (not on the homepage; that would have been too messy).
As we said, we think part of the website’s purpose is to highlight missing pieces of the puzzle. Threat models (AI takeover scenarios) have hardly been analysed scientifically so far, and we plan to do research into them this year (Existential Risk Observatory, MIT FutureTech, FLI). Once we have more robust threat models, we can determine which dangerous capabilities have which red lines under each threat model. Then we can find out whether current benchmarks can measure those and, if so, what the relevant scores are (and if not, build new ones that can). We’d like to work on these projects together with other researchers!
Currently, that work is not done. TakeOverBench is an attempt to shed more light on the matter using the research we have right now. We plan to update it when better research becomes available.