Hi Jamie, thanks for your comment, glad you like it!
It’s hard to respond to this without partly answering your question anyway, but we appreciate the user feedback too.
We got some quick data on the project yesterday (n=15, tech audience but not xrisk, data here). We asked, among other questions: “In your own words, what is this website tracking or measuring?” Almost everyone gave a correct answer. Judging from the other answers as well, I think the main points get across pretty well, so we’re not planning to change too much.
The percentage you’re asking about (‘Score’) is the fraction of questions the AI model answered correctly on a benchmark (with 1-2 exceptions, which we explain under ‘Benchmarks’). I agree that’s not super clear; I’ve added an issue on GitHub to explain this a bit better.
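For concreteness, here’s a minimal sketch of what such a score amounts to. This is illustrative only, not the site’s actual code, and the function and variable names are hypothetical:

```python
# Hypothetical sketch of a benchmark 'Score': the percentage of
# questions a model answered correctly. Not TakeOverBench's actual
# code; all names here are made up for illustration.

def benchmark_score(model_answers: list[str], answer_key: list[str]) -> float:
    """Return the percentage of benchmark questions answered correctly."""
    if len(model_answers) != len(answer_key):
        raise ValueError("expected one model answer per benchmark question")
    correct = sum(m == k for m, k in zip(model_answers, answer_key))
    return 100.0 * correct / len(answer_key)

# Example: 3 of 4 questions correct gives a Score of 75.0
print(benchmark_score(["B", "A", "D", "C"], ["B", "A", "D", "A"]))
```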
Does 100% mean a takeover? Not really. The issue is that no one knows exactly at which capability threshold a takeover could occur. We don’t have data on takeovers, since they haven’t happened yet, and the world is complex. ‘Human expert level’ is definitely a relevant boundary to cross, and we have included it in the benchmark plots wherever meaningful (not on the homepage, which would have been too messy).
As we said, we think part of the website’s point is to highlight missing pieces of the puzzle. Threat models (AI takeover scenarios) have hardly been scientifically analysed so far, and we plan to do research on them this year (Existential Risk Observatory, MIT FutureTech, FLI). Once we have more robust threat models, we should determine which red lines apply to which dangerous capabilities for each model. Then we can find out whether current benchmarks can measure those and, if so, what the relevant scores are (and if not, build new ones that can). We’d like to work on these projects together with other researchers!
Currently, that work is not done. TakeOverBench is an attempt to shed more light on the matter using the research we have right now. We plan to update it when better research becomes available.