Thanks for raising the concern!
I agree that testing it is difficult. I partially addressed this above in the section on “Strategy and Verifiability”.
I would flag that people should arguably be equally suspicious of most humans. As we develop various tests and evals, I expect that the best AIs will mostly post mediocre results, while most prominent humans will simply refuse to be tested (we could still run lighter evals on public intellectuals and the like using their published work, but that would be more limited).
Prediction markets seem like a pretty good test to me, though they are only one implementation.
I expect that with decent systems, we should have new "epistemic table stakes," things like:
- Can forecast across a wide variety of fields, at least roughly as well as Metaculus forecasters given, say, 10 hours per question
- Shows low rates of logical inconsistency in extensive simulations
- Flags claims that users might disagree with
- Has very low rates of hallucination
- Has had its biases extensively tested across different situations
- Has undergone extensive red-teaming by other top AI systems
- Comes with predictions of how well it will hold up against better AI intellectual systems 10 to 40 years from now
- Offers full oversight/visibility into potential conflicts of interest
(I’m not saying that these systems will be broadly trusted, just that they will exist. I would expect the smarter people, at least, to trust them in accordance with their evals.)
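To make the first table-stake concrete: one standard way to evaluate probabilistic forecasters is to score their predictions on resolved binary questions with the Brier score (lower is better) and compare against a human baseline such as the Metaculus community median. A minimal sketch — all question data here is invented for illustration, not real forecast results:

```python
# Compare an AI forecaster against a human-community baseline on resolved
# binary questions using the Brier score (mean squared error between the
# probability forecast and the 0/1 outcome; lower is better).

def brier_score(forecasts, outcomes):
    """Mean of (p - outcome)^2 over all resolved questions."""
    assert len(forecasts) == len(outcomes) and forecasts
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical resolved questions: (AI probability, community probability, outcome)
resolved = [
    (0.80, 0.70, 1),
    (0.30, 0.40, 0),
    (0.55, 0.60, 1),
    (0.10, 0.25, 0),
]

ai_probs = [q[0] for q in resolved]
community_probs = [q[1] for q in resolved]
outcomes = [q[2] for q in resolved]

print(f"AI Brier score:        {brier_score(ai_probs, outcomes):.3f}")
print(f"Community Brier score: {brier_score(community_probs, outcomes):.3f}")
```

Over enough resolved questions, a consistently lower Brier score than the community baseline is the kind of legible, checkable evidence that could back the "roughly as good as Metaculus forecasters" claim.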