Be able to score at the 75th percentile (relative to the corresponding year’s human test-takers; this was a score of 600 in 2016) on the full mathematics section of a circa-2015-2020 standard SAT exam, using just images of the exam pages.
Be able to learn the classic Atari game “Montezuma’s Revenge” (from just visual inputs and standard controls) and explore all 24 rooms within the equivalent of less than 100 hours of real-time play (see the closely related question).
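As a rough sense of what the 100-hour budget amounts to, here is a back-of-the-envelope sketch in Python. It assumes the standard 60 fps Atari emulator rate and the common frame-skip of 4; neither figure appears in the question text itself.

```python
# Rough sample-budget arithmetic for the Montezuma's Revenge criterion.
# Assumptions (not part of the question): Atari runs at 60 fps, and the
# agent acts every 4th frame (the common frame-skip setting).
FPS = 60
FRAME_SKIP = 4
hours = 100

frames = hours * 3600 * FPS          # raw emulator frames in 100 hours
agent_steps = frames // FRAME_SKIP   # agent decisions with frame-skip 4

print(frames)       # 21,600,000 raw frames
print(agent_steps)  # 5,400,000 agent steps
```

So "100 hours of real-time play" corresponds to roughly 21.6 million frames, a modest budget by the standards of RL agents trained on this game.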
I wouldn’t be surprised if we’ve already passed this.
I don’t think the current systems are able to pass the Turing test yet. Quoting from Metaculus admins:
“Given evidence from previous Loebner prize transcripts – specifically that the chatbots were asked Winograd schema questions – we interpret the Loebner silver criteria to be an adversarial test conducted by reasonably well informed judges, as opposed to one featuring judges with no or very little domain knowledge.”
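To make concrete why Winograd schema questions count as adversarial, here is an illustrative example as a tiny Python structure. The sentence is the classic Levesque et al. trophy/suitcase schema from the literature, not one drawn from the Loebner transcripts themselves.

```python
# A classic Winograd schema (Levesque et al.): swapping a single word
# flips the correct referent of the pronoun "it". Humans resolve this
# effortlessly, but systems relying on surface statistics historically
# struggled, which is what makes it a strong test question.
schema = {
    "sentence": "The trophy doesn't fit in the brown suitcase because it is too {}.",
    "variants": {
        "big":   "trophy",    # "it" = the trophy
        "small": "suitcase",  # "it" = the suitcase
    },
}

for word, referent in schema["variants"].items():
    print(schema["sentence"].format(word), "->", referent)
```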
Most of my uncertainty is from potentially not understanding the criteria. They seem extremely weak to me:
I’d bet that current models with less than $100,000 of post-training enhancements would achieve median human performance on this task.
It seems plausible the Metaculus judges would agree, especially given how old that comment is.