Be able to score at the 75th percentile (relative to the corresponding year’s human test-takers; this was a score of 600 in 2016) on the full mathematics section of a circa-2015-2020 standard SAT exam, using just images of the exam pages.
Be able to learn the classic Atari game “Montezuma’s Revenge” (from just visual inputs and standard controls) and explore all 24 rooms within the equivalent of less than 100 hours of real-time play (see the closely related question).
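As a rough sense of what the 100-hour budget amounts to, here is a back-of-the-envelope sketch in Python. It assumes the standard 60 fps Atari emulator rate and the common frame-skip of 4; neither figure appears in the question text itself.

```python
# Rough sample-budget arithmetic for the Montezuma's Revenge criterion.
# Assumptions (not part of the question): Atari runs at 60 fps, and the
# agent acts every 4th frame (the common frame-skip setting).
FPS = 60
FRAME_SKIP = 4
hours = 100

frames = hours * 3600 * FPS          # raw emulator frames in 100 hours
agent_steps = frames // FRAME_SKIP   # agent decisions with frame-skip 4

print(frames)       # 21,600,000 raw frames
print(agent_steps)  # 5,400,000 agent steps
```

So "100 hours of real-time play" corresponds to roughly 21.6 million frames, a modest budget by the standards of RL agents trained on this game.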
I wouldn’t be surprised if we’ve already passed this.
I don’t think the current systems are able to pass the Turing test yet. Quoting from Metaculus admins:
“Given evidence from previous Loebner prize transcripts – specifically that the chatbots were asked Winograd schema questions – we interpret the Loebner silver criteria to be an adversarial test conducted by reasonably well informed judges, as opposed to one featuring judges with no or very little domain knowledge.”
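To make concrete why Winograd schema questions count as adversarial, here is an illustrative example as a tiny Python structure. The sentence is the classic Levesque et al. trophy/suitcase schema from the literature, not one drawn from the Loebner transcripts themselves.

```python
# A classic Winograd schema (Levesque et al.): swapping a single word
# flips the correct referent of the pronoun "it". Humans resolve this
# effortlessly, but systems relying on surface statistics historically
# struggled, which is what makes it a strong test question.
schema = {
    "sentence": "The trophy doesn't fit in the brown suitcase because it is too {}.",
    "variants": {
        "big":   "trophy",    # "it" = the trophy
        "small": "suitcase",  # "it" = the suitcase
    },
}

for word, referent in schema["variants"].items():
    print(schema["sentence"].format(word), "->", referent)
```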
Most of my uncertainty is from potentially not understanding the criteria. They seem extremely weak to me:
I’d bet that current models with less than $100,000 of post-training enhancements would achieve median human performance on this task.
It seems plausible the Metaculus judges would agree, especially given how old that comment is.