[Linkpost] The Problem With The Current State of AGI Definitions

This is a linkpost for https://www.lesswrong.com/posts/EpR5yTZMaJkDz4hhs/the-problem-with-the-current-state-of-agi-definitions

Key points:

If we can’t agree on what we mean when we say “AGI,” debating AGI timelines becomes a meaningless exchange of words, with neither side understanding the other. [...] As such, I would like to suggest a set of standardized, testable definitions for talking about AGI.

Such a list surprisingly does not exist, as far as I can tell, and almost all testable “AI Benchmarks,” with the possible exception of the Turing Test, were not designed to screen for actual AGI.

I present a draft version of such a list, based on my admittedly layman’s understanding of the field, and request help from an actual AI researcher to get a more polished version formally published in a scientific journal.[1] If anyone here would be willing to partner with me to make that happen, please let me know!

Partial List of (Mostly Testable) AGI Definitions

  • “Nano AGI” — Qualifies if it can perform above random chance (at a statistically significant level) on a multiple choice test found online[3] it was not explicitly trained on.

  • “Micro AGI” — Qualifies if it can reach either State of The Art (SOTA) or human-level on two or more AI benchmarks which have been mentioned in 10+ papers published in the past year,[4] and which were not explicitly present in its training data.

  • “Yitzian AGI” — Qualifies if it can perform at the level of an average human or above on multiple (2+) tests which were originally designed for humans, and which were not explicitly present in its training data.[5]

  • “OG Turing[6] AGI” — Qualifies if it can “pass” as a woman in a chat room (with a non-expert tester) for ten minutes, with a success rate higher than a randomly selected cisgender American male.

  • “Weak Turing AGI” — Qualifies if it can pass a 10-minute text-based Turing test where the judges are randomly selected Americans.

  • “Standard Turing AGI” — Qualifies if it can reliably pass a Turing test of the type that would win the Loebner Silver Prize.

  • “Gold Turing AGI” — Qualifies if it can reliably pass a 2-hour Turing test of the type that would win the Loebner Gold Prize.

  • “Truck AGI” — Qualifies if it can successfully drive a truck from the East Coast to the West Coast of America.[7]

  • “Book AGI” — Qualifies if it can write a 200+ page book (using a one-paragraph-or-less prompt) which makes it to the New York Times Bestseller list.[7]

  • “IMO AGI” — Qualifies if it can pass the IMO Grand Challenge.[8]

  • “Anthonion[9] AGI” — Qualifies if it is A) Able to reliably pass a Turing test of the type that would win the Loebner Silver Prize, B) Able to score 90% or more on a robust version of the Winograd Schema Challenge (e.g. the “Winogrande” challenge or comparable data set for which human performance is at 90+%), C) Able to score 75th percentile (as compared to the corresponding year’s human students) on the full mathematics section of a circa-2015-2020 standard SAT exam, using just images of the exam pages and having fewer than ten SAT exams as part of the training data, D) Able to learn the classic Atari game “Montezuma’s Revenge” (based on just visual inputs and standard controls) and explore all 24 rooms based on the equivalent of less than 100 hours of real-time play.[9]

  • “Barnettian[10] AGI” — Qualifies if it is A) Able to reliably pass a 2-hour, adversarial Turing test[11] during which the participants can send text, images, and audio files during the course of their conversation, B) Equipped with general robotic capabilities, of the type able to autonomously, when equipped with appropriate actuators and when given human-readable instructions, satisfactorily assemble a[12] circa-2021 Ferrari 312 T4 1:8 scale automobile model, C) Able to achieve at least 75% accuracy in every task and 90% mean accuracy across all tasks in the Q&A dataset developed by Dan Hendrycks et al., D) Able to get top-1 strict accuracy of at least 90.0% on interview-level problems found in the APPS benchmark introduced by Dan Hendrycks, Steven Basart et al.[13]

  • “Lawyer AGI” — Qualifies if it can win a formal court case against a human lawyer, where it is not obvious how the case will resolve beforehand.[14]

  • “Lisy-Dusanian[15] AGI” — Qualifies if it can replace more than half of all jobs humans can currently do.

  • “Lisy-Dusanian+[15] AGI” — Qualifies if it can replace all jobs humans can currently do in a cost-effective manner.

  • “Hyperhuman AGI” — Qualifies if there is nothing any human can do (using a computer) that it cannot do.

  • “Kurzweilian[16] AGI” — Qualifies if it “could successfully perform any intellectual task that a human being can.”[17]

  • “Impossible AGI” — never qualifies; no silicon-based intelligence will ever be truly general enough.
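
To make "testable" concrete, here is a minimal Python sketch of how two of the numeric criteria above could be checked. It is an illustration only, not something taken from the original post: the function names, the 0.05 significance level, the choice of a one-sided exact binomial test, and the per-task accuracy dictionary are all assumptions. The first pair of functions corresponds to the "Nano AGI" bar (above random chance at a statistically significant level), and the last one to criterion C of "Barnettian AGI" (at least 75% on every task and 90% on average).

```python
from math import comb


def p_value_above_chance(correct: int, total: int, n_options: int) -> float:
    """One-sided exact binomial p-value.

    Probability of getting at least `correct` answers right out of `total`
    questions by guessing uniformly among `n_options` choices per question.
    """
    p = 1.0 / n_options
    return sum(
        comb(total, k) * p**k * (1 - p) ** (total - k)
        for k in range(correct, total + 1)
    )


def qualifies_as_nano_agi(correct: int, total: int, n_options: int,
                          alpha: float = 0.05) -> bool:
    """Hypothetical check: alpha = 0.05 is an assumed threshold, not from the post."""
    return p_value_above_chance(correct, total, n_options) < alpha


def meets_barnettian_qa_bar(task_accuracies: dict[str, float],
                            per_task_min: float = 0.75,
                            mean_min: float = 0.90) -> bool:
    """Hypothetical check for criterion C: every task >= 75%, mean >= 90%."""
    accs = list(task_accuracies.values())
    return min(accs) >= per_task_min and sum(accs) / len(accs) >= mean_min


if __name__ == "__main__":
    # 40 of 100 correct on a four-option test, versus 25 expected by guessing.
    print(qualifies_as_nano_agi(40, 100, 4))
    # Made-up task names and scores, purely for illustration.
    print(meets_barnettian_qa_bar({"law": 0.97, "physics": 0.95, "ethics": 0.80}))
```

Even this small sketch shows where the definitions leave choices open: the significance level, the number of questions, and how "not explicitly trained on" is verified would all need to be pinned down before two people could be sure they were applying the same test.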

As for my personal opinion, I think that all of these definitions are far from perfect. If we set a definitional standard for AGI that we ourselves cannot meet, then such a definition is clearly too narrow. A plausible definition of “general intelligence” must include the vast majority of humans, unless you’re feeling incredibly solipsistic. Yet almost all of the above tests (with the exception of Turing’s) cannot be passed by the vast majority of humans alive! Clearly, our current tests are too exclusionary, and I would like to see an effort to create a “maximally inclusive test” for general intelligence which the majority of humans would be able to pass. Is Turing’s criterion as inclusive as we can go, or is it possible to improve it further without including clearly non-intelligent entities as well? I hope this post will encourage further thought on the matter, if nothing else.

  1. ^

    The reason I explicitly want a future version of this list published in a peer-reviewed journal is that A) this would allow future papers to cite the definitions easily, and, perhaps more importantly, B) Wikipedia doesn’t generally allow citations from self-published sources (and even peer-reviewed citations should optimally be secondary, rather than primary, sources), so the more information we can get into peer-reviewed journals, the easier it becomes to share that information on one of the most viewed websites in the world.
