At the moment I think ARC-AGI does a good job of showing the limitations of transformer models on simple tasks that they don’t come across in their training set. I think if a high score were claimed, we’d want to see how it came about. It might be through frontier models demonstrating true understanding, but it might be through shortcut learning, data leakage, or an impressive but overly specific and intuitively unsatisfying solution.
If ARC-AGI were to be broken (within the constraints Chollet and Knoop place on it) I’d definitely change my opinions, but what they’d change to depends on how ARC-AGI was solved. That’s all I’m trying to say in that section (perhaps poorly).
It sounds like you agree with my claims that ARC-AGI isn’t that likely to track progress and that other benchmarks could work better?
(The rest of your response seemed to imply something different.)