Marcel2 comments on On the Dwarkesh/Chollet Podcast, and the cruxes of scaling to AGI

Marcel2 18 Jun 2024 13:44 UTC
2 points
0 ∶ 0
I almost clarified that I know some models are technically multimodal, but my impression is that the visual reasoning abilities of current models are very limited, so I'm not at all surprised they fall short here. Among other things that give me this impression, I've occasionally found that they struggle to describe what is happening in an image beyond a fairly general level.
- mlsbt 18 Jun 2024 13:49 UTC
  1 point
  0 ∶ 0
  Looking forward to seeing the ARC performance of future multimodal models. I'm also going to try to think of a text-based ARC analog that is perhaps more general. There are only so many unique simple 2D-grid transformation rules, so ARC can be brute-forced to some extent.
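
To make the brute-force point concrete, here is a minimal sketch of what such a search could look like. It assumes a tiny, hypothetical rule set (rotation, two flips, transpose) and simply tries every short composition until one is consistent with all example pairs. The names `PRIMITIVES`, `apply_program`, and `brute_force` are made up for illustration. Real ARC tasks need a far richer rule vocabulary, so this only shows the shape of the idea, not an actual solver.

```python
from itertools import product

# Grids are tuples of tuples of ints so they can be compared directly.
def to_grid(rows):
    return tuple(tuple(r) for r in rows)

def rot90(g):      # rotate 90 degrees clockwise
    return tuple(zip(*g[::-1]))

def flip_h(g):     # mirror left-right
    return tuple(row[::-1] for row in g)

def flip_v(g):     # mirror top-bottom
    return g[::-1]

def transpose(g):  # swap rows and columns
    return tuple(zip(*g))

# Hypothetical rule vocabulary; an assumption, not an ARC specification.
PRIMITIVES = {
    "rot90": rot90,
    "flip_h": flip_h,
    "flip_v": flip_v,
    "transpose": transpose,
}

def apply_program(program, g):
    """Apply a sequence of primitive names to a grid, left to right."""
    for name in program:
        g = PRIMITIVES[name](g)
    return g

def brute_force(examples, max_depth=3):
    """Return the shortest rule sequence matching every (input, output) pair, or None."""
    pairs = [(to_grid(i), to_grid(o)) for i, o in examples]
    for depth in range(max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            if all(apply_program(program, i) == o for i, o in pairs):
                return program
    return None

if __name__ == "__main__":
    # Two example pairs that are both consistent with a 180-degree rotation.
    examples = [
        ([[1, 2], [3, 4]], [[4, 3], [2, 1]]),
        ([[0, 5, 0], [7, 0, 0]], [[0, 0, 7], [0, 5, 0]]),
    ]
    print(brute_force(examples))
```

With four primitives and depth 3 the search covers only 85 candidate programs, which is why a small rule space is easy to enumerate. The count grows exponentially with depth and vocabulary size, hence "to some extent".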