I used GPT-4o, which is multimodal (and in fact was likely trained on these particular images, since I took the examples from the ARC website, not the GitHub repo). I did test more grid inputs, and it wasn’t perfect at ‘visualizing’ them.
I almost clarified that I know some models are technically multimodal, but my impression is that the visual reasoning abilities of current models are very limited, so I’m not at all surprised they struggle here. Among other illustrations of this impression, I’ve occasionally found they fail to properly describe what is happening in an image beyond a fairly general level.
Looking forward to seeing the ARC performance of future multimodal models. I’m also going to try to think of a text-based ARC analog that is perhaps more general. There are only so many unique simple 2D-grid transformation rules, so ARC can be brute-forced to some extent.
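The brute-force idea can be sketched roughly like this: represent the grid as nested lists (a text-friendly encoding), define a small hypothetical library of candidate transformations, and search for one consistent with every example pair. This is a minimal illustration, not how actual ARC solvers work; the transform names and the tiny rule set are my own assumptions.

```python
# Hypothetical sketch: brute-forcing a text-based ARC-style task by
# searching a small library of 2D-grid transformations for one that
# explains all the given (input, output) example pairs.

def rotate(g):
    # Rotate 90 degrees clockwise.
    return [list(row) for row in zip(*g[::-1])]

def flip_h(g):
    # Mirror left-right.
    return [row[::-1] for row in g]

def flip_v(g):
    # Mirror top-bottom.
    return g[::-1]

def transpose(g):
    # Swap rows and columns.
    return [list(row) for row in zip(*g)]

# A toy rule library; a real solver would compose many more primitives.
TRANSFORMS = {
    "rotate": rotate,
    "flip_h": flip_h,
    "flip_v": flip_v,
    "transpose": transpose,
}

def find_rule(examples):
    """Return the name of a transform consistent with all example pairs, or None."""
    for name, fn in TRANSFORMS.items():
        if all(fn(inp) == out for inp, out in examples):
            return name
    return None

examples = [([[1, 2], [3, 4]], [[3, 1], [4, 2]])]
print(find_rule(examples))  # rotate
```

With only a handful of primitives (plus their compositions), the hypothesis space for simple tasks stays small enough to enumerate, which is what makes this kind of brute force at least partially viable.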