I think Ryan’s solution shows that the intelligence is coming from him, and not from Chat-GPT4o.
If this is true, then substituting in a less capable model should have equally good results; would you predict that to be the case? I claim that plugging in an older/smaller model would produce much worse results, and if that’s the case then we should consider a substantial part of the performance to be coming from the model.
This is what Chollet is talking about in the podcast when he says, ‘I’m pretty skeptical that we’re going to see an LLM do 80% in a year. That said, if we do see it, you would also have to look at how this was achieved.’
This seems to me to be Chollet trying to have it both ways. Either a) ARC is an important measure of ‘true’ intelligence (or at least of the ability to reason over novel problems), and so we should consider LLMs’ poor performance on it a sign that they’re not general intelligence, or b) ARC isn’t a very good measure of true intelligence, in which case LLMs’ performance on it isn’t very important. Those can’t be simultaneously true. I think that nearly everywhere but in the quote, Chollet has claimed (and continues to claim) that a) is true.