I think Ryan’s solution shows that the intelligence is coming from him, and not from Chat-GPT4o.
If this is true, then substituting in a less capable model should produce equally good results; would you predict that to be the case? I claim that plugging in an older/smaller model would produce much worse results, and if that’s the case then we should consider a substantial part of the performance to be coming from the model.
This is what Chollet is talking about in the podcast when he says: ‘I’m pretty skeptical that we’re going to see an LLM do 80% in a year. That said, if we do see it, you would also have to look at how this was achieved.’
This seems to me to be Chollet trying to have it both ways. Either a) ARC is an important measure of ‘true’ intelligence (or at least of the ability to reason over novel problems), and so we should consider LLMs’ poor performance on it a sign that they’re not general intelligence, or b) ARC isn’t a very good measure of true intelligence, in which case LLMs’ performance on it isn’t very important. Those can’t be simultaneously true. I think that nearly everywhere except in that quote, Chollet has claimed (and continues to claim) that a) is true.
From my perspective as a researcher not involved with fieldbuilding, this post misses an important distinction. I do occasionally suggest that new people take a BlueDot course (or apply to AI Safety Camp, or SPAR, or one of the other excellent programs out there), but far more often than that I point new people to the BlueDot curriculum. I commonly see others doing the same; I think it’s become the default AIS 101 reading. Maybe you’re mistaking that for people pushing the BlueDot course on everyone new to the field?
As a more general and perhaps contrarian pushback: AI safety (other than governance) isn’t at all a local problem, and so there’s no particular reason to focus on local groups. I realize that some people find it inherently motivating to be in the same room with other people in their own community and build social bonds, so there’s some value there. But in general I think it’s more valuable for people to find ways to fill important vacant niches in the AIS ecosystem than to focus on replicating another organization but in <location>. That can be supplemented with informal local groups that exist to serve those social needs.
That’s not obvious to me; I do think there are constraints there but my sense is that the field is currently mainly bottlenecked by funding (1, 2).
Why are they more likely to give AIS the benefit of the doubt? Won’t that be most likely to happen if their exposure is to the highest-quality course they have access to?