On the Dwarkesh/Chollet Podcast, and the cruxes of scaling to AGI

JWS 🔸Jun 15, 2024, 8:24 PM


Overview

Recently Dwarkesh Patel released an interview with François Chollet (hereafter Dwarkesh and François). I thought this was one of Dwarkesh’s best recent podcasts, and one of the best discussions the AI community has had recently. Instead of subtweeting those with opposing opinions or vagueposting, we actually got two people who disagree on the key issue of scaling and AGI having a good-faith and productive discussion.^[1]

I want to explicitly give Dwarkesh a shout-out for having such a productive discussion (even if I disagree with him on the object level) and for having someone on who challenges his beliefs and preconceptions. Often when I think of different AI factions getting angry at each other, and the quality of AI risk discourse plummeting, I’m reminded of Scott Alexander’s phrase “I reject the argument that Purely Logical Debate has been tried and found wanting. Like GK Chesterton, I think it has been found difficult and left untried.” More of this kind of thing please, everyone involved.

I took notes as I listened to the podcast, and went through it again to make sure I got the key claims right. I grouped them into themes, as Dwarkesh and François often went down rabbit holes to pursue an interesting point or crux and later returned to the main topic.^[2] I hope this helps readers navigate to their points of interest, or makes the discussion clearer, though I’d definitely recommend listening to or watching it yourself! (It is long, though, so feel free to jump around the doc rather than slog through it in one go!)

Full disclosure: I am sceptical of a lot of the case for short AGI timelines these days, and thus also sceptical of claims that working on x-risk from AI is the overwhelmingly most important thing to be doing in the entire history of humanity. This of course comes across in my summarisation and takeaways, but I think acknowledging that openly is better than leaving it to be inferred, and I hope this post can be another addition in helping improve the state of AI discussion both in and outside of EA/AI-Safety circles. It is also important to state explicitly that I might very well be wrong! Please take my perspective as just that, one perspective among many, and do not defer to me (or to anyone, really). Come to your own conclusions on these issues.^[3]

The Podcast

All timestamps are for the YouTube video, not the podcast recording. I’ve organised the summary around the main themes in roughly the order they first appear, then tracked each one through the transcript. I include links to some external resources, passing thoughts in footnotes, and fuller thoughts in block-quotes.

Introducing the ARC Challenge

The podcast starts with an introduction to the ARC Challenge itself, and Dwarkesh is happy that François has drawn a line in the sand as an LLM sceptic instead of moving the goalposts [0:02:27]. François notes that LLMs struggle on ARC, in part because its challenges are novel and designed not to be found on the internet; instead, the approaches that perform better are based on ‘Discrete Program Search’ [0:02:04]. He later notes that ARC puzzles are not complex and require very little knowledge to solve [0:25:45].

Dwarkesh agrees that the problems are simple and thinks it’s an “intriguing fact” that ARC problems are simple for humans but LLMs are bad at them, and says he hasn’t been convinced by the explanations he’s heard from LLM proponents/scaling maximalists about why that is [0:11:57]. Towards the end, François mentions in passing that big labs tried ARC but didn’t share their results because they were bad [1:08:28].^[4]

One of ARC’s main selling points is that humans, even children, are clearly meant to do well at it [0:12:27], but Dwarkesh pushes on this point, suggesting that while smart humans will do well, those of average intelligence will get a mediocre score. François says that they tested ‘average humans’ and got 85%, but this was with MTurk, and Dwarkesh is sceptical that this actually captures the ‘average human’ [0:12:47].^[5]

Finally, Mike Knoop (hereafter Mike) has teamed up with François to increase the prize pool for solving ARC to ~$1 million. It’s currently being hosted on Kaggle [1:09:30], though there are limitations on the models you can use and some compute restrictions (so do check the competition page for the concrete details). Mike thinks that this is an important moment of ‘contact with reality’, and says he will raise the prize if it isn’t solved within 3 months [1:24:33].

Writing this section, I realised that the podcast didn’t dive very deeply into what ARC actually is. François originally introduced the idea in the paper On the Measure of Intelligence, which is well worth a read. There’s a GitHub repo with a public train/test set, though François has kept the private test set hidden. Each task gives you ~3 demonstration examples, each a pair of input/output grids related by the same unknown transformation rule. You are then given a final test input and have to produce the correct output grid. You score 1 for outputting exactly the correct grid, and 0 for any other answer.
To get a sense of what the tests are like, you can play along yourself! A couple of good interactive versions can be found here and here. Very quickly I think you should get the intuition of “Huh, why are state-of-the-art models trained with trillions of tokens really bad at this obvious thing I can just ‘get’?”
Other excellent recent research in this area has been spearheaded by Melanie Mitchell,^[6] so I’ll link to some of her work:
The Debate Over Understanding in AI’s Large Language Models (please everyone in AI Safety read this paper)
The ConceptARC Benchmark: Evaluating Understanding and Generalization in the ARC Domain
Using Counterfactual Tasks to Evaluate the Generality of Analogical Reasoning in Large Language Models
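For readers who prefer code to prose, the task format described above can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions. The `Task` structure, the toy ‘mirror’ rule and the grids are all made up, and this is not the official ARC file format or evaluation code. What it does show is the all-or-nothing scoring: a single wrong cell scores exactly the same as no answer at all.

```python
from dataclasses import dataclass

Grid = list[list[int]]  # each cell is a colour index (ARC uses 10 colours, 0-9)


@dataclass
class Task:
    demos: list[tuple[Grid, Grid]]  # ~3 input/output pairs sharing one hidden rule
    test_input: Grid
    test_output: Grid  # hidden from the solver


def score(task: Task, predicted: Grid) -> int:
    """All-or-nothing: 1 only if the grid shape and every cell match exactly."""
    return 1 if predicted == task.test_output else 0


# Hypothetical toy task whose hidden rule is "mirror the grid left-to-right".
toy = Task(
    demos=[
        ([[1, 0], [2, 0]], [[0, 1], [0, 2]]),
        ([[3, 3, 0]], [[0, 3, 3]]),
        ([[0, 5], [6, 0]], [[5, 0], [0, 6]]),
    ],
    test_input=[[7, 0, 0], [0, 8, 0]],
    test_output=[[0, 0, 7], [0, 8, 0]],
)

mirror = [row[::-1] for row in toy.test_input]
print(score(toy, mirror))                   # 1
print(score(toy, [[0, 0, 7], [0, 8, 8]]))   # 0: one wrong cell scores nothing
```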

Should we expect LLMs to “saturate” it?

At various points, Dwarkesh challenges François by asking why we shouldn’t just expect the best LLM within a year to ‘saturate’^[7] the test [0:01:05, 0:14:04, 1:13:47]. François thinks that the answer is empirical [0:09:52, 1:14:09], but he is sceptical that LLM-based/LLM-only/scaling-driven approaches will be able to crack ARC, and thinks those kinds of techniques have reached a plateau [1:04:40]. Mike notes that he expected the same thing as Dwarkesh when he first heard of ARC, but has come around to the view that it measures something different from other benchmarks [1:02:57]. He adds that the longer ARC survives, the more plausible the story that ‘progress in LLMs has plateaued’^[8] starts to look [1:24:52].

Mike notes that because the public train and test sets are on GitHub, there’s an asterisk on any result on this metric, as they may have been included in the training data of the LLMs taking the test [1:20:05]. Dwarkesh counters by referencing Scale AI’s recent work on data contamination in the GSM8K benchmark: while many models were overfit, the leading ones didn’t lose much performance once this was corrected for [1:21:07].

A few times, Dwarkesh asks whether multi-modal models will be able to perform better than text-only models [0:08:43, 0:09:44, 1:24:25].^[9] François says that fine-tuning LLMs on ARC shows that they can parse the inputs, so that isn’t actually the problem. Instead, the reason they don’t do well is the unfamiliarity of the tasks they are being asked to solve [0:11:12].

There are a few times in the podcast where Dwarkesh mentions discussing ARC with friends (often ones working at leading AI labs in San Francisco) and being unconvinced by their answers, or finding them overconfident in the ability of LLMs to solve ARC puzzles [1:11:31, 1:27:13]. In the latter case, they went from “of course LLMs can solve this” to getting 25% with Claude 3 Opus on the publicly available test set.

This seems to be a bit of reference class tennis. Dwarkesh is looking at the recent performance of leading models on benchmarks and saying our prior on ARC should be similar: it’s a harder test, but it will suffer the same fate. In contrast, I think LLM sceptics like François (and myself) are mounting a more Lakatos-esque defence. On this view, the fact that LLMs do well on things like MMLU but terribly on ARC instead falsifies an auxiliary hypothesis (‘that current benchmarks are actually measuring progress towards AGI’), while the core hypothesis (‘current approaches will not scale all the way to AGI’) remains intact.
At heart, a huge chunk of this debate really is a question of epistemology and philosophy of science. Still, I suppose reality will adjudicate between the most extreme hypotheses either way within the next few years.

Has Jack Cole shown LLMs can solve ARC?

Dwarkesh brings up the example of Jack Cole a few times in the discussion as another way to push François on whether LLMs can perform well [0:13:58, 1:13:55]. Along with Mohamed Osman, Jack has achieved state-of-the-art performance on the hidden test set: 34%.

François pushes back here, saying that what Jack is actually doing is trying to get active inference to work, and thereby get the LLM to run program synthesis [1:14:15]. On the particulars, François notes two key features of Jack’s approach. The high performance comes from pre-training on millions of synthetically generated ARC tasks, combined with test-time fine-tuning so that the LLM can actually learn from the test case [0:14:25].

I think this was important enough to deserve its own section, as it shows more clearly where François’ scepticism comes from. When you give an LLM a prompt, you are doing static inference. Your prompt is converted into a numerical format, goes through ridiculous amounts of matrix multiplication, and an output comes out the other end.
What does not happen in this process is any change to the weights of the model; they have essentially been ‘locked’. Thus, in some fundamental sense, LLMs never ‘learn’ anything once their training regime ends. The term active inference can lead you down a free-energy-principle rabbit hole. Here, though, I think François is using it to mean ‘an AI system that can update its beliefs and internal states efficiently and quickly given novel inputs’, and he thinks this ability to handle novelty is a critical part of dealing with the world effectively.
From his point of view, I think getting an AI system to do this would be a different paradigm from the current ‘scale LLMs up’ paradigm. And I have to say, I’m very heavily on François’ side here, rather than Dwarkesh’s.
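Here is a deliberately tiny illustration of the difference between static inference and test-time learning. It is not a model of how LLMs, or Jack Cole’s system, actually work. It is a one-parameter ‘model’ in Python, and the pretrained weight, the demonstration pairs and the learning rate are all hypothetical. A frozen model can only apply the rule it absorbed in training. A model allowed a few gradient steps on the demonstrations can adapt to a rule it has never seen.

```python
def predict(w: float, x: float) -> float:
    """Static inference: apply the current weight, never change it."""
    return w * x


def fine_tune(w: float, demos: list[tuple[float, float]],
              lr: float = 0.05, steps: int = 200) -> float:
    """Test-time learning: plain gradient descent on mean squared error over the demos."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in demos) / len(demos)
        w -= lr * grad
    return w


w_pretrained = 2.0                               # hypothetical weight 'locked' after training (rule y = 2x)
demos = [(1.0, -3.0), (2.0, -6.0), (-1.0, 3.0)]  # a novel rule at test time: y = -3x
x_test = 4.0

static = predict(w_pretrained, x_test)           # 8.0: still applies the old rule
w_adapted = fine_tune(w_pretrained, demos)       # returns a new weight; w_pretrained is untouched
adapted = predict(w_adapted, x_test)             # approximately -12.0

print(static, round(adapted, 6))
```

The analogy is loose. Real test-time fine-tuning updates billions of parameters on augmented copies of the demonstrations, but the structural difference is the same: the weights change in response to the new task.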

Is there a difference between ‘Skill’ and ‘Intelligence’?

Early on, Dwarkesh asks whether it even matters if models have so much in their training distribution that we can’t tell whether a test case is in distribution or not [0:04:32]. François fundamentally rejects the premise here, saying that you can never pre-train on everything you need, because the world is constantly changing [0:05:07]. Some animals don’t need this ability,^[10] but humans do. Humans are born with limited knowledge but with the ability to learn efficiently in the face of things we’ve never seen before [0:06:32]. This is the basis of the distinction François draws between skill and intelligence, which he keeps returning to when answering Dwarkesh’s queries throughout the discussion [0:19:33, 0:21:10].

One of the main things ARC is meant to show about human general intelligence is our extreme sample efficiency. Dwarkesh challenges this by analogising human formal education to the pre-training of a transformer model [0:22:00]. François responds that the building blocks of ‘core knowledge’ are necessary for general reasoning, but these are mostly acquired early in life [0:22:55, 1:32:01]. Dwarkesh replies that some of the geometric patterns that allow him to solve ARC puzzles are things he’s seen throughout his life [1:30:54].

Dwarkesh says it’s compatible with François’ story that, even if the reasoning is local, it will get better as the size of the model increases [0:23:05]. François agrees, and Dwarkesh is confused. François then clarifies that he’s talking about generality [0:23:44]. He specifically claims:

“General intelligence is not task-specific skill scaled up to many skills, because there is an infinite space of possible skills. General intelligence is the ability to approach any problem, any skill, and very quickly master it using very little data. This is what makes you able to face anything you might ever encounter. This is the definition of generality.” [0:23:56]

Dwarkesh responds by pointing to Gemini 1.5’s ability to translate Kalamang, a language with fewer than 200 speakers,^[11] as evidence that larger versions of these models are gaining the capacity to generalise efficiently [0:24:37]. François essentially rolls to disbelieve. He says that if this were true, these models would be performing well on ARC puzzles, since those are much less complex than Kalamang translation.

François’ distinction between these two concepts is in Section I.2 of “On the Measure of Intelligence”. Part of the disagreement in this section (and the next) is that Dwarkesh seems to be taking an implicit frame of intelligence-as-outputs, whereas François is using intelligence-as-process.
This may be a case where the term ‘Artificial General Intelligence’ is again causing more confusion than clarity. It’s certainly at least conceivable for AI to have transformative effects without it ever clearly meeting any concept of ‘generality’. I think Dwarkesh is focusing on those effects, which is why he tries to draw François into scenarios where, for example, most workers have been automated.
I’m not sure who comes off best in the exchange about Kalamang translation. It seems very odd for frontier models to be able to do that but not solve ARC. Still, it does support Dwarkesh’s underlying claim that if we can fit enough things into a model’s training distribution, it could be ‘good’ enough at many of them even if it doesn’t meet François’ definition of AGI. Even so, in my mind the modal outcome there is more likely to be ‘economic disruption’ than ‘catastrophic/existential’ risk.

Are LLMs ‘Just’ Memorising?

This section is probably the most cruxy of the entire discussion. It’s where Dwarkesh seems to get the most frustrated, and the two go in circles a bit. This is where two epistemic worldviews collide, and it’s probably the most important part to focus on.

François casts doubt on the scaling laws by saying that they are based on benchmark performance, and that these benchmarks can be solved by memorisation. He summarises the leading models as ‘interpolative databases’ whose performance on such benchmarks will increase with more scale [0:17:53]. He doesn’t think this is what we want, though, and later in the podcast even states “with enough scale, you can always cheat” [0:40:47].^[12]

Dwarkesh, however, denies the premise that all they are doing is memorisation, and asks François why we could not in principle just brute-force intelligence [0:31:06, 0:40:49]. François actually agrees that this would be possible in a fully static world. He argues, though, that in the real world the future is unknown, so this kind of complete knowledge is not possible [0:42:16]. Dwarkesh gets his most annoyed in the podcast at this point and accuses François of playing semantics:

“you’re semantically labeling what the human does as skill. But it’s a memorization when the exact same skill is done by the LLM, as you can measure by these benchmarks.” [0:42:58]

François says that memorisation could still be used to automate many things, as long as they are part of a ‘static distribution’. Dwarkesh presses on whether this might include most jobs today, and François answers ‘potentially’ [0:44:31]. He clarifies that LLMs can be useful, and that he has been a noted proponent of deep learning,^[13] but that this would not lead to general intelligence. It may even be possible, he says, to decouple the automation of many jobs from the creation of a general intelligence [0:44:48].

Dwarkesh’s counter-hypothesis is that creativity could just be interpolation in a high-enough dimension [0:35:24, 0:36:27]. He claims that models are vastly underparameterised compared to the human brain [0:32:59, 0:43:34],^[14] and points to the phenomenon of ‘grokking’^[15] as an example of generalisation within current models, arguing that as models get bigger, the compression will lead to generalisation [0:46:29, 0:48:14]. François actually agrees that LLMs have some degree of generalisation. He says that this is due to the compression of their training data, and that grokking is not a new phenomenon but rather an expression of the minimum description length principle [0:47:04, 0:48:58].

Dwarkesh argues that transformers might, at some basic level, already be doing program synthesis. François says that while this may be possible, we should then expect them to do well on ARC, given that the ‘solution program’ for any ARC task is simple, and Dwarkesh seemingly concedes the point [0:38:09]. François reinforces his point of view with the example that current LLMs fail to generalise Caesar cipher solutions, and only succeed on the cases that appear most often in their training data [0:26:38].^[16]
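To see why the Caesar cipher failure is telling, note how small the general rule is. The Python sketch below is my own illustration, restricted to lowercase ASCII letters for simplicity and using a made-up example sentence. A procedure that has actually captured the rule treats an unusual shift such as 17 exactly like the familiar shifts 3 or 13, because nothing in it depends on which shifts happen to be common.

```python
import string

ALPHABET = string.ascii_lowercase


def caesar(text: str, shift: int) -> str:
    """Shift each lowercase letter `shift` places along the alphabet, wrapping around.
    Characters outside a-z (spaces, punctuation) are left unchanged."""
    out = []
    for ch in text:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)
    return "".join(out)


def decode(ciphertext: str, shift: int) -> str:
    """Decoding is just encoding with the opposite shift."""
    return caesar(ciphertext, -shift)


print(caesar("hello", 3))    # khoor
print(decode("khoor", 3))    # hello

plain = "the quick brown fox"
for shift in (3, 13, 17):    # common and uncommon shifts go through identical logic
    assert decode(caesar(plain, shift), shift) == plain
```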

Dwarkesh asks why François sees the current paradigm as intrinsically limited. François responds that this is due to the fundamental nature of the model as a ‘big parametric curve’, which is thus limited to only ever generalising within distribution [0:49:04, 0:50:16]. In his following explanation, François argues that deep learning and discrete program search are essentially opposites, and that progress will require combining the two [0:49:35].
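For readers who haven’t met ‘discrete program search’ before, here is a toy version in Python. The three grid primitives, the depth limit and the example grids are hypothetical choices for illustration; real ARC solvers use far richer domain-specific languages and much smarter search. The contrast with deep learning is that nothing here is a parametric curve fitted by gradient descent. The search enumerates explicit candidate programs, shortest first, and keeps one only if it reproduces every demonstration exactly.

```python
from itertools import product
from typing import Optional

Grid = list[list[int]]


def flip_lr(g: Grid) -> Grid:
    return [row[::-1] for row in g]


def flip_ud(g: Grid) -> Grid:
    return [list(row) for row in g[::-1]]


def transpose(g: Grid) -> Grid:
    return [list(col) for col in zip(*g)]


# A hypothetical, minimal domain-specific language of grid operations.
PRIMITIVES = {"flip_lr": flip_lr, "flip_ud": flip_ud, "transpose": transpose}


def run(program: tuple[str, ...], g: Grid) -> Grid:
    """Apply each named primitive in order."""
    for name in program:
        g = PRIMITIVES[name](g)
    return g


def search(demos: list[tuple[Grid, Grid]], max_depth: int = 3) -> Optional[tuple[str, ...]]:
    """Enumerate programs shortest-first; return the first consistent with every demo."""
    for depth in range(max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            if all(run(program, i) == o for i, o in demos):
                return program
    return None  # no program in this tiny language explains the demos


# Demonstrations of "rotate 90 degrees clockwise", a rule not in the primitive set.
demos = [([[1, 2], [3, 4]], [[3, 1], [4, 2]]),
         ([[1, 2, 3]], [[1], [2], [3]])]
program = search(demos)
print(program)                                 # ('flip_ud', 'transpose')
print(run(program, [[5, 6], [7, 8], [9, 0]]))  # [[9, 7, 5], [0, 8, 6]]
```

Searching shortest-first is one simple way to prefer the simplest consistent explanation. The catch is that the number of candidate programs grows exponentially with depth. That is one reason François expects progress to come from combining this kind of search with deep learning rather than from either alone.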

Ok, this is the big one:
There are some points that get mixed up here, and the confusion stems from the skill/intelligence or outcome/process distinction mentioned in my comments on the last section.
First, François agrees with Dwarkesh that it is at least conceptually possible for an AI system in a ‘memorisation regime’ (e.g. GPT-4+N) to be highly skilful, useful, and able to automate many jobs. Thus, at some level, François thinks that this question is an empirical one.
Second, François thinks that the world has so much irreducible change and complexity that going beyond memorisation is necessary to function at any level of complex capability. All humans have this,^[17] AGI must have this quality too, and the LLM family tree of models doesn’t. The empirical anchors he points to are things like the simplicity of ARC, or LLMs’ failure with Caesar ciphers.
Dwarkesh is annoyed because he thinks that François is conceptually defining LLM-like models as incapable of generalisation. I think, instead, that François’ more fundamental claim is about the unpredictability and irreducible complexity of the world itself. If he didn’t believe the world was that irreducibly complex, he’d be much more of a scaling maximalist.
I think, on the object level of memorisation vs generalisation vs reasoning, François comes out on top in this crucial discussion. François has a deep knowledge of how these models work, whereas Dwarkesh does not have the same level of expertise and is mostly pointing to empirical observations. When François counters, Dwarkesh has little to support his point apart from repeating that enough interpolation/memorisation is the same thing as generalisation/intelligence. That rather assumes the answer to the whole issue at hand, and it moves me much less than François’ counterpoints.
I think Dwarkesh’s view of general intelligence as simply a patchwork of local generalisation is a rather impoverished way of looking at it (and at other human beings), but I’ve put more on that in the following section. I also think that when Dwarkesh talks about benchmarks he’s assuming they’re a reliable guide, but that’s very much open to question.

Are the missing pieces to AGI easy or hard to solve?

At multiple points, Dwarkesh notes that even his LLM maximalist friends do not believe ‘scale is all you need’, but rather that scale is the most important thing, and that adding the extras needed to get to AGI beyond scale will be the easy part [0:16:28, 0:55:09]. François disagrees, and says that the hard part of intelligence is the System 2 part [0:16:57, 0:56:21].

Dwarkesh refers back to a previous podcast where one of his guests^[18] believes that intelligence is just hierarchically associated memories [0:57:04]. François doesn’t quite seem to get what Dwarkesh is getting at, and they end up not diving into this more because Dwarkesh thinks that they are going in circles [0:59:37].

This is another critical crux, and one where I am again much more inclined to take François’ point of view than the scaling maximalists’. Again, in the previously mentioned podcast, Trenton says “most intelligence is pattern matching” and, I don’t know, that seems a really contested claim?^[19] It just seems like many of the scaling maximalists have assumed that System 2/common-sense reasoning will be the easy part, but I very much disagree.
However, much of my thinking on this issue of intelligence/explanation/creativity has been highly influenced by David Deutsch and his works. I’d highly recommend buying and reading The Beginning of Infinity; if you disagree with it, we might have some core epistemological differences.
Elsewhere, the capability I think is very difficult to get working (compared to more scaling) goes by other names. Stuart Russell calls it ‘Discovering actions’ in Human Compatible,^[20] and Ajeya Cotra calls it ‘savannah-to-boardroom generalization’ in this LessWrong dialogue (though my point of view is much closer to Ege’s in that discussion).

What would it take for François to change his mind?

At various points in the podcast, Dwarkesh asks what would happen to François’ views if someone does succeed at ARC, or what he would need to see to think that the scaling paradigm is on the path to AGI [0:02:34, 0:03:36]. François mostly responds that this is an empirical question (i.e. he’d change his mind if he saw evidence). He also makes the point that how the performance on these benchmarks was achieved matters. He’d want to see cases where models can adapt on the fly and do something truly novel that is not in their training data [0:02:51, 0:03:44].

Some Odds and Ends

Both Mike and François are sad about the end of openly shared research in frontier AI [1:06:16], and François goes further, laying the blame at OpenAI’s feet and saying that this has set AGI back 5-10 years^[21] [1:07:08].

A couple of points also stood out to me as quite weird, though perhaps that’s down to missing context or inferential distance:

At [1:08:40] Dwarkesh mentions Devin as an example of a scaffolding-type approach that could be promising on ARC, which is weird given that Devin seems to have been overhyped and very much not good at what it claimed to be good at.
At [1:33:19] François says that the goal of ARC is to “accelerate progress towards AGI”. Rather than this being latent e/acc-ism from François, I think it reflects his view that LLMs are an “off ramp” on the way to AGI, so acceleration could just mean ‘getting back on track’. Still, if you do want to find a research paradigm that leads to AGI, you’ll need to tackle Russell’s question: What happens if we succeed?

Takeaways

I tried, as much as was reasonable, to keep my fingerprints off the summary, apart from my comments at the end of each section. However, thinking about this episode and the response to it has been both enlightening and disappointing for me. These takes are fairly hot, but I also didn’t want to make them a post in and of themselves, so I apologise if they aren’t fully fleshed out, especially if you disagree with me!

First, I was surprised to see how much of a splash this made on AI Twitter, it seemed that many people hadn’t heard of the ARC Challenge at all. In the podcast Dwarkesh mentions senior ML researchers in his social scene who just naïvely expected current LLMs to easily solve it.
- This makes me very concerned about a strong ideological and philosophical bubble in the Bay regarding these core questions of AI. As Nuno Sempere suggests, the fact that the Bay has seemingly converged so much “is a sign that a bunch of social fuckery is going on.”
- An intuition I have is that LLM-scepticism has been banished to ‘outgroup ideas’ by those working in the Bay, and that François would not be listened to on his own. However, Dwarkesh may have provided a ‘valid’ route for those ideas to enter the community.
- It is sociologically interesting, though, that Dwarkesh spent an hour grilling François, but also spent four hours mostly soft-balling Aschenbrenner and laughing about buying galaxies instead of really grilling his worldview in the same way. I think Aschenbrenner’s perspective is OOMs more in need of such a grilling than François’.
Second, I was disappointed by the initial response in the AI Safety community.
- The initial post I saw about it on LessWrong, encouraging the Safety Community to take the ideas seriously, was both negative in karma and agreement votes for a while.
- That poster was @jacquesthibs (see also his website and twitter). I want to give Jacques a shout-out for how well he’s integrated this conflicting perspective into his worldview, and for taking the ARC Challenge seriously.
- I think there is a more pernicious strand here in some parts of AI Safety/AI community overall:
  - The view that humans are ‘mere pattern matchers’, or that this is all we are or all we are doing. For instance, back in discussions about GPT-2, Scott Alexander starts off with a ‘your mum’ insult:
    - A machine learning researcher writes me in response to yesterday’s post, saying “I still think GPT-2 is a brute-force statistical pattern matcher which blends up the internet and gives you back a slightly unappetizing slurry of it when asked.” I resisted the urge to answer “Yeah, well, your mom is a brute-force statistical pattern matcher which blends up the internet and gives you back a slightly unappetizing slurry of it when asked.” But I think it would have been true.
    - I don’t think it’s true. Maybe the ML researcher was right, Scott.
    - There may be a deep philosophical disagreement about what humans are, but I think this kind of view can be quite misanthropic.
  - I also see this as connected to so many people in AI treating the non-existence of qualia as obvious, rather than as a fraught philosophical issue.^[22]
    - Now, a lot of these impressions are from Twitter. But unfortunately a lot of AI discourse happens on Twitter.
Third, if François is right, then I think this should be considered strong evidence that work on AI Safety is not overwhelmingly valuable, and may not be one of the most promising ways to have a positive impact on the world.
- It would make AI Safety work less tractable, since we’d have very little information on what transformative/general AI would look like. Cluelessness would overcome our ability to steer the future. Only by buying into the ‘scale is all you need’ perspective can you really reduce Knightian uncertainty enough to make clear ITN calculations on AI Safety.
- Scale Maximalists, both within the EA community and outside it, would stand to lose a lot of Bayes points/social status/right to be deferred to. I mean, I think they should lose those things already because I think they’re very wrong. Similarly, if current work in AI Safety is not overwhelmingly the most valuable thing to do now (or in the whole history of humanity), then those saying so ought likewise to lose out. It also ought to lead to a change in the funding/position of AI Safety compared to other EA causes.
  - In any case, if you had just deferred on AI ability/AI timelines to the most concerned EAs, I implore you to stop and start looking for credible, good-faith alternative takes on this issue.
  - I would highly recommend, for example, Tan Zhi Xuan’s pushback on Jacob Steinhardt’s case for GPT-2030 under the transformer architecture.^[23]
- I see this viewpoint as holding quite strongly for those at Open Philanthropy, and I would like to see them take on the challenge of François’ perspective, and it doesn’t seem to me they’ve accounted for “What if François is right” in their models. For instance:^[24]
  - Tom Davidson’s model is often referred to in the Community, but it is entirely reliant on the current paradigm + scale reaching AGI. That’s sweeping the biggest point of disagreement away as an assumption. But that assumption is everything.
  - In April 2022, Alex Lawsen said that the perspective that Bongard problems^[25] will pose issues for AI will ‘end up looking pretty silly soon’. I wonder if the stubbornness of the ARC Challenge in the two years since this claim might have updated them?
  - It seems that many people in Open Phil have substantially shortened their timelines recently (see Ajeya here).
  - I know it’s primarily a place for him to post funny memes, but come on Julian, this is not an accurate portrayal of LLM/trendline sceptics. I think the actual case is framed more fairly by Seth Lazar here.
  - To clarify, I think OpenPhil do a lot of good work, and all of the people I’ve mentioned are probably very nice, conscientious, and rigorous. But OpenPhil is by far the biggest player in the EA scene, especially involving funding, and I think that necessitates extra scrutiny. I apologise if what I’ve written here has gone beyond the line.

And that’s all folks! Thanks for reading this far if you have. I’ll try to respond in the comments to discussions as much as possible, but I am probably going to need a bit of a break after writing all of this, and I’m on holiday for the next few days.

^[1] At the end of the podcast Dwarkesh explicitly says he was playing devil’s advocate, but I think he is arguing for a pro-scaling point of view. His post Will scaling work? gives a clearer look at his perspective, and I highly recommend reading it.
^[2] There’s a second section with Mike Knoop (hereafter Mike), which is more focused on the ARC Prize relaunch. I have fewer notes on it, but have still included them.
^[3] At least, come to your own conclusion on where you stand regarding them. I’m not saying everyone has to become an expert in mechanistic interpretability.
^[4] If anyone reading is in a position to verify this, please do. Even confirming roughly poor performance from the leading labs would support François’ prior and not Dwarkesh’s.
^[5] The most relevant research paper I could find is this one, where children were able to outperform the average LLM on a simplified ARC test from around the age of 6 onwards. Still, these were kids visiting the ‘NEMO science museum in Amsterdam’, so again it’s not really a sample of median humans.
^[6] Perhaps not coincidentally, a noted critic of AI x-risk.
^[7] He essentially means reaching expert performance: scoring close enough to ~100% that there is not much signal left in a model’s ARC score.
^[8] Or has even… dare we say… hit a wall?
^[9] I’m a bit confused on this point, and also about what ‘natively multi-modal’ means, or at least why Dwarkesh expects it to be such a game changer. Aren’t GPT-4o and Gemini already multi-modal models that perform badly at ARC?
^[10] Chollet seems to be referring to cases like Sphex wasps, though how accurate that anecdote actually is is up for debate. But to me, even simple organisms showing adaptive behaviour beyond the capacity of LLMs is even more reason to be sceptical about projections of imminent AGI.
^[11] See section 4.2.2.1.
^[12] He is gesturing at the notion of shortcut learning.
^[13] He has literally written a textbook about it.
^[14] The implication here is that as their scale increases, they’ll be able to achieve human-level extrapolation via interpolation.
^[15] In the linked paper, it’s defined as a phenomenon that’s observed “where models abruptly transition to a generalizing solution after a large number of training steps, despite initially overfitting”.
^[16] Even I, as an LLM sceptic, was sceptical of this claim by François, but it’s actually true!
^[17] There was an interesting exchange between François and Subbarao Kambhampati on whether this also holds for civilisation, which you can read here.
^[18] Trenton Bricken, Member of Technical Staff on the Mechanistic Interpretability team at Anthropic.
^[19] To be very fair to him, Trenton does introduce this as a ‘hot take’.
^[20] In the chapter called “How Might AI Progress in the Future?”, Russell says “I believe this capability is the most important step needed to reach human-level AI”, though he also says this could come at any point given a breakthrough. Perhaps, but I still think getting to that breakthrough will be much more difficult than scaling transformers to ever-larger sizes, especially if scaling maximalism becomes ideologically dominant to the exclusion of alternative paradigms.
^[21] Given François’ sceptical position, I wouldn’t take his timeline adjustments too literally.
^[22] When the Chinese Room comes up, for instance, it’s instantly dismissed with the systems reply, despite Searle addressing that in his original paper.
^[23] I actually can’t recommend Xuan’s work and perspective highly enough. My route to LLM scepticism really picked up momentum with this thread, I think.
^[24] I’m calling out particular examples here because I think it’s better to do so than to vaguepost, but please see my final bullet point in this section. I think my issue might be with OpenPhil’s epistemic culture around AI, rather than with any of the individuals working there.
^[25] Bongard problems were developed in the 1960s, and are very similar to ARC puzzles. A few examples illustrate some kind of rule, and you solve the test once you can identify the rule.
