This other Ryan Greenblatt is my old account[1]. Here is my LW account.
[1] Account lost to the mists of time and expired university email addresses.
I can buy that GPT4o would be best, but perhaps other LLMs might reach ‘ok’ scores on ARC-AGI if directly swapped in? I’m not sure what you mean by ‘careful optimization’ here, though.
I think much worse LLMs like GPT-2 or GPT-3 would reduce performance to virtually zero.
This is very clear, as these LLMs basically can’t code at all.
If you instead consider LLMs which are only somewhat less powerful, like llama-3-70b (which has perhaps 10x less effective compute?), the reduction in performance will be smaller.
It also depends heavily on what we mean by AGI, though.
I’m happy to do timelines to the singularity and operationalize this with “we have the technological capacity to pretty easily build projects as impressive as a Dyson sphere”.
(Or 1000x electricity production, or whatever.)
In my view, this likely adds only a moderate number of years (3-20, depending on how various details go).
I think there are signal vs. noise tradeoffs here, so I’m naively tempted to retreat toward more exclusivity.
This poses costs of its own, so maybe I’d be in favor of differentiation (some more and some less exclusive version).
Low confidence in this being good overall.
I’m not really referring to hardware here; in pre-training and RLHF the model weights are being changed and updated.
Sure, I was just using this as an example. I should have made this more clear.
Here is a version of the exact same paragraph you wrote, but for activations and in-context learning:
in pre-training and RLHF the model activations are being changed and updated by each layer, and that’s where the ‘in-context learning’ (if we want to call it that) comes in—the activations are being updated/optimized to better predict the next token and understand the text. The layers learned to in-context learn (update the activations) across a wide variety of data in pretraining.
(We can show transformers learning to do optimization in [very toy cases](https://www.lesswrong.com/posts/HHSuvG2hqAnGT5Wzp/no-convincing-evidence-for-gradient-descent-in-activation#Transformers_Learn_in_Context_by_Gradient_Descent__van_Oswald_et_al__2022_).)
Fair enough if you want to say “the model isn’t learning, the activations are learning”, but then you should also say “short term (<1 minute) learning in humans isn’t the brain learning, it is the transient neural state learning”.
Perhaps in this case ARC-AGI is best used as a suite of benchmarks, where the same model and scaffolding should be used for each?
Yes, it seems reasonable to try out general purpose scaffolds (like what METR does) and include ARC-AGI in general purpose task benchmarks.
I expect substantial performance reductions from general-purpose scaffolding, though some fraction of that will be due to not having prefix compute and to allocating test-time compute less effectively.
I still think the hard part is the scaffolding.
For this project? In general?
As far as this project, seems extremely implausible to me that the hard part of this project is the scaffolding work I did. This probably holds for any reasonable scheme for dividing credit and determining what is difficult.
Sure, maybe in a few months we’ll see the top score on the ARC Challenge above 85%, but could such a model work in the real world?
It sounds like you agree with my claims that ARC-AGI isn’t that likely to track progress and that other benchmarks could work better?
(The rest of your response seemed to imply something different.)
Fifth and finally, I’m slightly disappointed at Buck and Dwarkesh for kinda posing this as a ‘mic drop’ against ARC.
I don’t think the objection is to ARC (the benchmark); I think the objection is to specific (very strong!) claims that Chollet makes.
I think the benchmark is a useful contribution as I note in another comment.
So, if I accept Ryan’s framing of the inconsistent triad, I’d reject the 3rd one, and say that “Current LLMs never “learn” at runtime (e.g. the in-context learning they can do isn’t real learning)”
You have to reject one of the three. So, if you reject the third (as I do), then you think LLMs do learn at runtime.
I’m quite confused, given the fact that all of the weights in the transformer are frozen after training and RLHF, why it’s called learning at all
In RLHF and training, no aspect of the GPU hardware is being updated at all; it’s all frozen. So why does that count as learning? I would say that a system can (potentially!) be learning as long as there is some evolving state. In the case of transformers and in-context learning, that state is activations.
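To make the distinction concrete, here is a minimal toy sketch (illustrative only: the matrices and update rule are made up, not any real transformer). The parameters are fixed the whole time, yet the per-token activation state keeps changing as new context comes in:

```python
import math
import random

random.seed(0)

# Hypothetical frozen parameters: fixed after "training", never updated below.
W_IN = [[random.gauss(0, 1) for _ in range(4)] for _ in range(4)]
W_REC = [[random.gauss(0, 0.1) for _ in range(4)] for _ in range(4)]
frozen_snapshot = [row[:] for row in W_IN]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def step(state, token_vec):
    # One processing step: the weights stay fixed, but the activation
    # state is updated as a function of the incoming token.
    pre = [a + b for a, b in zip(matvec(W_IN, token_vec), matvec(W_REC, state))]
    return [math.tanh(x) for x in pre]

state = [0.0] * 4  # the evolving state (the "activations")
for _ in range(3):  # process three context "tokens"
    token = [random.gauss(0, 1) for _ in range(4)]
    state = step(state, token)

# The parameters never changed, but the state did -- that evolving state
# is the thing which could be doing the "learning".
```

Whether you call what the state does “learning” is exactly the terminological question at issue; the point of the sketch is just that frozen parameters don’t rule out an adaptive process.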
Third, and most importantly, I think Ryan’s solution shows that the intelligence is coming from him, and not from Chat-GPT4o. skybrian makes this point in the Substack comments.
[...]
To my eyes, I think the hard part here was the scaffolding done by Ryan rather than the pre-training[4] of the LLM (this is another cruxy point I highlighted in my article).
Quoting from a Substack comment I wrote in response:
Certainly some credit goes to me and some to GPT4o.
The solution would be much worse without careful optimization and wouldn’t work at all without GPT4o (or another LLM with similar performance).
It’s worth noting a high fraction of my time went into writing prompts and optimizing the representation. (Which is perhaps better described as teaching GPT4o and making it easier for it to see the problem.)
There are different analogies here which might be illuminating:
Suppose that you strand a child out in the woods and never teach them anything. I expect they would be much worse at programming. So, some credit for their abilities goes to society and some to their brain.
If you removed my ability to see (or, conversely, used fancy tools to make it easier for a blind person to see), this would greatly affect my ability to do ARC-AGI puzzles.
You can build systems around people which remove most of the interesting intelligence from various tasks.
I think what is going on here is analogous to all of these.
Separately, this tweet is relevant: https://x.com/MaxNadeau_/status/1802774696192246133
I think it’s much less conceptually hard to scrape the entire internet and shove it through a transformer architecture. A lot of leg work and cost, sure, but the hard part is the ideas bit.
It is worth noting that hundreds (thousands?) of high quality researcher years have been put into making GPT4o more performant.
the claimed numbers are not SOTA, but that is because there are different training sets, and I think the ARC-AGI team should be more clear about that
Agreed, though it is possible that my approach is/was SOTA on the private set. (E.g., because Jack Cole et al.’s approach is somewhat more overfit.)
I’m waiting on the private leaderboard results and then I’ll revise.
My only sadness here is that I get the impression you think this work is kind of a dead-end?
I don’t think it is a dead end.
As I say in the post:
ARC-AGI probably isn’t a good benchmark for evaluating progress towards TAI: substantial “elicitation” effort could massively improve performance on ARC-AGI in a way that might not transfer to more important and realistic tasks.
But, I still think that work like ARC-AGI can be good on the margin for getting a better understanding of current AI capabilities.
Pair this with the EA concern that we should focus on the counterfactual impact of our actions, and with the fact that there are opportunities to do good right here and now,[3] and it shouldn’t be a primary EA concern.
As in, your crux is that the probability of AGI within the next 50 years is less than 10%?
I think from an x-risk perspective it is quite hard to beat AI risk even on pretty long timelines. (Where the main question is bio risk and what you think about (likely temporary) civilizational collapse due to nuclear war.)
It’s pretty plausible that on longer timelines technical alignment/safety work looks weak relative to other stuff focused on making AI go better.
I don’t comment or post much on the EA Forum because the quality of discourse there typically seems mediocre at best. This is especially true for x-risk.
I think this has been true for a while.
Farmed animals are also neglected relative to wild animals
Typo?
I’m hoping this doesn’t happen anytime soon. This assumes that AIs would themselves own property and be seen as having legal personhood or similar.
Hmm, I think the time when deals with AIs are important is pretty much “when AIs pose serious risk via misalignment”. (I also hope this isn’t soon all else equal.) Even if such AIs have absolutely no legal rights at this time, it still seems like we can make deals with them and give them assets (at least assets they will eventually be able to use). E.g., make a foundation which is run by AI obligation honoring purists with the mission of doing what the AI wants and donate to the foundation.
I think there is a strong case for work on making deals with AIs and investigating what preferences AIs have (if any) for mitigating AI takeover risk. I think paying AIs to reveal their misalignment and potentially to work for us and prevent AI takeover seems like a potentially very promising intervention.
This work is strongly connected to digital minds work.
Further, I think there is a substantial chance that AI moral patienthood becomes a huge issue in coming years, and thus it is good to ensure that this field has better views and interventions.
Some quick takes on this from me: I agree with 2 and 3, but it’s worth noting that “post-AGI” might be “2 years after AGI while there is a crazy singularity ongoing and vast numbers of digital minds”.
I think as stated, (1) seems about 75% likely to me, which is not hugely reassuring. Further, I think there is a critical time you’re not highlighting: a time when AGI exists but humans are still (potentially) in control and society looks similar to now.
By “permanent”, I mean >10 billion years. By “global”, I mean “it ‘controls’ >80% of resources under earth originating civilization control”. (Where control evolves with the extent to which technology allows for control.)
I think this seems like mostly a fallacy. (I feel like there should be a post explaining this somewhere.)
Here is an alternative version of what you said to indicate why I don’t think this is a very interesting claim:
Sure, you can have a very smart quadriplegic who is very knowledgeable. But they won’t do anything until you let them control some actuator.
If your view is that “prediction won’t result in intelligence”, fair enough, though it’s notable that the human brain seems to heavily utilize prediction objectives.