I’m an ML researcher, and I would give the probability of baseline HFDT leading to a PASTA set of capabilities as approximately 0, and my impression is that this is the view of the majority of ML researchers.
Baseline HFDT seems to be the single most straightforward vision that could plausibly work to train transformative AI very soon. From informal conversations, I get the impression that many ML researchers would bet on something like this working in broadly the way I described in this post, and multiple major AI companies are actively trying to scale up the capabilities of models trained with something like baseline HFDT.
In other words, I disagree with this, and therefore it seems unclear what to take away from the rest of the post, which is well-reasoned iff you agree with the starting assumptions.
Remember that PASTA is actually a very strong criterion: it requires AI to be able to do all activities in the scientific research loop, including evaluating whether ideas are good and generating creative new ideas—skills which I think are general-intelligence-complete. I have no doubt that an HFDT system can automate some parts of the scientific discovery process, such as writing draft papers or controlling a robot to do lab experiments in constrained environments. But ultimately this subset of PASTA only makes such a system a research assistant, one which, like AlphaFold, may make some research more efficient but which does not complete the full loop.
Can you say more about why you think this? Both why you think there’s 0 chance of HFDT leading to a system that can evaluate whether ideas are good and generate creative new ideas, and why you think this is what the majority of ML researchers think?
(I’ve literally never met an ML researcher with your view before, to my knowledge, though I haven’t exactly gone around asking everyone I know & my environment is of course selected against people with your view since I’m at OpenAI.)
The rough shape of the argument is that I think a PASTA system requires roughly human-level general intelligence, and that this implies some capabilities which HFDT as described in this post cannot learn. Using Karnofsky’s original PASTA post, let’s look at some of the requirements:
1. Consume research papers.
2. Identify research gaps and open problems.
3. Choose an open problem to focus on.
4. Generate hypotheses to test within a problem to potentially solve it.
5. Generate experiment ideas to test the hypotheses.
6. Judge how good the experiment ideas are and select experiments to perform.
7. Carry out the experiments.
8. Select a hypothesis based on experiment results.
9. Write up a paper.
10. Judge how good the paper is.
A system which can do all of the above is a narrower requirement than full general intelligence, especially if each task is decomposed into separate models, but many of these steps seem impractical given what we currently know about RL with human feedback. Crucially, a PASTA system needs to do these tasks autonomously in order to have any chance of transformatively accelerating scientific discovery—Ajeya specifies that Alex is an end-to-end system. It’s not sufficient, for example, to make a system which can generate a bunch of experiment or research ideas in text but relies on humans to evaluate them. In particular, I would identify items 4, 5, 6, 8, and 10 as the key parts of the research loop which seem to require significant ML breakthroughs, and which I think may be general-intelligence-complete. One thing these tasks have in common is that good feedback is hard for humans to provide, the rewards are sparse, and the episodes are potentially extremely long.
Yet current work on RL with human feedback shows that these techniques work well only in highly constrained environments with relatively short episodes and an accurate, frequent reward signal from the human feedback. Sparse feedback immediately decreases performance significantly (see the ReQueST paper). To me, this suggests that a significant number of fundamental breakthroughs remain before PASTA, and that HFDT as described here does not have this capability even if scaled up and given minor improvements.
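To make the kind of setup I’m pointing at concrete, here is a minimal sketch of the reward-modelling step, with a toy linear model and synthetic preference labels standing in for real human comparisons (none of this is any lab’s actual pipeline). The point it illustrates is only that the standard recipe leans on many cheap, easily judged labels, and that the learned reward degrades as those labels become scarce, which is the regime the research-loop tasks above would put us in:

```python
# Minimal sketch of the reward-modelling step in RLHF, assuming a toy linear
# setup with synthetic "human" preference labels. Everything here (features,
# the hidden preference direction, the label noise) is a hypothetical stand-in.
import numpy as np

rng = np.random.default_rng(0)
DIM = 16
true_w = rng.normal(size=DIM)  # hidden direction deciding which segment a labeller prefers

def sample_comparisons(n_pairs, label_noise=0.1):
    # Each "trajectory segment" is summarised by a feature vector; the labeller
    # compares two segments and occasionally makes a mistake.
    a = rng.normal(size=(n_pairs, DIM))
    b = rng.normal(size=(n_pairs, DIM))
    prefer_a = (a - b) @ true_w > 0
    flip = rng.random(n_pairs) < label_noise
    return a, b, (prefer_a ^ flip).astype(float)

def train_reward_model(a, b, labels, lr=0.1, steps=500):
    # Bradley-Terry style logistic loss on pairwise preferences.
    w = np.zeros(DIM)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-((a - b) @ w)))
        w -= lr * (a - b).T @ (p - labels) / len(labels)
    return w

for n_pairs in (20, 200, 2000):  # more comparisons ~ denser, cheaper human feedback
    a, b, y = sample_comparisons(n_pairs)
    w = train_reward_model(a, b, y)
    a_t, b_t, y_t = sample_comparisons(1000, label_noise=0.0)
    agreement = np.mean((((a_t - b_t) @ w) > 0) == y_t.astype(bool))
    print(f"{n_pairs:5d} comparisons -> {agreement:.0%} agreement with the true preference")
```

A linear toy obviously proves nothing about scaled-up systems; it is just meant to show where the frequent-and-accurate-feedback assumption enters the recipe.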
Regarding my impression of the opinions of other ML researchers, I’m a PhD student at Cambridge and have spoken to many academics (both senior and junior), some people at DeepMind and some at Google Brain about their guesses for pathways to AGI, and the vast majority don’t seem to think that scaling current RL algorithms + LLMs + human feedback gets us much further towards AGI than we currently are, and they think there are many missing pieces of the puzzle. I’m not surprised to hear that OpenAI has a very different set of opinions though—the company has bet heavily on a particular approach to AGI so would naturally attract people with a different view to the one I’ve described.
One way to see this is to look at timelines to human-level general intelligence—based on previous surveys, I think most ML researchers would not put this within 10 or 20 years. Yet as Ajeya describes in the post, if training PASTA with baseline HFDT is possible, it seems very likely to happen within 10 years, and I agree with her that “the sooner transformative AI is developed, the more likely it is to be developed in roughly this way”. I think that if a full PASTA system can be made, then we are likely to have solved almost all of the bottlenecks to creating an AGI. Therefore I think the survey timelines conflict with the hypothesis that scaling current techniques, without new fundamental breakthroughs, is enough for PASTA.
I’m pretty unconvinced that your “suggests a significant number of fundamental breakthroughs remain to achieve PASTA” is strong enough to justify the odds being “approximately 0,” especially when the evidence is mostly just expecting tasks to stay hard as we scale (something which seems hard to predict, and easy to get wrong). Though innovation in some domains may indeed involve long episodes and hard-to-evaluate outputs, innovation in other fields (e.g., math) could easily not have this problem, i.e., in cases where verifying a solution is much easier than finding one.
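To spell out the verify-versus-solve asymmetry I mean, here is a deliberately simple stand-in (integer factoring, chosen only because the gap is easy to see; nothing about it is specific to RLHF or any particular model):

```python
# Toy illustration of the verify-vs-solve asymmetry. Factoring a semiprime is
# just a stand-in example: checking a candidate answer is one multiplication,
# while finding it by naive search scales with the square root of the number.
import math

def verify_factorisation(n: int, p: int, q: int) -> bool:
    # Cheap: a single multiplication and two trivial checks.
    return p > 1 and q > 1 and p * q == n

def solve_by_trial_division(n: int) -> tuple[int, int]:
    # Expensive: brute-force search up to sqrt(n).
    for p in range(2, math.isqrt(n) + 1):
        if n % p == 0:
            return p, n // p
    raise ValueError("n is prime")

n = 1_000_003 * 999_983            # product of two primes
p, q = solve_by_trial_division(n)  # the slow, "creative" part
assert verify_factorisation(n, p, q)  # the fast, easy-to-automate part
print(p, q)
```

In domains with this shape, a human or a cheap automated checker can give accurate feedback on outputs they could never have produced themselves, which is exactly where outcome-based reward signals are easiest to supply.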
I gave the comment a strong upvote because it’s super clear and informative. I also really appreciate it when people spell out their reasons for “scale is not all you need”, which doesn’t happen that often.
That said, I don’t agree with the argument or conclusion. Your argument, at least as stated, seems to be “tasks with the following criteria are hard for current RL with human feedback, so we’ll need significant fundamental breakthroughs”. The transformer was published 5 years ago. Back then, you could have used a very analogous argument to claim that language models would never do this or that task; but for many of those tasks, language models can perform them now (emergent properties).
Thank you for the comment - it’s a fair point about the difficulty of prediction. In my post I attempted to point to some heuristics which suggest strongly to me that significant fundamental breakthroughs are needed. Other people have different heuristics. At the same time though, it seems like your objection is a fully general argument against fundamental breakthroughs ever being necessary at any point, which seems quite unlikely.
I also think that even the original Attention Is All You Need paper gave some indication of the future direction by testing both a small and a large transformer and showing greatly improved performance with the large one. Early RLHF work, by contrast, does not appear to have a similarly obvious way to scale up and tackle the big RL challenges like sparse rewards and long episode lengths.
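For what it’s worth, the comparison I have in mind is the base/big pair reported in that paper. Even a back-of-the-envelope parameter count (ignoring biases and layer norms, and guessing the vocabulary size, so the totals are only rough approximations) shows how directly the architecture invites scaling a couple of hyperparameters:

```python
# Back-of-the-envelope parameter counts for the two Transformer sizes reported
# in "Attention Is All You Need". Biases and layer norms are ignored and the
# vocabulary size is a guess, so the totals are rough, not the paper's exact figures.
def transformer_params(d_model, d_ff, n_layers=6, vocab=37_000):
    attn = 4 * d_model * d_model           # Q, K, V and output projections
    ffn = 2 * d_model * d_ff               # the two feed-forward matrices
    encoder = n_layers * (attn + ffn)      # self-attention + FFN per layer
    decoder = n_layers * (2 * attn + ffn)  # self- plus cross-attention + FFN
    return encoder + decoder + vocab * d_model  # plus a shared embedding matrix

print(f"base: ~{transformer_params(512, 2048) / 1e6:.0f}M parameters")
print(f"big:  ~{transformer_params(1024, 4096) / 1e6:.0f}M parameters")
# The paper reports roughly 65M (base) and 213M (big), if I recall correctly;
# the jump comes from changing a couple of hyperparameters, which is the
# "obvious way to scale up" I mean.
```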
At the same time though, it seems like your objection is a fully general argument against fundamental breakthroughs ever being necessary at any point, which seems quite unlikely.
Sorry, what I wanted to say is that it seems unclear whether fundamental breakthroughs are needed. They might be, or they might not; I personally am pretty uncertain and think both options are possible. I also think it’s possible that whatever breakthroughs do happen won’t change the general picture described in the OP much.
I agree with the rest of your comment!