I gave the comment a strong upvote because it’s super clear and informative. I also really appreciate it when people spell out their reasons for “scale is not all you need”, which doesn’t happen that often.
That said, I don’t agree with the argument or conclusion. Your argument, at least as stated, seems to be “tasks with the following criteria are hard for current RL with human feedback, so we’ll need significant fundamental breakthroughs”. The transformer was published 5 years ago. Back then, you could have made a very analogous argument about language models to claim that they would never handle this or that task; but many of those tasks are ones language models can perform now (emergent capabilities).
Thank you for the comment - it’s a fair point about the difficulty of prediction. In my post I attempted to point to some heuristics that strongly suggest to me that significant fundamental breakthroughs are needed. Other people have different heuristics. At the same time though, your objection seems like a fully general argument against fundamental breakthroughs ever being necessary at any point, which seems quite unlikely.
I also think that even the original Attention Is All You Need paper gave some indication of the future direction: it tested a large and a small transformer and showed greatly improved performance with the large one. By contrast, early RLHF work does not appear to offer a similarly obvious way to scale up and tackle the big RL challenges, such as sparse rewards and long episode lengths.
> At the same time though, it seems like your objection is a fully general argument against fundamental breakthroughs ever being necessary at any point, which seems quite unlikely.
Sorry, what I wanted to say is that it seems unclear whether fundamental breakthroughs are needed. They might be, or they might not. I’m personally quite uncertain about this and think both options are possible. I also think it’s possible that any breakthroughs that do happen won’t change the general picture described in the OP much.
I agree on the rest of your comment!