Benjamin_Todd comments on Teaching AI to reason: this year’s most important story

Benjamin_Todd 14 Feb 2025 18:56 UTC
4 points
Glad it’s useful! I categorise RL on chain of thought as a type of post-training rather than test-time compute. (Sometimes people lump the two together as ‘inference scaling’, but I think that’s confusing.) I agree RL opens up novel capabilities you can’t get just from next-token prediction on the internet.