This is my understanding too – some crucial questions going forward:
How useful are AIs that are mainly good at these verifiable tasks?
How much does getting better at reasoning on these verifiable tasks generalise to other domains? (It seems to generalise at least a bit, e.g. o1 improved at law.)
How well will reinforcement learning work when applied at scale to areas with weaker reward signals?
Thanks, this is helpful.