Strong agree that, absent new approaches, the tailwind isn’t enough. But it’s unclear that pretraining scaling doesn’t have further to go, and current approaches using synthetic data and RL training to enhance one-shot performance seem to have room left for significant improvement.
I also don’t know how much room is left before we hit genius-level AGI or beyond; at that point, even if we hit a wall, more scaling isn’t required, since the timeline basically ends.