I like the vividness of the comparisons!
A few points against this being nearly as crazy as the comparisons suggest:
GPT-2030 may learn much less sample-efficiently, and much less compute-efficiently, than humans. In fact, this is pretty likely. Ball-parking it, humans do ~1e24 FLOP before they're 30, which is ~20x less than GPT-4's training compute (quick sanity check after these points). And we learn languages/maths from way fewer data points. So the actual rate at which GPT-2030 itself gets smarter will be lower than the rates these comparisons imply.
This is "learn" in the sense of "improves its own understanding". There's another sense, "produces knowledge for the rest of the world to use, eg science papers", where I think your comparisons are right.
Learning may be bottlenecked by serial thinking time past a certain point, after which adding more parallel copies won’t help. This could make the conclusion much less extreme.
Learning may also be bottlenecked by experiments in the real world, which may not immediately get much faster.
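To sanity-check the ball-park above, here's a minimal back-of-envelope sketch, assuming ~1e15 FLOP/s for the brain and taking the ~2e25 FLOP GPT-4 figure that the "~20x" claim implies (both are rough outside estimates, not established numbers):

```python
# Back-of-envelope check of the compute comparison above.
# Assumptions (not established facts): ~1e15 FLOP/s for the human brain is
# one common outside estimate; the ~2e25 FLOP GPT-4 training-compute figure
# is implied by the "~20x" claim, not an official number.

SECONDS_PER_YEAR = 365.25 * 24 * 3600            # ~3.16e7 s

brain_flop_per_s = 1e15                          # assumed brain "FLOP rate"
human_compute_by_30 = brain_flop_per_s * 30 * SECONDS_PER_YEAR   # ~9.5e23, i.e. ~1e24 FLOP

gpt4_training_flop = 2e25                        # assumed, implied by the ~20x claim

print(f"Human compute by age 30: {human_compute_by_30:.1e} FLOP")
print(f"GPT-4 / human ratio:     {gpt4_training_flop / human_compute_by_30:.0f}x")
# Prints ~9.5e23 FLOP and a ratio of ~21x, matching the ball-park above.
```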
Thanks, I think these points are good.
Do you have any examples in mind of domains where we might expect the serial-thinking-time bottleneck to show up? I've heard people say things like 'some maths problems require serial thinking time', but I still feel pretty vague about this and don't have much intuition about how strongly to expect it to bite.
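For concreteness, one standard way to frame the worry is Amdahl's law: if some fraction of the learning process is inherently serial, then adding parallel copies can't buy more than the inverse of that fraction. A minimal sketch (the 1% serial fraction is purely illustrative):

```python
# Amdahl's-law framing of the serial bottleneck (a formalization with an
# illustrative serial fraction; nothing here is an empirical claim).

def max_speedup(n_copies: int, serial_frac: float) -> float:
    """Speedup from n_copies parallel workers when serial_frac of the work
    is inherently sequential (Amdahl's law)."""
    return 1.0 / (serial_frac + (1.0 - serial_frac) / n_copies)

for n in (10, 1_000, 1_000_000):
    print(f"{n:>9} copies -> {max_speedup(n, serial_frac=0.01):6.1f}x speedup")
# Even with only 1% of the work being serial, the speedup saturates near
# 100x no matter how many extra copies are added.
```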