Nitpick: It’s fairly unlikely that GPT-4 has 1tn params; a model that size doesn’t seem compute-optimal. I grant you the Semafor assertion is some evidence, but I’m putting more weight on the compute arithmetic.
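
To make "compute arithmetic" concrete, here is a rough sketch under two common rules of thumb, both of which are assumptions rather than anything taken from the Semafor report: training cost is about 6 FLOPs per parameter per token, and a compute-optimal run uses about 20 tokens per parameter (the Chinchilla heuristic). The function names are made up for illustration.

```python
import math

# Assumed rules of thumb (not figures from the comment or from Semafor):
#   training FLOPs          C ~= 6 * N * D
#   compute-optimal tokens  D ~= 20 * N   (Chinchilla heuristic)
FLOPS_PER_PARAM_TOKEN = 6.0
TOKENS_PER_PARAM = 20.0


def optimal_tokens(n_params: float) -> float:
    """Tokens a compute-optimal run would use for a model with n_params."""
    return TOKENS_PER_PARAM * n_params


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training FLOPs for n_params trained on n_tokens."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens


def optimal_params_for_budget(flops: float) -> float:
    """Invert C = 6 * N * (20 * N) = 120 * N^2 to get N for a FLOP budget."""
    return math.sqrt(flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))


if __name__ == "__main__":
    n = 1e12  # the claimed 1tn parameters
    d = optimal_tokens(n)
    c = training_flops(n, d)
    print(f"tokens needed: {d:.2e}")  # 2.00e+13
    print(f"training FLOPs: {c:.2e}")  # 1.20e+26
```

Under these assumptions, a dense 1tn-param model would want roughly 2e13 tokens and 1.2e26 training FLOPs to be compute-optimal. Whether that is plausible depends on your own estimate of the training budget; `optimal_params_for_budget` goes the other way, from a FLOP estimate to the parameter count it implies.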