The crux for me is that I don’t agree compute scaling has dramatically changed, because I don’t think pre-training scaling has seen much worse returns.