I am not sure what this footnote means. The “cost of training per 1m tokens” is a strange unit to use, since it is not a fixed quantity: it depends on the model size and on the GPU efficiency. I strongly suspect you meant to write something else and got mixed up.
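
To make the dependence concrete, here is a rough sketch of how one might estimate that figure. It assumes the common rule of thumb of about 6 FLOPs per parameter per token for training a dense transformer; the function name, its parameters, and every number in the usage lines are hypothetical values chosen for illustration, not figures from the footnote.

```python
def training_cost_per_million_tokens(n_params, peak_flops_per_sec, utilization, gpu_price_per_hour):
    """Rough dollar cost to train a dense transformer on 1M tokens.

    Assumption: training takes about 6 FLOPs per parameter per token
    (forward plus backward pass). All inputs are hypothetical.
    """
    tokens = 1_000_000
    flops_needed = 6 * n_params * tokens
    # Only a fraction of the GPU's peak throughput is achieved in practice.
    effective_flops_per_sec = peak_flops_per_sec * utilization
    gpu_hours = flops_needed / effective_flops_per_sec / 3600
    return gpu_hours * gpu_price_per_hour


# Illustrative values only: same hardware and price, two model sizes.
small = training_cost_per_million_tokens(1e9, 1e15, 0.4, 2.0)
large = training_cost_per_million_tokens(70e9, 1e15, 0.4, 2.0)
print(f"1B params:  ${small:.4f} per 1M tokens")
print(f"70B params: ${large:.4f} per 1M tokens")
```

Under these assumptions the cost scales linearly with parameter count and inversely with utilization, so a single "cost per 1m tokens" number says little unless the model and hardware setup are stated alongside it.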