(I’m not at all an expert on any of this, please discount appropriately)
Agree with the reasoning for the directional adjustment and the bounds, but magnitude-wise it seems a bit overcorrected? SemiAnalysis’ figures roughly suggest a 15M center. That said, you’re on track to becoming correct anyhow, given token efficiency trends
I wish I had a more empirically grounded sense of how token usage varies by type of task, holding task duration fixed at 8 hours for a human professional (one you’d pay $400/day, say). My guess from comparing model vs human jaggedness (e.g. this) is that leadership-level / early-employee / entrepreneurial / high-context / taste-heavy work would require far more tokens to get 8 hours of work done than the routine analyst-type / junior SWE tasks typical of benchmarks
My sense is that the global average cost per token will go down a lot due to the following, though the mix between them is very unclear:
a key driver of inference demand going forward being agentic workflows that are very heavy on cached tokens
a rising share of demand being satisficing rather than maximising w.r.t. output quality, across an ever-growing share of tasks (e.g. plan with Opus → code with Sonnet, or even DeepSeek models at a 1-2 OOM cheaper price point)
race-to-the-bottom price wars (DeepSeek again)
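
To make the "mix" point concrete, here is a rough sketch of how the average cost per token behaves as a share-weighted mean over demand segments. Everything in it is an assumption for illustration: the segment names, the shares, and the prices are made-up placeholders, not real figures for Opus, Sonnet, DeepSeek, or anyone else. The only thing it is meant to show is that the blended price can fall a lot purely from demand shifting toward cached and cheaper tokens, before any price war lowers the per-segment prices themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One slice of token demand. All numbers used below are hypothetical."""
    name: str
    share: float           # fraction of total tokens; shares should sum to 1
    price_per_mtok: float  # dollars per million tokens (placeholder value)


def blended_price(segments):
    """Share-weighted average price in dollars per million tokens."""
    total_share = sum(s.share for s in segments)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError(f"shares sum to {total_share}, expected 1")
    return sum(s.share * s.price_per_mtok for s in segments)


# Hypothetical "before" mix: mostly uncached tokens on an expensive model.
before = [
    Segment("expensive model, uncached", 0.70, 10.0),
    Segment("expensive model, cached", 0.20, 1.0),
    Segment("cheap model", 0.10, 0.5),
]

# Hypothetical "after" mix: same placeholder prices, but demand has shifted
# toward cached tokens (agentic workflows) and cheaper models (satisficing).
after = [
    Segment("expensive model, uncached", 0.20, 10.0),
    Segment("expensive model, cached", 0.50, 1.0),
    Segment("cheap model", 0.30, 0.5),
]

price_before = blended_price(before)
price_after = blended_price(after)
print(f"before: ${price_before:.2f} per million tokens")
print(f"after:  ${price_after:.2f} per million tokens")
```

With these placeholder inputs the blended price drops by more than half even though no individual price changed. Which of the three drivers dominates in practice depends entirely on the real shares, which is the part I have no good handle on.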