Mo Putera comments on “AI energy forecasts may be missing large-scale inference demand”

Mo Putera 27 May 2026 9:23 UTC
(I’m not at all an expert on any of this; please discount appropriately)
1. I agree with the reasoning for the directional adjustment and the bounds, but magnitude-wise it seems a bit overcorrected? SemiAnalysis’ figures roughly suggest a center of 15M. That said, you’re on track to become correct anyway given token efficiency trends.
  1. I wish I had a more empirically grounded sense of how token usage varies by type of task, holding task duration fixed at 8 hours for a human professional (one you’d pay, say, $400/day). My guess from comparing model vs. human jaggedness (e.g. this) is that leadership-level / early-employee / entrepreneurial / high-context / taste-heavy work would take far more tokens to get 8 hours of work done than the routine analyst-type / junior SWE tasks typical of benchmarks.
2. My sense is that the global average cost per token will fall a lot due to the following, though the mix is very unclear:
  1. agentic workflows that are very heavy on cached tokens becoming a key driver of inference demand going forward
  2. a rising share of demand being satisficing rather than maximising on output quality, across an ever-growing share of tasks (e.g. plan with Opus → code with Sonnet, or even with DeepSeek models at a 1-2 OOM cheaper price point)
  3. race-to-the-bottom price wars (DeepSeek again)
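A toy sketch of how the mix shift in point 2 can pull the average cost per token down even if no individual price changes. Every share and price here is made up for illustration (not SemiAnalysis’ numbers or any provider’s actual pricing), the `Segment` class and `blended_price` helper are hypothetical, and each slice of demand is collapsed to a single effective price per million tokens.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    """One slice of inference demand. All numbers used below are hypothetical."""
    name: str
    share: float         # fraction of all tokens served by this slice
    usd_per_mtok: float  # assumed effective price, USD per million tokens


def blended_price(segments: list[Segment]) -> float:
    """Token-weighted average price (USD per million tokens) across segments."""
    total = sum(s.share for s in segments)
    if not math.isclose(total, 1.0):
        raise ValueError(f"token shares must sum to 1, got {total}")
    return sum(s.share * s.usd_per_mtok for s in segments)


# Hypothetical current mix: mostly uncached tokens on a frontier model.
today = [
    Segment("frontier model, uncached", 0.6, 10.0),
    Segment("frontier model, cached", 0.2, 1.0),
    Segment("cheap model", 0.2, 0.5),
]

# Hypothetical future mix: cache-heavy agentic workflows (point 2.1) plus
# more satisficing work routed to a model ~20x cheaper (point 2.2).
later = [
    Segment("frontier model, uncached", 0.1, 10.0),
    Segment("frontier model, cached", 0.5, 1.0),
    Segment("cheap model", 0.4, 0.5),
]

if __name__ == "__main__":
    for label, mix in [("today", today), ("later", later)]:
        print(f"{label}: ${blended_price(mix):.2f} per million tokens")
```

With these invented numbers the blended price drops from about $6.30 to about $1.70 per million tokens purely from the shift in mix. Price wars (point 2.3) would show up separately, as lower `usd_per_mtok` values within each segment, which is part of why the overall mix is hard to call.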