However, if σ<1, then a software-only intelligence explosion occurs only if ϕ>1. But if this condition held, we could get an intelligence explosion with constant, human-only research input. While not impossible, we find this condition fairly implausible.
Hmm, I think a software-only intelligence explosion is plausible even if σ<1, but without the implication that you can do it with human-only research input.
The basic idea is that when you double the efficiency of software, you can now:
Run twice as many experiments
Have twice as much cognitive labour
So both the inputs to software R&D double.
I think this corresponds to:
dA = A^phi F(A K_res, A K_inf)
And then you only need phi > 0 to get an intelligence explosion. Not phi > 1.
This is really an explosion in the efficiency at which you can run AI algorithms, but you could do that for a while and then quickly use your massive workforce to develop superintelligence, or start training your ultra-efficient algorithms using way more compute.
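One way to see the phi > 0 claim in a line, assuming F has constant returns to scale (which seems to be what the doubling argument above relies on): pulling the A out of both arguments gives
$$\dot{A} = A^{\phi}\,F(A K_{res},\, A K_{inf}) = A^{\phi+1}\,F(K_{res},\, K_{inf}),$$
so with $K_{res}$ and $K_{inf}$ held fixed this is $\dot{A} = c\,A^{\phi+1}$, which goes to infinity in finite time exactly when $\phi + 1 > 1$, i.e. $\phi > 0$.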
Hmm, I think I would expect different experience curves for the efficiency of running experiments vs producing cognitive labour (with generally less efficiency-boosts with time for running experiments). Is there any reason to expect them to behave similarly?
(Though I think I agree with the qualitative point that you could get a software-only intelligence explosion even if you can’t do this with human-only research input, which was maybe your main point.)
Agree that I wouldn’t particularly expect the efficiency curves to be the same.
But if phi > 0 for both types of efficiency, then I think this argument will still go through.
To put it in math, there would be two types of AI software technology, one for experimental efficiency and one for cognitive labour efficiency: A_exp and A_cog. The equations are then:
dA_exp = A_exp^phi_exp F(A_exp K_res, A_cog K_inf)
dA_cog = A_cog^phi_cog F(A_exp K_res, A_cog K_inf)
And then I think you’ll find that, even with sigma < 1, it explodes when phi_exp>0 and phi_cog>0.
I spent a bit of time thinking about this today.
Let’s adopt the notation in your comment and suppose that F(⋅) is the same across research sectors, with a common λ. Let’s also suppose a common σ<1.
Then we get blow-up in $A_{cog}$ iff
$$\begin{cases}\phi_{cog}+\lambda>1 & \text{if } \phi_{cog}\le\phi_{exp}\\ \max\{\phi_{cog},\,\phi_{exp}+\lambda\}>1 & \text{if } \phi_{cog}>\phi_{exp}\end{cases}$$
The intuition for this result is that when σ<1, you are bottlenecked by your slower-growing sector.
If the slower-growing sector is cognitive labor, then asymptotically $F \propto A_{cog}$, and we get $\dot{A}_{cog} \propto A_{cog}^{\phi_{cog}} A_{cog}^{\lambda}$, so we have blow-up iff $\phi_{cog}+\lambda>1$.
If the slower-growing sector is experimental compute, then there are two cases. If experimental compute is blowing up on its own, then so is cognitive labor, because by assumption cognitive labor is growing faster. If experimental compute is not blowing up on its own, then asymptotically $F \propto A_{exp}$ and we get $\dot{A}_{cog} \propto A_{cog}^{\phi_{cog}} A_{exp}^{\lambda}$. Here we get blow-up iff $\phi_{cog}>1$.[1]
In contrast, if σ>1 then F grows approximately like the fastest-growing sector. You get blow-up in both sectors if either sector blows up. Therefore, you get blow-up iff $\max\{\phi_{cog}+\lambda,\,\phi_{exp}+\lambda\}>1$.
So if you accept this framing, complements vs. substitutes only matters if some sectors are blowing up but not others. If the returns to research are high enough in all sectors, then we get an intelligence explosion no matter what. This is an update for me, thanks!
I’m only analyzing blow-up conditions here. You could get e.g. double exponential growth by having $\phi_{cog}=1$ and $\phi_{exp}+\lambda=1$.
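For anyone who wants to poke at these conditions numerically, here is a minimal sketch, assuming F is a constant-returns CES aggregator and that λ enters as an exponent on F in each law of motion (which is how I read the derivation above); the parameter values are purely illustrative.

```python
def ces(x, y, sigma=0.5, gamma=0.5):
    """Constant-returns CES aggregator with elasticity of substitution sigma."""
    rho = (sigma - 1) / sigma
    return (gamma * x**rho + (1 - gamma) * y**rho) ** (1 / rho)

def blowup_time(phi_exp, phi_cog, lam=0.8, sigma=0.5, K_res=1.0, K_inf=1.0,
                T=50.0, dt=1e-3, cap=1e12):
    """Euler-integrate dA_i/dt = A_i^phi_i * F(A_exp*K_res, A_cog*K_inf)^lam.

    Returns the approximate time at which A_cog first exceeds `cap`, or None
    if it stays below the cap over the horizon -- a crude proxy for
    finite-time blow-up versus merely polynomial growth.
    """
    A_exp = A_cog = 1.0
    t = 0.0
    while t < T:
        F = ces(A_exp * K_res, A_cog * K_inf, sigma=sigma)
        A_exp += dt * A_exp**phi_exp * F**lam
        A_cog += dt * A_cog**phi_cog * F**lam
        t += dt
        if A_cog > cap:
            return t
    return None

# phi_cog <= phi_exp and phi_cog + lam > 1: blows up despite sigma < 1
print(blowup_time(phi_exp=0.5, phi_cog=0.5))   # prints a finite time
# phi_cog <= phi_exp and phi_cog + lam < 1: no blow-up on this horizon
print(blowup_time(phi_exp=0.5, phi_cog=0.1))   # prints None
```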
Nice!
I think that condition is equivalent to saying that A_cog explodes iff either
phi_cog + lambda > 1 and phi_exp + lambda > 1, or
phi_cog > 1
Where the second possibility is the unrealistic one in which it could explode with just human input.
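For what it’s worth, this equivalence seems to check out (taking lambda ≥ 0). When phi_cog ≤ phi_exp, phi_cog + lambda > 1 already implies phi_exp + lambda > 1, and phi_cog > 1 is a special case of phi_cog + lambda > 1, so the whole disjunction collapses to phi_cog + lambda > 1, matching the first branch. When phi_cog > phi_exp, phi_exp + lambda > 1 forces phi_cog + lambda > 1 as well, so the disjunction collapses to phi_cog > 1 or phi_exp + lambda > 1, i.e. max{phi_cog, phi_exp + lambda} > 1, matching the second branch.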
If your algorithms get more efficient over time at both small and large scales, and experiments test incremental improvements to architecture or data, then they should get cheaper to run in proportion to the algorithmic efficiency of cognitive labor. I think this is a better first approximation than assuming experiment costs are constant, and it might hold in practice, especially when you can target small-scale algorithmic improvements.
OK I see the model there.
I guess it’s not clear to me if that should hold if I think that most experiment compute will be ~training, and most cognitive labour compute will be ~inference?
However, over time maybe more experiment compute will be ~inference, as it shifts more to being about producing data rather than testing architectures? That could push back towards this being a reasonable assumption. (Definitely don’t feel like I have a clear picture of the dynamics here, though.)
Note that if you accept this, our estimation of σ in the raw compute specification is wrong.
The cost-minimization problem becomes
$$\min_{H,K}\; wH + rK \quad \text{s.t.} \quad F(AK, H) = \bar{F}.$$
Taking FOCs and re-arranging,
$$\ln\frac{K}{H} = \sigma\ln\frac{\gamma}{1-\gamma} + \sigma\ln\frac{wA}{r}$$
So our previous estimation equation was missing an A on the relative prices. Intuitively, we understated the degree to which compute was getting cheaper. Now A is hard to observe, but let’s just assume it’s growing exponentially with an 8-month doubling time, per this Epoch paper.
Imputing this guess of A, and estimating via OLS with firm fixed effects, gives us σ = 0.89 with a standard error of 0.10.
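As a concrete illustration of what that regression looks like, here is a synthetic-data sketch with a known σ and made-up parameter values (not the actual data or code); it only shows the form of the estimating equation, ln(K/H) = firm effect + σ·ln(wA/r) + noise.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic firm-month panel generated from the estimating equation
# ln(K/H) = alpha_firm + sigma * ln(wA/r) + noise, with sigma known.
rng = np.random.default_rng(0)
sigma_true = 0.9
rows = []
for firm in range(20):
    alpha = rng.normal(0, 0.5)                # firm fixed effect
    for month in range(0, 96, 6):
        log_A = np.log(2) * month / 8         # A doubling every 8 months
        log_w_over_r = rng.normal(0, 0.3)     # log relative factor prices
        log_rel_price = log_w_over_r + log_A  # ln(wA/r)
        log_KH = alpha + sigma_true * log_rel_price + rng.normal(0, 0.1)
        rows.append(dict(firm=firm, log_KH=log_KH, log_rel_price=log_rel_price))
df = pd.DataFrame(rows)

# OLS with firm fixed effects; the slope on ln(wA/r) recovers sigma.
fit = smf.ols("log_KH ~ log_rel_price + C(firm)", data=df).fit()
print(fit.params["log_rel_price"], fit.bse["log_rel_price"])
```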
Note that this doesn’t change the estimation results for the frontier experiments, since the A in $\frac{A K_{res}}{A K_{train}}$ just cancels out.
This is a good point, we agree, thanks! Note that you need to assume that the algorithmic progress that gives you more effective inference compute is the same as the algorithmic progress that gives you more effective research compute. This seems pretty reasonable, but worth a discussion.
Although note that this argument works only with the CES in compute formulation. For the CES in frontier experiments, you would have the ratio $\frac{A K_{res}}{A K_{train}}$, so the A cancels out.[1]
You might be able to avoid this by adding the A’s in a less naive fashion. You don’t have to train larger models if you don’t want to. So perhaps you can freeze the frontier, and then you get $\frac{A K_{res}}{A_{frozen} K_{train}}$? I need to think more about this point.
Yep, as you say in your footnote, you can choose to freeze the frontier, so you train models of a fixed capability using less and less compute (at least for a while).
Also, updating this would change all the intelligence explosion conditions, not just when σ<1.