However, if σ<1, then a software-only intelligence explosion occurs only if ϕ>1. But if this condition held, we could get an intelligence explosion with constant, human-only research input. While not impossible, we find this condition fairly implausible.
Hmm, I think a software-only intelligence explosion is plausible even if σ<1, but without the implication that you can do it with human-only research input.
The basic idea is that when you double the efficiency of software, you can now:
Run twice as many experiments
Have twice as much cognitive labour
So both the inputs to software R&D double.
I think this corresponds to:
dA = A^phi F(A K_res, A K_inf)
And then you only need phi > 0 to get an intelligence explosion. Not phi > 1.
This is really an explosion in the efficiency at which you can run AI algorithms, but you could do that for a while and then quickly use your massive workforce to develop superintelligence, or start training your ultra-efficient algorithms using way more compute.
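One way to see the phi > 0 claim in a line, assuming F has constant returns to scale (which seems to be what the doubling argument above relies on): pulling the A out of both arguments gives
$$\dot{A} = A^{\phi}\,F(A K_{res},\, A K_{inf}) = A^{\phi+1}\,F(K_{res},\, K_{inf}),$$
so with $K_{res}$ and $K_{inf}$ held fixed this is $\dot{A} = c\,A^{\phi+1}$, which goes to infinity in finite time exactly when $\phi + 1 > 1$, i.e. $\phi > 0$.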
Hmm, I think I would expect different experience curves for the efficiency of running experiments vs producing cognitive labour (with generally less efficiency-boosts with time for running experiments). Is there any reason to expect them to behave similarly?
(Though I think I agree with the qualitative point that you could get a software-only intelligence explosion even if you can’t do this with human-only research input, which was maybe your main point.)
Agree that I wouldn’t particularly expect the efficiency curves to be the same.
But if phi > 0 for both types of efficiency, then I think this argument will still go through.
To put it in math, there would be two types of AI software technology, one for experimental efficiency and one for cognitive labour efficiency: A_exp and A_cog. The equations are then:
dA_exp = A_exp^phi_exp F(A_exp K_res, A_cog K_inf)
dA_cog = A_cog^phi_cog F(A_exp K_res, A_cog K_inf)
And then I think you’ll find that, even with sigma < 1, it explodes when phi_exp>0 and phi_cog>0.
I spent a bit of time thinking about this today.
Let’s adopt the notation in your comment and suppose that F(⋅) is the same across research sectors, with a common λ. Let’s also suppose a common σ<1.
Then we get blow-up in $A_{cog}$ iff
$$\begin{cases}\phi_{cog}+\lambda>1 & \text{if } \phi_{cog}\le\phi_{exp}\\ \max\{\phi_{cog},\,\phi_{exp}+\lambda\}>1 & \text{if } \phi_{cog}>\phi_{exp}\end{cases}$$
The intuition for this result is that when σ<1, you are bottlenecked by your slower-growing sector.
If the slower-growing sector is cognitive labor, then asymptotically $F \propto A_{cog}$, and we get $\dot{A}_{cog} \propto A_{cog}^{\phi_{cog}} A_{cog}^{\lambda}$, so we have blow-up iff $\phi_{cog}+\lambda>1$.
If the slower-growing sector is experimental compute, then there are two cases. If experimental compute is blowing up on its own, then so is cognitive labor, because by assumption cognitive labor is growing faster. If experimental compute is not blowing up on its own, then asymptotically $F \propto A_{exp}$ and we get $\dot{A}_{cog} \propto A_{cog}^{\phi_{cog}} A_{exp}^{\lambda}$. Here we get blow-up iff $\phi_{cog}>1$.[1]
In contrast, if σ>1 then F grows approximately like the fastest-growing sector. You get blow-up in both sectors if either sector blows up. Therefore, you get blow-up iff $\max\{\phi_{cog}+\lambda,\,\phi_{exp}+\lambda\}>1$.
So if you accept this framing, complements vs. substitutes only matters if some sectors are blowing up but not others. If the returns to research are high enough in all sectors, then we get an intelligence explosion no matter what. This is an update for me, thanks!
I’m only analyzing blow-up conditions here. You could get e.g. double exponential growth by having $\phi_{cog}=1$ and $\phi_{exp}+\lambda=1$.
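For anyone who wants to poke at these conditions numerically, here is a minimal sketch, assuming F is a constant-returns CES aggregator and that λ enters as an exponent on F in each law of motion (which is how I read the derivation above); the parameter values are purely illustrative.

```python
def ces(x, y, sigma=0.5, gamma=0.5):
    """Constant-returns CES aggregator with elasticity of substitution sigma."""
    rho = (sigma - 1) / sigma
    return (gamma * x**rho + (1 - gamma) * y**rho) ** (1 / rho)

def blowup_time(phi_exp, phi_cog, lam=0.8, sigma=0.5, K_res=1.0, K_inf=1.0,
                T=50.0, dt=1e-3, cap=1e12):
    """Euler-integrate dA_i/dt = A_i^phi_i * F(A_exp*K_res, A_cog*K_inf)^lam.

    Returns the approximate time at which A_cog first exceeds `cap`, or None
    if it stays below the cap over the horizon -- a crude proxy for
    finite-time blow-up versus merely polynomial growth.
    """
    A_exp = A_cog = 1.0
    t = 0.0
    while t < T:
        F = ces(A_exp * K_res, A_cog * K_inf, sigma=sigma)
        A_exp += dt * A_exp**phi_exp * F**lam
        A_cog += dt * A_cog**phi_cog * F**lam
        t += dt
        if A_cog > cap:
            return t
    return None

# phi_cog <= phi_exp and phi_cog + lam > 1: blows up despite sigma < 1
print(blowup_time(phi_exp=0.5, phi_cog=0.5))   # prints a finite time
# phi_cog <= phi_exp and phi_cog + lam < 1: no blow-up on this horizon
print(blowup_time(phi_exp=0.5, phi_cog=0.1))   # prints None
```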
Nice!
I think that condition is equivalent to saying that A_cog explodes iff either
phi_cog + lambda > 1 and phi_exp + lambda > 1, or
phi_cog > 1
Where the second possibility is the unrealistic one in which it could explode with just human input.
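For what it’s worth, this equivalence seems to check out (taking lambda ≥ 0). When phi_cog ≤ phi_exp, phi_cog + lambda > 1 already implies phi_exp + lambda > 1, and phi_cog > 1 is a special case of phi_cog + lambda > 1, so the whole disjunction collapses to phi_cog + lambda > 1, matching the first branch. When phi_cog > phi_exp, phi_exp + lambda > 1 forces phi_cog + lambda > 1 as well, so the disjunction collapses to phi_cog > 1 or phi_exp + lambda > 1, i.e. max{phi_cog, phi_exp + lambda} > 1, matching the second branch.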
If your algorithms get more efficient over time at both small and large scales, and experiments test incremental improvements to architecture or data, then they should get cheaper to run in proportion to the algorithmic efficiency of cognitive labor. I think this is a better first approximation than assuming experiment costs are constant, and it might hold in practice, especially when you can target small-scale algorithmic improvements.
OK I see the model there.
I guess it’s not clear to me if that should hold if I think that most experiment compute will be ~training, and most cognitive labour compute will be ~inference?
However, over time maybe more experiment compute will be ~inference, as it shifts more to being about producing data rather than testing architectures? That could push back towards this being a reasonable assumption. (Definitely don’t feel like I have a clear picture of the dynamics here, though.)
Note that if you accept this, our estimation of σ in the raw compute specification is wrong.
The cost-minimization problem becomes
$$\min_{H,K}\; wH + rK \quad \text{s.t.} \quad F(AK, H) = \bar{F}.$$
Taking FOCs and re-arranging,
$$\ln\frac{K}{H} = \sigma\ln\frac{\gamma}{1-\gamma} + \sigma\ln\frac{wA}{r}$$
So our previous estimation equation was missing an A on the relative prices. Intuitively, we understated the degree to which compute was getting cheaper. Now A is hard to observe, but let’s just assume it’s growing exponentially with an 8-month doubling time, per this Epoch paper.
Imputing this guess of A, and estimating via OLS with firm fixed effects, gives us σ = 0.89 with a standard error of 0.10.
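As a concrete illustration of what that regression looks like, here is a synthetic-data sketch with a known σ and made-up parameter values (not the actual data or code); it only shows the form of the estimating equation, ln(K/H) = firm effect + σ·ln(wA/r) + noise.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic firm-month panel generated from the estimating equation
# ln(K/H) = alpha_firm + sigma * ln(wA/r) + noise, with sigma known.
rng = np.random.default_rng(0)
sigma_true = 0.9
rows = []
for firm in range(20):
    alpha = rng.normal(0, 0.5)                # firm fixed effect
    for month in range(0, 96, 6):
        log_A = np.log(2) * month / 8         # A doubling every 8 months
        log_w_over_r = rng.normal(0, 0.3)     # log relative factor prices
        log_rel_price = log_w_over_r + log_A  # ln(wA/r)
        log_KH = alpha + sigma_true * log_rel_price + rng.normal(0, 0.1)
        rows.append(dict(firm=firm, log_KH=log_KH, log_rel_price=log_rel_price))
df = pd.DataFrame(rows)

# OLS with firm fixed effects; the slope on ln(wA/r) recovers sigma.
fit = smf.ols("log_KH ~ log_rel_price + C(firm)", data=df).fit()
print(fit.params["log_rel_price"], fit.bse["log_rel_price"])
```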
Note that this doesn’t change the estimation results for the frontier experiments, since the A in $\frac{A K_{res}}{A K_{train}}$ just cancels out.
This is a good point, we agree, thanks! Note that you need to assume that the algorithmic progress that gives you more effective inference compute is the same as the algorithmic progress that gives you more effective research compute. This seems pretty reasonable, but worth a discussion.
Although note that this argument works only with the CES in compute formulation. For the CES in frontier experiments, you would have the ratio $\frac{A K_{res}}{A K_{train}}$, so the A cancels out.[1]
You might be able to avoid this by adding the A’s in a less naive fashion. You don’t have to train larger models if you don’t want to. So perhaps you can freeze the frontier, and then you get $\frac{A K_{res}}{A_{frozen} K_{train}}$? I need to think more about this point.
Yep, as you say in your footnote, you can choose to freeze the frontier, so you train models of a fixed capability using less and less compute (at least for a while).
Also, updating this would change all the intelligence explosion conditions, not just when σ<1.