Working on various aspects of Econ + AI.
Parker_Whitfill
On 2), the condition you find makes sense, but aren’t you implicitly assuming an elasticity of substitution of 1 with Cobb-Douglas?
Yes, definitely. In general, I don’t have a great idea about what the right functional form looks like. The Cobb-Douglas case is just an example.
Yep. We are treating labor as homogeneous (no differentiation in skill, speed, etc.). I’m interested in thinking about quality differentiation a bit more.
In complete generality, you could write effective labor as
$$L_{\text{eff}} = F(H,\; C_{\text{inf}},\; C_{\text{train}}).$$
That is, effective labor is some function of the number of human researchers we have, the effective inference compute we have (quantity of AIs we can run) and the effective training compute (quality of AIs we trained).
The perfect substitution claim is that once training compute is sufficiently high, then eventually we can spend the inference compute on running some AI that substitutes for human researchers. Mathematically, for some threshold $\bar{C}_{\text{train}}$, whenever $C_{\text{train}} \ge \bar{C}_{\text{train}}$,
$$F(H,\; C_{\text{inf}},\; C_{\text{train}}) = F\!\left(H + \frac{C_{\text{inf}}}{c},\; 0,\; C_{\text{train}}\right),$$
where $c$ is the compute cost to run the system.
So you could think of our analysis as saying, once we have an AI that perfectly substitutes for AI researchers, what happens next?
Now of course, you might expect substantial recursive self-improvement even with an AI system that doesn’t perfectly substitute for human researchers. I think this is a super interesting and important question. I’m trying to think more about it, but it’s hard to make progress because it’s unclear what $F$ looks like. But let me try to gesture at a few things. Let’s fix effective training compute at some sub-human level, and let $A$ denote the quantity of (sub-human) AI labor we can run with our inference compute, so effective labor is $F(H, A)$.
At the very least, you need $F$ to be a function that goes to infinity as $A$ goes to infinity (holding $H$ fixed). For example, if there are certain tasks which must be done in AI research and these tasks can only be done by humans, then these tasks will always bottleneck progress.
If you assume, say, Cobb-Douglas, i.e.
$$F = H^{1-\alpha} A^{\alpha},$$
where $\alpha$ denotes the share of labor tasks that AI can do, then you’ll pick up another factor of $\alpha$ in the explosion condition. This captures the intuition that as the fraction of tasks an AI can do increases, the explosion condition gets easier and easier to hit.
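To make the bottleneck point concrete, here is a tiny numerical sketch (the functional forms and the value of $\alpha$ are purely illustrative assumptions, not estimates):

```python
# Compare effective labor F(H, A) as AI labor A grows, holding humans fixed.
H = 100.0       # number of human researchers (illustrative)
alpha = 0.5     # share of tasks AI can do (illustrative)

for A in [1e2, 1e4, 1e6, 1e8]:
    cobb_douglas = H ** (1 - alpha) * A ** alpha          # F keeps growing with A
    human_bottleneck = min(H / (1 - alpha), A / alpha)    # Leontief: human-only tasks cap F
    print(f"A={A:.0e}  Cobb-Douglas F={cobb_douglas:.3g}  bottlenecked F={human_bottleneck:.3g}")
```

Under Cobb-Douglas, $F$ still goes to infinity in $A$ (just with diminishing returns), whereas with tasks that only humans can do (the Leontief case above) $F$ is capped no matter how many AIs we run.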
Here is a fleshed out version of Cheryl’s response. Let’s suppose actual research capital is $q C_t$ but we just used $C_t$ in our estimation equation.
Then the true estimation equation is
$$\ln\!\left(\frac{q C_t}{L_t}\right) = \text{const} + \sigma\,\ln\!\left(\frac{w_t}{r_t}\right);$$
re-arranging we get
$$\ln\!\left(\frac{C_t}{L_t}\right) = \text{const} - \ln q + \sigma\,\ln\!\left(\frac{w_t}{r_t}\right).$$
So if we regress $\ln(C_t/L_t)$ on a constant and $\ln(w_t/r_t)$, then the coefficient on $\ln(w_t/r_t)$ is still $\sigma$, as long as $q$ is independent of $\ln(w_t/r_t)$.
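As a quick sanity check, here is a simulation with made-up numbers showing that a multiplicative measurement factor $q$ that is independent of relative prices only shifts the constant, not the coefficient of interest:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 5000, 2.5                           # sigma is the "true" coefficient (made up)
log_rel_price = rng.normal(0.0, 1.0, n)        # log(w/r)
log_q = rng.normal(-1.0, 0.3, n)               # measurement factor, independent of prices

# True relation uses q*C; we only observe C, so log(C/L) = const - log q + sigma*log(w/r) + noise
log_C_over_L = 1.0 - log_q + sigma * log_rel_price + rng.normal(0.0, 0.1, n)

X = np.column_stack([np.ones(n), log_rel_price])
const_hat, sigma_hat = np.linalg.lstsq(X, log_C_over_L, rcond=None)[0]
print(const_hat, sigma_hat)   # sigma_hat stays near 2.5; the intercept absorbs E[-log q]
```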
Nevertheless, I think this should increase your uncertainty in our estimates, because there is clearly a lot going on behind the scenes that we might not fully understand, like how research vs. training compute is measured, etc.
Note that if you accept this, our estimation of $\sigma$ in the raw compute specification is wrong.
The cost-minimization problem becomes
$$\min_{L_t,\,C_t}\; w_t L_t + r_t C_t \quad \text{s.t.}\quad Y(L_t,\, A_t C_t) \ge \bar{Y}.$$
Taking FOCs and re-arranging,
$$\ln\!\left(\frac{A_t C_t}{L_t}\right) = \text{const} + \sigma\,\ln\!\left(\frac{A_t w_t}{r_t}\right).$$
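For concreteness, here is the intermediate algebra under a CES specification in labor and effective compute $A_t C_t$ (the particular normalization below is just for illustration):
$$Y_t = \left(\theta\,(A_t C_t)^{\rho} + (1-\theta)\,L_t^{\rho}\right)^{1/\rho}, \qquad \sigma = \frac{1}{1-\rho},$$
$$\frac{\partial Y_t/\partial C_t}{\partial Y_t/\partial L_t} = \frac{\theta\,A_t^{\rho}\,C_t^{\rho-1}}{(1-\theta)\,L_t^{\rho-1}} = \frac{r_t}{w_t} \;\Longrightarrow\; \frac{A_t C_t}{L_t} = \left(\frac{\theta}{1-\theta}\right)^{\sigma}\left(\frac{A_t w_t}{r_t}\right)^{\sigma}.$$
Taking logs gives the equation above; dropping the $A_t$ terms recovers the original estimation equation.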
So our previous estimation equation was missing an $A$ on the relative prices. Intuitively, we understated the degree to which compute was getting cheaper. Now $A$ is hard to observe, but let’s just assume it’s growing exponentially with an 8-month doubling time per this Epoch paper.
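For example, the imputation could look something like this (placeholder data and hypothetical column names, just to show the mechanics):

```python
import numpy as np
import pandas as pd

# Placeholder monthly panel; none of these numbers are real data.
df = pd.DataFrame({
    "months_since_start": np.arange(36),
    "log_w_over_r": np.linspace(0.0, -1.0, 36),   # log relative prices, log(w/r)
    "log_C_over_L": np.linspace(0.0, 2.0, 36),    # log compute/labor ratio
})

# Impute algorithmic progress A_t assuming an 8-month doubling time.
df["log_A"] = df["months_since_start"] / 8.0 * np.log(2.0)

# Adjusted variables from the FOC above: log(A*C/L) and log(A*w/r).
df["log_AC_over_L"] = df["log_C_over_L"] + df["log_A"]
df["log_Aw_over_r"] = df["log_w_over_r"] + df["log_A"]
print(df.tail())
```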
Imputing this guess of A, and estimating via OLS with firm fixed effects gives us with standard errors.
Note that this doesn’t change the estimation results for the frontier experiments, since the $A$ terms just cancel out.
I spent a bit of time thinking about this today.
Let’s adopt the notation in your comment and suppose that the relevant parameters are common across research sectors.
Then we can characterize when we get blow-up in $F$, depending on whether the research sectors are complements or substitutes.
The intuition for this result is that when the sectors are complements, you are bottlenecked by your slower growing sector.
If the slower growing sector is cognitive labor, then asymptotically , and we get so we have blow-up iff .
If the slower growing sector is experimental compute, then there are two cases. If experimental compute is blowing up on its own, then so is cognitive labor because by assumption cognitive labor is growing faster. If experimental compute is not blowing up on its own then asymptotically and we get . Here we get a blow-up iff .[1]
In contrast, if the sectors are substitutes, then F is approximately the fastest growing sector. You get blow-up in both sectors if either sector blows up. Therefore, you get blow-up iff at least one sector blows up on its own.
So if you accept this framing, complements vs substitutes only matters if some sectors are blowing up but not others. If all sectors have the returns to research high enough, then we get an intelligence explosion no matter what. This is an update for me, thanks!
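To illustrate the bottleneck intuition numerically (growth rates, shares, and the CES curvature below are all made up):

```python
import numpy as np

def ces(x1, x2, rho):
    # Symmetric two-sector CES aggregate; rho < 0 means complements, 0 < rho < 1 substitutes.
    return (0.5 * x1 ** rho + 0.5 * x2 ** rho) ** (1.0 / rho)

t = np.arange(61)
slow = np.exp(0.05 * t)   # slower-growing sector
fast = np.exp(0.30 * t)   # faster-growing sector

for rho, label in [(-1.0, "complements"), (0.5, "substitutes")]:
    F = ces(slow, fast, rho)
    print(f"{label}: asymptotic growth of F per period ~ {np.log(F[-1] / F[-2]):.3f}")
```

With complements the aggregate ends up growing at the slow sector's rate; with substitutes it tracks the fast sector, which is why the condition switches from needing every sector to blow up to needing any one of them.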
- ^
I’m only analyzing blow-up conditions here. You could get e.g. double exponential growth here by having and .
- ^
Also, updating this would change all the intelligence explosion conditions, not just when .
Yep, I think this gets the high-level dynamics driving the results right.
Thanks for the clarification. We updated the post accordingly.
This is a good point, we agree, thanks! Note that you need to assume that the algorithmic progress that gives you more effective inference compute is the same algorithmic progress that gives you more effective research compute. This seems pretty reasonable, but it is worth discussing.
Although note that this argument works only with the CES in compute formulation. For the CES in frontier experiments, you would have the $A$ in both the numerator and the denominator, so the $A$ cancels out.[1]
- ^
You might be able to avoid this by adding the $A$’s in a less naive fashion. You don’t have to train larger models if you don’t want to. So perhaps you can freeze the frontier, and then the $A$ no longer cancels? I need to think more about this point.
Thanks for the insightful comment.
I take your overall point to be that the static optimization problem may not be properly specified. For example, costs may not be linear in labor size because of adjustment costs to growing very quickly, or costs may not be linear in compute because of bulk discounting. Moreover, these non-linear costs may be changing over time (e.g., adjustment costs might only matter in 2021-2024 as OpenAI and Anthropic have been scaling labor aggressively). I agree that this would bias the estimate of $\sigma$. Given the data we have, there should be some way to at least partially deal with this (e.g., by adding lagged labor as a control). I’ll have to think about it more.
On some of the smaller comments:
wages/r_{research} is around 0.28 (maybe you have better data here)
The best data we have is The Information’s article that OpenAI spent $700M on salaries and $1000M on research compute in 2024, so the ratio is about 0.7 (assuming you meant the ratio of total labor spending to total research-compute spending rather than the price ratio).
The whole industry is much larger now and elasticity of substitution might not be constant; if so this is worrying because to predict whether there’s a software-only singularity we’ll need to extrapolate over more orders of magnitude of growth and the human labor → AI labor transition.
I agree. $\sigma$ might not be constant over time, which is a problem both for estimation/extrapolation and for predicting what an intelligence explosion might look like. For example, if $\sigma$ falls over time, then we may have a foom for a bit until $\sigma$ falls below 1, and then it fizzles. I’ve been thinking about writing something up about this.
Are you planning follow-up work, or is there other economic data we could theoretically collect that could give us higher confidence estimates?
Yes, although we haven’t decided yet what is most useful to follow up on. In the very short term, there is trying to accommodate non-linear pricing. Of course, data on what non-linear pricing looks like would be helpful, e.g., how Nvidia does bulk discounting.
We also may try to estimate with the data we have.
Estimating the Substitutability between Compute and Cognitive Labor in AI Research
Great paper as always, Phil.
I’m curious to hear your thoughts a bit more about whether we can salvage SWE by introducing non-standard preferences.
Minor quibble: “There is then no straightforward sense in which economic growth has historically been exponential, the central stylized fact which SWE and semi-endogenous models both seek to explain”
I agree that there is no consumption aggregate under non-homothetic preferences, but we can still say economic growth has been exponential in the sense that GDP has grown exponentially. Perhaps it is not a very meaningful number under non-homothetic preferences, as you have argued elsewhere, but it still exists. Do you have thoughts on why GDP has grown exponentially in a model without a consumption aggregate?
Parker_Whitfill’s Quick takes
People often appeal to Intelligence Explosion/Recursive Self-Improvement as a win condition for current model developers; e.g., Dario argues Recursive Self-Improvement could enshrine the US’s lead over China.
This seems non-obvious to me. For example, suppose OpenAI trains GPT 6 which trains GPT 7 which trains GPT 8. Then a fast follower could take GPT 8 and then use it to train GPT 9. In this case, the fast follower has a lead and has spent far less on R&D (since they didn’t have to develop GPT 7 or 8 themselves).
I guess people are thinking that OpenAI will be able to ban GPT 8 from helping competitors? But has anyone argued for why they would be able to do that (either legally or technically)?
Is the alignment motivation distinct from just using AI to solve general bargaining problems?
Here is a counterargument: focusing on the places where there is altruistic alpha is ‘defecting’ against other value systems. See discussion here
I roughly buy that there is more “alpha” in making the future better, because most people are not longtermist, but most people do want to avoid extinction.
Good point, but can’t this trade occur just through financial markets, without involving one-on-one trades among EAs? For example, if you have short timelines, you could take out a loan and donate it all to AI Safety.
We are still working on getting a more official version of this on arXiv, possibly with estimates for λ and ϕ.
When we do that, we’ll also upload full replication files. But I don’t want to keep anyone waiting for the data in case they have some uses for it, so see here for the main CSV we used: https://github.com/parkerwhitfill/EOS_AI