Executive summary: The post argues that claims about a software intelligence explosion rest on weak data and misspecified models, finds suggestive but highly uncertain evidence for increasing returns to AI software R&D, and concludes that targeted experiments are the best way to make progress.
Key points:
Prior estimates of “returns to R&D” r using the Jones model show substantial probability mass above 1 in computer vision, reinforcement learning, and language models, but the uncertainties are large and the estimates straddle the r = 1 threshold above which software progress would accelerate rather than slow.
Input measures are flawed due to spillovers and proxy choices: paper counts miss cross-domain and academic-to-lab contributions, and lab-only measures of labor and compute omit external research.
Output measures are also weak, relying on average growth rates and assumptions like constant software progress, with possible accelerations (e.g., reasoning models) not cleanly captured.
Applying the Jones model far outside observed regimes likely misleads: real research faces compute bottlenecks and limited parallelizability, so naive scaling of “number of researchers” can overstate progress.
A lab-based proxy estimates r = λ/β as the growth rate of research output over that of research inputs, using gK ≈ 1.3, gL ≈ 0.85, ϵK ≈ 0.67, and gA ≈ 1.1; accounting for compute bottlenecks (a Cobb-Douglas research input with compute elasticity ϵK ≈ 2⁄3) cuts the effective returns by roughly 3×, placing them below 1 (see the sketch after this list).
The authors propose experiments—e.g., isolating the role of data in software gains, randomized compute budgets, and scaling studies—to directly test key assumptions and reduce model/data confounds.
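To make the arithmetic behind the returns-to-R&D point concrete, here is a minimal back-of-envelope sketch in Python using the approximate values quoted above. The labor-only baseline and the Cobb-Douglas bottleneck adjustment are illustrative assumptions about how those numbers combine, not the post's exact calculation.

```python
# Minimal sketch of the r = lambda/beta back-of-envelope summarized above.
# The numbers are the approximate values quoted in the post; the functional
# forms are assumptions for illustration, not the authors' own code.

g_A = 1.10    # growth rate of software progress (output proxy)
g_L = 0.85    # growth rate of research labor (input proxy)
g_K = 1.30    # growth rate of research compute (quoted in the post;
              # not needed for this particular calculation)
eps_K = 0.67  # assumed Cobb-Douglas elasticity of research output w.r.t. compute

# Naive labor-only estimate: r = g_A / g_L
r_naive = g_A / g_L
print(f"labor-only estimate of r: {r_naive:.2f}")  # ~1.3, above the r = 1 threshold

# Compute-bottlenecked reading: if effective research input is
# I = K**eps_K * L**(1 - eps_K) and compute cannot scale alongside labor,
# only a (1 - eps_K) ~ 1/3 share of labor scaling passes through,
# cutting effective returns by roughly 3x.
r_bottlenecked = r_naive * (1 - eps_K)
print(f"compute-bottlenecked r: {r_bottlenecked:.2f}")  # ~0.4, below 1
```

Under these assumptions the naive labor-only estimate sits above the r = 1 threshold, while the compute-bottlenecked version falls clearly below it, which is the ~3× cut the key point refers to.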
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.