I agree with much of this. A few responses.
As I see it, there are a couple of different reasons to fit hyperbolic growth models — or, rather, models of form (dY/dt)/Y = aY^b + c — to historical growth data.
...
I think the distinction between testing a theory and testing a mathematical model makes sense, but the two are intertwined. A theory will tend naturally to imply a mathematical model, but perhaps less so the other way around. So I would say Kremer is testing both a theory and a model—not confined to just one side of that dichotomy. Whereas as far as I can see the sum-of-exponentials model is, while intuitive, not so theoretically grounded. Taken literally, it says the seeds of every economic revolution that has occurred and will occur were present 12,000 years ago (or in Hanson (2000), 2 million years ago), and it’s just taking them a while to become measurable. I see no framework behind it that predicts how the system will evolve as a function of its current state rather than as a function of time. Ideally, the second would emerge from the first.
Note that what you call Kremer’s “Two Heads” model predates him. It’s in the endogenous growth theory of Romer (1986, 1990), which is an essential foundation for Kremer. And Romer is very much focused on the modern era, so it’s not clear to me that “For the purposes of testing Kremer’s theory, the pre-industrial (or perhaps even pre-1500) data is nearly all that matters.” Kuznets (1957) wrote about the contribution of “geniuses”—more people, more geniuses, faster progress. Julian Simon built on that idea in books and articles.
A lot of the reason I’m skeptical of Kremer’s model is that it doesn’t seem to fit very well with the accounts of economic historians and their descriptions of growth dynamics....it seems suspicious that the model leaves out all of the other salient differences that typically draw economic historians’ attention. Are changes in institutions, culture, modes of production, and energetic constraints really all secondary enough to be slipped into the error term?
Actually, I believe the standard understanding of “technology” in economics includes institutions, culture, etc.—whatever affects how much output a society wrings from a given amount of inputs. So all of those are by default in Kremer’s symbol for technology, A. And a lot of those things plausibly could improve faster, in the narrow sense of increasing productivity, if there are more people, if more people also means more societies (accidentally) experimenting with different arrangements and then setting examples for others; or if such institutional innovations are prodded along by innovations in technology in the narrower sense, such as the printing press.
Thank you Ben for this thoughtful and provocative review. As you know I inserted a bunch of comments on the Google doc. I’ve skimmed the dialog between you and Paul but haven’t absorbed all its details. I think I mostly agree with Paul. I’ll distill a few thoughts here.
1. The value of outside views
In a previous comment, Ben wrote:
Kahneman and Tversky showed that incorporating perspectives that neglect inside information (in this case the historical specifics of growth accelerations) can reduce our ignorance about the future—at least, the immediate future. This practice can improve foresight both formally—leading experts to take weighted averages of predictions based on inside and outside views—and informally—through the productive friction that occurs when people are challenged to reexamine assumptions. So while I think the feeling expressed in the quote is understandable, it’s also useful to challenge it.
Warning label: I think it’s best not to take the inside-outside distinction too seriously as a dichotomy, nor even as a spectrum. Both the “hyperbolic” and the sum-of-exponentials models are arguably outside views. Views based on the growth patterns of bacteria populations might also be considered outside views. Etc. So I don’t want to trap myself or anyone else into discussion about which views are outside ones, or more outsiderish. When we reason as perfect Bayesians (which we never do) we can update from all perspectives, however labeled or categorized.
2. On the statement of the Hyperbolic Growth Hypothesis
The current draft states the HGH as
I think this statement would be more useful if made less precise in one respect and more precise in another. I’ll explain first about what I perceive as its problematic precision.
In my paper I write a growth equation more or less as g_y = s * y ^ B where g_y is the growth rate of population or gross world product and ^ means exponentiation. If B = 0, then growth is exponential. If B = 1, then growth is proportional to the level, as in the HGH definition just above. In my reading, Ben’s paper focuses on testing (and ultimately rejecting) B = 1. I understand that one reason for this focus is that Kremer (1993) finds B = 1 for population history (as does von Foerster et al. (1960) though that paper is not mentioned).
But I think the important question is not whether B = 1 but whether B > 0. For if 0 < B < 1, growth is still superexponential and y still hits a singularity if projected forward. E.g., I estimate B = ~0.55 for GWP since 10,000 BCE. The B > 0 question is what connects most directly to the title of this post, “Does Economic History Point Toward a Singularity?” And as far as I can see a focus on whether B = 1 is immaterial to the substantive issue being debated in these comments, such as whether a model with episodic growth changes is better than one without. If we are focusing on whether B = 1, seemingly a better title for this post would be “Was Kremer (1993) wrong?”
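The B > 0 point can be made concrete with a few lines of arithmetic (my own illustration, with made-up parameter values, not numbers from the paper): for any B > 0, the equation g_y = s * y ^ B has a closed-form solution that diverges at a finite time.

```python
# For any B > 0, g_y = s * y**B means dy/dt = s * y**(B + 1), whose
# closed-form solution diverges at a finite time t*.

def blowup_time(y0, s, B):
    """Finite time at which y(t) diverges, for B > 0."""
    return y0**(-B) / (s * B)

def y_at(t, y0, s, B):
    """Closed-form solution of dy/dt = s * y**(B + 1), valid for t < t*."""
    return (y0**(-B) - s * B * t)**(-1.0 / B)

# Illustrative, made-up parameters: y0 = 1, s = 0.02, B = 0.55.
print(round(blowup_time(1.0, 0.02, 0.55), 1))  # → 90.9
```

The singularity arrives later for smaller B, but it arrives; only B = 0 (exponential growth) escapes it.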
To be clear, the paper seems to shift between two definitions of hyperbolic growth: usually it’s B = 1 (“proportional”), but in places it’s B > 0. I think the paper could easily be misunderstood to be rejecting B > 0 (superexponential growth/singularity in general) in places where it’s actually rejecting B = 1 (superexponential growth/singularity with a particular speed). This is the sense in which I’d prefer less specificity in the statement of the hyperbolic growth hypothesis.
I’ll explain where I’d ideally want more specificity in the next item.
3. The value of an explicit statistical model
We all recognize that the data are noisy, so that the only perfect model for any given series will have as many parameters as data points. What we’re after is a model that strikes a satisfying balance between parsimony (few parameters) and quality of fit. Accepting that, the question immediately arises: how do you measure quality of fit? This question rarely gets addressed systematically—not in Ben’s paper, not in the comments on this post, not in Hanson (2000), and not in nearly all the rest of the literature. In fact Kremer (1993) is the only previous paper I’ve found that does proper econometrics—that’s explicit about its statistical model, as well as the methods used to fit it to data, the quality of fit, and the validity of underlying assumptions such as independence of successive error terms.
And even Kremer’s model is not internally consistent because it doesn’t take into account how shocks in each decade, say, feed into the growth process to shape the probability distribution for growth over a century. That observation was the starting point for my own incremental contribution.
To be more concrete, look back at the qualifiers in the HGH statement: “tended to be roughly proportional.” Is the HGH, so stated, falsifiable? Or, more realistically, can it be assigned a p value? I think the answer is no, because there is no explicitly hypothesized, stochastic data generating process. The same can be asked of many statements in these comments, when people say a particular kind of model seems to fit history more or less well. It’s not fully clear what “better” would mean, nor what kind of data could falsify or strongly challenge any particular statement about goodness of fit.
I don’t want to be read as perfectionist about this. It’s really hard in this context to state a coherent, rigorously testable statistical model: the sheer quantity of equations in my paper is proof of that. And at the end of the day, the data are so bad that it’s not obvious that fancy math gives us more insight than hand-wavy verbal debate.
I would suggest, however, that it is important to understand the conceptual gap, just as we try to incorporate Bayesian thinking into our discourse even if we rarely engage in formal Bayesian updating. So I will elaborate.
Suppose I’m looking at a graph of population over time and want to fit a curve to it. I might declare that the one true model is
y = f(t) + e
where f is exponential or what-have-you, and e is an error term. It is common when talking about long-term population or GWP history to stop there. The problem with stopping there is that every model then fits. I could postulate that f is an S-curve, the Manhattan skyline in profile, a fractal squiggle, etc. Sure, none of these f’s fit the data perfectly, but I’ve got my error term e there to absorb the discrepancies. Formally my model fits the data exactly.
The logical flaw is the lack of characterization of e. Classically, we’d assume that all of the values of e are drawn independently from a shared probability distribution that has mean 0 and that is itself independent of t and previous values of y. These assumptions are embedded in standard regression methods, at least when we start quoting standard errors and p values. And these assumptions will be violated by most wrong models. For example, if the best-fit S-curve predicts essentially zero growth after 1950 while population actually keeps climbing, then after 1950 discrepancies between actual and fitted values—our estimates of e—will be systematically positive. They will be observably correlated with each other, not independent. This is why something that sounds technical, checking for serial correlation, can have profound implications for whether a model is structurally correct.
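The serial-correlation symptom can be sketched in a few lines (a toy illustration with simulated data, not any test from the papers under discussion): fit a deliberately wrong model and the residuals, our estimates of e, come out strongly correlated at lag 1.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(100, dtype=float)
y = np.exp(0.05 * t) + rng.normal(0.0, 0.5, t.size)  # truth: exponential + noise

# Deliberately wrong model: a straight line fit by least squares.
slope, intercept = np.polyfit(t, y, 1)
resid = y - (slope * t + intercept)

# Lag-1 autocorrelation of the residuals: near 0 if the errors were truly
# independent, near 1 when the model is structurally wrong like this one.
r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
print(round(r1, 2))  # close to 1
```

A formal version of this check is the Durbin-Watson statistic reported in standard regression output.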
I believe this sort of fallacy is present in the current draft of Ben’s paper, where it says, “Kremer’s primary regression results don’t actually tell us anything that we didn’t already know: all they say is that the population growth rate has increased.” (emphasis in the original) Kremer in fact checks whether his modeling errors are independent and identically distributed. Leaving aside whether these checks are perfectly reassuring, I think the critique of the regressions is overdrawn. The counterexample developed in the current draft of Ben’s paper does not engage with the statistical properties of e.
More generally, without explicit assumptions about the distribution of e, discussions about the quality of various models can get bogged down. For then there is little rigorous sense in which one model is better than another. With such assumptions, we can say that the data are more likely under one model than under another.
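Under an explicit error assumption, say i.i.d. Gaussian errors in log space, the comparison becomes concrete. Here is a toy example of my own with simulated data (not a fit to the actual GWP series):

```python
import numpy as np

rng = np.random.default_rng(42)
t = np.arange(60, dtype=float)
log_y = 0.03 * t + rng.normal(0.0, 0.2, t.size)  # truth: exponential growth

def gaussian_loglik(resid):
    """Max log-likelihood under i.i.d. N(0, s2) errors, with s2 at its MLE."""
    n, s2 = resid.size, np.mean(resid**2)
    return -0.5 * n * (np.log(2 * np.pi * s2) + 1)

# Model A: constant level. Model B: exponential growth (linear in logs).
ll_a = gaussian_loglik(log_y - log_y.mean())
ll_b = gaussian_loglik(log_y - np.polyval(np.polyfit(t, log_y, 1), t))
print(ll_b > ll_a)  # → True: the data are more likely under model B
```

With the likelihoods in hand, one can penalize parameter count (AIC, BIC) to keep the comparison fair between models of different complexity.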
4. I’m open to the hyperbolic model being too parsimonious
The possibility that growth accelerates episodically is quite plausible to me. And I’d put significant weight on the episodes being entirely behind us. In fact my favorite part of Ben’s paper is where it gathers radiocarbon-dating research that suggests that “the” agricultural revolution, like the better-measured industrial revolution, brought distinct accelerations in various regions.
In my first attack on modeling long-term growth, I chose to put a lot of work into the simpler hyperbolic model because I saw an opportunity to improve its statistical expression, in particular by modeling how random growth shocks at each moment feed into the growth process and shape the probability distribution for growth over finite periods such as 10 years. Injecting stochasticity into the hyperbolic model seemed potentially useful for two reasons. For one, since adding dynamic stochasticity is hard, it seemed better to do it in a simpler model first.
For another, it allowed a rigorous test of whether second-order effects—the apparently episodic character of growth accelerations—could be parsimoniously viewed as mere noise within a simpler pattern of long-term acceleration. Within the particular structure of my model, the answer was no. For example, after being fit to the GWP data for 10,000 BCE to 1700 CE, my model is surprised at how high GWP was in 1820, assigning that outcome a p value of ~0.1. Ben’s paper presents similar findings, graphically.
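The general shape of such a test can be sketched as follows (a hedged toy version only: the model below is a geometric random walk with invented parameters, not my paper's actual specification). Fit a stochastic model to early data, simulate many forward paths, and score the observed later value by the fraction of simulated paths that reach it, an empirical one-sided p value.

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma = 0.01, 0.05          # per-period drift and shock size (made up)
horizon, n_sims = 120, 10_000   # periods ahead; number of simulated paths
y0, y_observed = 1.0, 5.0       # starting level; later observed level (made up)

# Each path compounds `horizon` random log-growth shocks.
finals = y0 * np.exp(rng.normal(mu, sigma, (n_sims, horizon)).sum(axis=1))
p_value = np.mean(finals >= y_observed)
print("p =", round(p_value, 2))
```

A small p value says the observed outcome sits in the tail of what the fitted model expects, which is the sense in which my model is "surprised" by 1820.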
So, sure, growth accelerations may be best seen as episodic.
But, as noted, it’s not clear that stipulating an episodic character should in itself shift one’s priors on the possibility of singularity-like developments. Hanson (2000)’s seminal articulation of the episodic view concludes that “From a purely empirical point of view, very large changes are actually to be expected within the next century.” He extrapolates from the statistics of past explosions (the few that we know of) to suggest that the next one will have a doubling time of days or weeks. He doesn’t pursue the logic further, but could have. The next revolution after that could come within days and have a doubling time of seconds. So despite departing from the hyperbolic model, we’re back to predicting a singularity.
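The arithmetic behind that extrapolation is just a geometric series: if each new mode's doubling time is a fixed fraction r of the last (the numbers below are illustrative, not Hanson's estimates), infinitely many modes fit into a finite span of time.

```python
# Doubling times d0, r*d0, r**2*d0, ... sum to d0 / (1 - r) for r < 1.

def time_to_singularity(d0, r, n_modes=1000):
    """Sum the first n_modes doubling times; converges to d0 / (1 - r)."""
    return sum(d0 * r**k for k in range(n_modes))

# Illustrative numbers: the first mode doubles in 15 units of time, and
# each successive mode's doubling time is a quarter of the last.
print(round(time_to_singularity(15.0, 0.25), 6))  # → 20.0
```

So an episodic model with systematically shrinking doubling times still puts a finite horizon on the whole sequence.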
And I’ve seen no parsimonious theory for episodic models, by which I mean one or more differential equations whose solutions yield episodic growth. Differential equations are important for expressing how the state of a system affects changes in that state.
Something I’m interested in now is how to rectify that within a stochastic framework. Is there an elegant way to simulate episodic, stochastic acceleration in technological progress?
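One candidate answer, offered purely as a sketch of my own (not a model from the literature, and with every parameter value invented): let growth proceed exponentially, with "revolutions" arriving stochastically at a rate that rises with the level of output, each one multiplying the growth rate.

```python
import numpy as np

def simulate(y0=1.0, g0=0.001, m=2.0, lam0=1e-4, steps=5000, seed=0):
    """Toy episodic-growth simulation; all parameter values are made up."""
    rng = np.random.default_rng(seed)
    y, g, revolutions = y0, g0, 0
    path = [y]
    for _ in range(steps):
        # Revolutions arrive with probability rising in the level of y,
        # so accelerations come faster as the economy grows.
        if rng.random() < min(1.0, lam0 * np.sqrt(y)):
            g *= m
            revolutions += 1
        y *= np.exp(g)
        if y > 1e12:  # halt before numerical overflow
            break
        path.append(y)
    return np.array(path), revolutions

path, revolutions = simulate()
print(path[-1] > path[0])  # → True
```

This is one differential-equation-flavored structure in which the state of the system, not the calendar, drives the accelerations, while keeping the episodes discrete and random.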
My own view of growth prospects is at this point black swan-style (even if the popularizer of that term called me a “BSer”). A stochastic hyperbolic model generates fat-tailed distributions for future growth and GWP, ones that imply that the expected value of future output is infinite. Leavening a conventional, insider prediction of stagnation with even a tiny bit of that outside view suffices to fatten its tails, send its expectation to infinity, and, as a practical matter, raise the perceived odds of extreme outcomes.
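The infinite-expectation claim is easy to verify in a toy case (the parameters below are illustrative): for a Pareto tail with shape a <= 1, the truncated mean grows without bound as the truncation point rises, so any mixture giving such a tail positive weight has infinite mean.

```python
def truncated_mean(a, M):
    """E[min(X, M)] for a Pareto with scale 1 and shape a (a != 1)."""
    # integral of x * a * x**(-a - 1) from 1 to M, plus M * P(X > M)
    return a / (1 - a) * (M**(1 - a) - 1) + M**(1 - a)

# With a = 0.9 < 1 the truncated mean keeps climbing as the cutoff M
# rises, so the untruncated expectation is infinite.
for M in (1e2, 1e4, 1e6):
    print(round(truncated_mean(0.9, M), 1))  # → 6.8, then 16.1, then 30.8
```

That is why even a tiny mixture weight on the fat-tailed component is enough to send the overall expectation to infinity.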