But am I reading right that that one doesn’t push through to a concrete demonstration of impacts on expected values of interventions?
David Roodman
Interesting question! Certainly it is the nonlinearities in the cost-effectiveness analysis that make uncertainty matter to an expected value maximizer. If we thought that the cost-effectiveness of an intervention was best modeled as the sum of two uncertain variables (a simple example of a linear model), then the expected value of the intervention would be the sum of the expected values of the two variables. Their uncertainty would not matter.
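To make the linearity point concrete, here is a toy Monte Carlo in Python. Every number and distribution is made up purely for illustration:

```python
# A minimal sketch, with made-up numbers: for a linear (additive) model the
# expected value ignores the spread of the inputs; for a nonlinear one
# (here a diminishing-returns transform) it does not.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

x_narrow = rng.normal(10.0, 1.0, n)   # same mean, small uncertainty
x_wide   = rng.normal(10.0, 5.0, n)   # same mean, large uncertainty
y        = rng.normal(5.0, 2.0, n)

# Linear: the two expected values agree; the extra spread is irrelevant.
print((x_narrow + y).mean(), (x_wide + y).mean())          # both ~15

# Nonlinear (diminishing returns): the wider input lowers the expectation.
print(np.sqrt(np.clip(x_narrow, 0, None)).mean(),
      np.sqrt(np.clip(x_wide, 0, None)).mean())
```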
The most serious effort I know of to incorporate uncertainty into the GiveWell cost-effectiveness analysis is this post by Sam Nolan, Hannah Rokebrand, and Tanae Rao. I was surprised at how little it changed the expected values—I think by a typical 10-15%, but I’m finding it a little hard to tell.
I think when the denominator is cost rather than impact of school construction on years of schooling, our uncertainty range is less likely to put much weight on the possibility that the true value is zero. Cost might even be modeled as lognormal, so that it can never be zero. In this case, there would be little weight on ~infinite cost-effectiveness.
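In the same hypothetical spirit (none of these numbers come from the actual GiveWell analysis), a quick simulation illustrates the denominator point: if the denominator's uncertainty reaches down to zero, the expected ratio is infinite in principle and unstable in simulation, whereas a lognormal cost keeps it finite.

```python
# A rough sketch with invented numbers. If the denominator's uncertainty puts
# weight near zero, the simulated cost-effectiveness ratio has an unstable,
# in-principle-infinite mean; a lognormal cost, which can never be zero, does not.
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
benefit = rng.normal(100.0, 30.0, n)        # hypothetical benefit

# Denominator A: an effect estimate whose uncertainty reaches down to ~0.
effect = np.abs(rng.normal(0.3, 0.2, n))    # folded at zero for simplicity
ratio_a = benefit / effect

# Denominator B: a lognormal cost with a similar central value, never zero.
cost = rng.lognormal(np.log(0.3), 0.6, n)
ratio_b = benefit / cost

print(ratio_a.mean())   # much bigger and unstable across seeds; true mean is infinite
print(ratio_b.mean())   # roughly 400 with these invented numbers, and stable
```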
Good question! I think it brings out a couple of subtleties.
First, in putting forward the “universal” truth that education and earnings go hand-in-hand, I mean that this is a regularly found society-level statistical association. It does not mean that it holds true for every individual and it does not take a position on causality. So I don’t mean to imply that if a particular child is made to go to school more this will “universally” cause the kid to earn more later.
In particular, while I think poorer regencies got more schools and that increased primary school completion, and while I believe that in general education and earnings go together, I’m much less sure that this particular exogenous perturbation in life paths had much impact. (One reason, which falls outside the study, you allude to: I don’t know if they learned much.)
Of course, it is entirely plausible that getting more kids into school will cause them to earn more later. Which brings me to the second subtlety. It would be easy to read this post (and perhaps I need to edit it to make it harder) as judging whether there's any effect: does education raise wages? But I want it to be read as asking, given all we know about the world outside the Duflo study, should the Duflo study cause us to update our views much about the size of the impact of education on earnings?
I think the paper comes off as rather confident that it should persuade skeptics—see the block quote toward the end of my post. That is, if you were generally skeptical that education has much effect on earnings before reading the paper, it should change your mind, according to its author, because of the techniques it uses to isolate one causal link, and the precision of the resulting estimates. When I wrote “I wound up skeptical that the paper made its case,” this is what I was referring to. If you generally think education increases earnings on average because of all you know about the world, or if you think the opposite, this paper probably shouldn’t move you much.
I can do both—economics has a long tradition of accepting circulation of “working papers” or “preprints” without jeopardizing publication in journals. In fact, Esther suggested I hold off submitting a comment to the AER until she steps down as editor in a few weeks.
Actually, if clustering was not as common in 2001 as now, it was not rare. She clustered standard errors in the other two chapters in her thesis. My future colleague Mead Over coauthored a program in 1996 for clustering standard errors in instrumental variables regressions in Stata.
Hi Ozzie. I’m out of my depth here, but what I had in mind was the Uwezo program at one of my “this” links, which I believe was inspired by Pratham in India. I think these organizations originally gained fame for conducting their own surveys of how much (or little) children were actually learning, in an attempt to hold the education system accountable for results.
But that’s surely just a small part of a large topic, how a citizenry holds a public bureaucracy more accountable. Specific solutions include “democracy”… You know, so just do that.
I should say that there is a strong and arguably opposing view, embodied by the evidence-based Teaching at the Right Level approach. The idea is to completely script what teachers do every day. It’s very top-down.
Thanks for the feedback. I can see why that is confusing. You figured it out. I inserted a couple of sentences before the first table to clarify. And I changed “young-old pay gap” to “old-young pay gap” because I think the hyphen reads, at least subliminally, like a minus sign.
Thanks @MHR.
1. Is exactly the right question. My work is just one input to answering it. My coworkers are confronting it more directly, but I think nothing is public at this point. My gut is that the result is broadly representative and that expanding schooling supply alone is often pushing on a string. It is well documented that in many primary schools in poor countries, kids are learning pitifully little. Dig into the question of why, and it has to do with lack of accountability of teachers and school systems for results, which in turn has to do with the distribution of power in society. That is not easily changed. But nor is it hopeless (this, this, this), so the problem is also potentially an opportunity.
2. Whoops! The table’s header row got chopped during editing. I fixed it.
Hi Karthik. Without belaboring shades of emphasis, I basically agree with you. But you know, I’ve just spent thousands of words criticizing someone’s work and I want to end positively, within reason.
Does putting kids in school now put money in their pockets later? Revisiting a natural experiment in Indonesia
I agree with much of this. A few responses.
As I see it, there are a couple of different reasons to fit hyperbolic growth models — or, rather, models of the form (dY/dt)/Y = aY^b + c — to historical growth data.
...
I think the distinction between testing a theory and testing a mathematical model makes sense, but the two are intertwined. A theory will tend naturally to imply a mathematical model, but perhaps less so the other way around. So I would say Kremer is testing both a theory and a model—not confined to just one side of that dichotomy. Whereas, as far as I can see, the sum-of-exponentials model is, while intuitive, not so theoretically grounded. Taken literally, it says the seeds of every economic revolution that has occurred and will occur were present 12,000 years ago (or in Hanson (2000), 2 million years ago), and it's just taking them a while to become measurable. I see no framework behind it that predicts how the system will evolve as a function of its current state rather than as a function of time. Ideally, the second would emerge from the first.
Note that what you call Kremer’s “Two Heads” model predates him. It’s in the endogenous growth theory of Romer (1986, 1990), which is an essential foundation for Kremer. And Romer is very much focused on the modern era, so it’s not clear to me that “For the purposes of testing Kremer’s theory, the pre-industrial (or perhaps even pre-1500) data is nearly all that matters.” Kuznets (1957) wrote about the contribution of “geniuses”—more people, more geniuses, faster progress. Julian Simon built on that idea in books and articles.
A lot of the reason I’m skeptical of Kremer’s model is that it doesn’t seem to fit very well with the accounts of economic historians and their descriptions of growth dynamics....it seems suspicious that the model leaves out all of the other salient differences that typically draw economic historians’ attention. Are changes in institutions, culture, modes of production, and energetic constraints really all secondary enough to be slipped into the error term?
Actually, I believe the standard understanding of “technology” in economics includes institutions, culture, etc.—whatever affects how much output a society wrings from a given amount of inputs. So all of those are by default in Kremer’s symbol for technology, A. And a lot of those things plausibly could improve faster, in the narrow sense of increasing productivity, if there are more people, if more people also means more societies (accidentally) experimenting with different arrangements and then setting examples for others; or if such institutional innovations are prodded along by innovations in technology in the narrower sense, such as the printing press.
Thank you Ben for this thoughtful and provocative review. As you know I inserted a bunch of comments on the Google doc. I’ve skimmed the dialog between you and Paul but haven’t absorbed all its details. I think I mostly agree with Paul. I’ll distill a few thoughts here.
1. The value of outside views
In a previous comment, Ben wrote:
My general feeling towards the evolution of the economy over the past ten thousand years, reading historical analysis, is something like: “Oh wow, this seems really complex and heterogeneous. It’d be very surprising if we could model these processes well with a single-variable model, a noise term, and a few parameters with stable values.” It seems to me like we may in fact just be very ignorant.
Kahneman and Tversky showed that incorporating perspectives that neglect inside information (in this case the historical specifics of growth accelerations) can reduce our ignorance about the future—at least, the immediate future. This practice can improve foresight both formally—leading experts to take weighted averages of predictions based on inside and outside views—and informally—through the productive friction that occurs when people are challenged to reexamine assumptions. So while I think the feeling expressed in the quote is understandable, it’s also useful to challenge it.
Warning label: I think it’s best not to take the inside-outside distinction too seriously as a dichotomy, nor even as a spectrum. Both the “hyperbolic” and the sum-of-exponentials models are arguably outside views. Views based on the growth patterns of bacteria populations might also be considered outside views. Etc. So I don’t want to trap myself or anyone else into discussion about which views are outside ones, or more outsiderish. When we reason as perfect Bayesians (which we never do) we can update from all perspectives, however labeled or categorized.
2. On the statement of the Hyperbolic Growth Hypothesis
The current draft states the HGH as
For most of human history, up until the twentieth century, the economic growth rate has tended to be roughly proportional to the level of economic output.
I think this statement would be more useful if made less precise in one respect and more precise in another. I’ll explain first about what I perceive as its problematic precision.
In my paper I write a growth equation more or less as g_y = s * y ^ B where g_y is the growth rate of population or gross world product and ^ means exponentiation. If B = 0, then growth is exponential. If B = 1, then growth is proportional to the level, as in the HGH definition just above. In my reading, Ben’s paper focuses on testing (and ultimately rejecting) B = 1. I understand that one reason for this focus is that Kremer (1993) finds B = 1 for population history (as does von Foerster et al. (1960) though that paper is not mentioned).
But I think the important question is not whether B = 1 but whether B > 0. For if 0 < B < 1, growth is still superexponential and y still hits a singularity if projected forward. E.g., I estimate B = ~0.55 for GWP since 10,000 BCE. The B > 0 question is what connects most directly to the title of this post, "Does Economic History Point Toward a Singularity?" And as far as I can see a focus on whether B = 1 is immaterial to the substantive issues being debated in these comments, such as whether a model with episodic growth changes is better than one without. If we are focusing on whether B = 1, seemingly a better title for this post would be "Was Kremer (1993) wrong?"
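To make the B > 0 point concrete: with g_y = s * y^B, i.e. dy/dt = s * y^(1+B), the solution reaches infinity at the finite time t* = y0^(-B) / (s*B) for any B > 0. A quick check (y0 and s are arbitrary here; only the B values echo the discussion):

```python
# Finite-time blow-up for any B > 0 in dy/dt = s * y**(1 + B).
# y0 and s are arbitrary illustrative values.
def blowup_time(y0, s, B):
    """Time at which the solution of dy/dt = s * y**(1 + B) diverges."""
    return y0 ** (-B) / (s * B)

for B in (1.0, 0.55, 0.1):
    print(B, blowup_time(y0=1.0, s=0.02, B=B))

# B = 0 is the boundary case: dy/dt = s * y is plain exponential growth
# and never diverges in finite time.
```

The singularity arrives later as B shrinks toward 0, but it still arrives.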
To be clear, the paper seems to shift between two definitions of hyperbolic growth: usually it’s B = 1 (“proportional”), but in places it’s B > 0. I think the paper could easily be misunderstood to be rejecting B > 0 (superexponential growth/singularity in general) in places where it’s actually rejecting B = 1 (superexponential growth/singularity with a particular speed). This is the sense in which I’d prefer less specificity in the statement of the hyperbolic growth hypothesis.
I’ll explain where I’d ideally want more specificity in the next item.
3. The value of an explicit statistical model
We all recognize that the data are noisy, so that the only perfect model for any given series will have as many parameters as data points. What we're after is a model that strikes a satisfying balance between parsimony (few parameters) and quality of fit. Accepting that, the question immediately arises: how do you measure quality of fit? This question rarely gets addressed systematically—not in Ben's paper, not in the comments on this post, not in Hanson (2000), nor in nearly all the rest of the literature. In fact Kremer (1993) is the only previous paper I've found that does proper econometrics—that's explicit about its statistical model, as well as the methods used to fit it to data, the quality of fit, and the validity of underlying assumptions such as independence of successive error terms.
And even Kremer’s model is not internally consistent because it doesn’t take into account how shocks in each decade, say, feed into the growth process to shape the probability distribution for growth over a century. That observation was the starting point for my own incremental contribution.
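To illustrate the kind of inconsistency I mean with a toy process (this is not my model or Kremer's; every parameter is invented), let shocks arrive each decade and feed into the level that drives later growth, then look at what that implies for growth over a century:

```python
# Toy feedback process: decade shocks enter the level, and the level drives
# the next decade's growth. The implied century-level growth distribution is
# typically right-skewed and path-dependent, which an i.i.d. century-level
# error term would simply assume away. All parameters are invented.
import numpy as np

rng = np.random.default_rng(2)

def century_growth(y0=1.0, s=0.1, B=0.5, sigma=0.15, steps=10, n=100_000):
    """Simulate ten decade steps of y <- y * exp(s * y**B + shock)."""
    y = np.full(n, y0)
    for _ in range(steps):
        y = y * np.exp(s * y ** B + rng.normal(0.0, sigma, n))
    return np.log(y / y0)

g = century_growth()
skew = ((g - g.mean()) ** 3).mean() / g.std() ** 3
print("mean:", g.mean(), "sd:", g.std(), "skewness:", skew)
```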
To be more concrete, look back at the qualifiers in the HGH statement: “tended to be roughly proportional.” Is the HGH, so stated, falsifiable? Or, more realistically, can it be assigned a p value? I think the answer is no, because there is no explicitly hypothesized, stochastic data generating process. The same can be asked of many statements in these comments, when people say a particular kind of model seems to fit history more or less well. It’s not fully clear what “better” would mean, nor what kind of data could falsify or strongly challenge any particular statement about goodness of fit.
I don’t want to be read as perfectionist about this. It’s really hard in this context to state a coherent, rigorously testable statistical model: the quantity of equations in my paper is proof. And at the end of the day, the data are so bad that it’s not obvious that fancy math gives us more insight than hand-wavy verbal debate.
I would suggest, however, that it is important to understand the conceptual gap, just as we try to incorporate Bayesian thinking into our discourse even if we rarely engage in formal Bayesian updating. So I will elaborate.
Suppose I’m looking at a graph of population over time and want to fit a curve to it. I might declare that the one true model is
y = f(t) + e
where f is exponential or what-have-you, and e is an error term. It is common when talking about long-term population or GWP history to stop there. The problem with stopping there is that every model then fits. I could postulate that f is an S-curve, the Manhattan skyline in profile, a fractal squiggle, etc. Sure, none of these f's fit the data perfectly, but I've got my error term e there to absorb the discrepancies. Formally, my model fits the data exactly.
The logical flaw is the lack of characterization of e. Classically, we’d assume that all of the values of e are drawn independently from a shared probability distribution that has mean 0 and that is itself independent of t and previous values of y. These assumptions are embedded in standard regression methods, at least when we start quoting standard errors and p values. And these assumptions will be violated by most wrong models. For example, if the best-fit S-curve predicts essentially zero growth after 1950 while population actually keeps climbing, then after 1950 discrepancies between actual and fitted values—our estimates of e—will be systematically positive. They will be observably correlated with each other, not independent. This is why something that sounds technical, checking for serial correlation, can have profound implications for whether a model is structurally correct.
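Here is a minimal illustration of that symptom, with synthetic data and arbitrary parameters: generate data from a superexponential process, fit the "wrong" f (a plain exponential) by least squares, and then inspect the residuals.

```python
# Fit the wrong f and watch the residuals: they soak up the misspecification
# and become strongly serially correlated. Synthetic data, arbitrary numbers.
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(100, dtype=float)

# True process: log y grows faster than linearly in t, plus i.i.d. noise.
log_y = 0.02 * t + 0.0005 * t ** 2 + rng.normal(0.0, 0.05, t.size)

# Wrong model: log y = a + b*t + e, i.e. exponential growth in levels.
X = np.column_stack([np.ones_like(t), t])
coef, *_ = np.linalg.lstsq(X, log_y, rcond=None)
resid = log_y - X @ coef

# Lag-1 autocorrelation of the residuals: near 0 if the model were right,
# close to 1 here because the quadratic term got shoved into e.
print(np.corrcoef(resid[:-1], resid[1:])[0, 1])
```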
I believe this sort of fallacy is present in the current draft of Ben’s paper, where it says, “Kremer’s primary regression results don’t actually tell us anything that we didn’t already know: all they say is that the population growth rate has increased.” (emphasis in the original) Kremer in fact checks whether his modeling errors are independent and identically distributed. Leaving aside whether these checks are perfectly reassuring, I think the critique of the regressions is overdrawn. The counterexample developed in the current draft of Ben’s paper does not engage with the statistical properties of e.
More generally, without explicit assumptions about the distribution of e, discussions about the quality of various models can get bogged down. For then there is little rigorous sense in which one model is better than another. With such assumptions, we can say that the data are more likely under one model than under another.
4. I’m open to the hyperbolic model being too parsimonious
The possibility that growth accelerates episodically is quite plausible to me. And I'd put significant weight on the episodes being entirely behind us. In fact my favorite part of Ben's paper is where it gathers radiocarbon-dating research that suggests that "the" agricultural revolution, like the better-measured industrial revolution, brought distinct accelerations in various regions.
In my first attack on modeling long-term growth, I chose to put a lot of work into the simpler hyperbolic model because I saw an opportunity to improve its statistical expression, in particular by modeling how random growth shocks at each moment feed into the growth process and shape the probability distribution for growth over finite periods such as 10 years. Injecting stochasticity into the hyperbolic model seemed potentially useful for two reasons. For one, since adding dynamic stochasticity is hard, it seemed better to do it in a simpler model first.
For another, it allowed a rigorous test of whether second-order effects—the apparently episodic character of growth accelerations—could be parsimoniously viewed as mere noise within a simpler pattern of long-term acceleration. Within the particular structure of my model, the answer was no. For example, after being fit to the GWP data for 10,000 BCE to 1700 CE, my model is surprised at how high GWP was in 1820, assigning that outcome a p value of ~0.1. Ben’s paper presents similar findings, graphically.
So, sure, growth accelerations may be best seen as episodic.
But, as noted, it’s not clear that stipulating an episodic character should in itself shift one’s priors on the possibility of singularity-like developments. Hanson (2000)’s seminal articulation of the episodic view concludes that “From a purely empirical point of view, very large changes are actually to be expected within the next century.” He extrapolates from the statistics of past explosions (the few that we know of) to suggest that the next one will have a doubling time of days or weeks. He doesn’t pursue the logic further, but could have. The next revolution after that could come within days and have a doubling time of seconds. So despite departing from the hyperbolic model, we’re back to predicting a singularity.
And I’ve seen no parsimonious theory for episodic models, by which I mean one or more differential equations whose solutions yield episodic growth. Differential equations are important for expressing how the state of a system affects changes in that state.
Something I’m interested in now is how to rectify that within a stochastic framework. Is there an elegant way to simulate episodic, stochastic acceleration in technological progress?
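For the record, here is the naive version of what I have in mind, with every parameter pulled out of the air; whether something like this can be made elegant, or grounded in differential equations, is exactly the open question:

```python
# A naive sketch of episodic, stochastic acceleration: exponential growth with
# noise, where at randomly timed "revolutions" (a Bernoulli approximation to
# Poisson arrivals) the underlying growth rate jumps by a random factor.
# All parameters are arbitrary.
import numpy as np

rng = np.random.default_rng(4)

def episodic_path(T=500, g0=0.001, jump_prob=0.01, jump_scale=3.0, sigma=0.02):
    y, g, path = 1.0, g0, []
    for _ in range(T):
        if rng.random() < jump_prob:              # a revolution arrives
            g *= jump_scale * rng.lognormal(0.0, 0.3)
        y *= np.exp(g + rng.normal(0.0, sigma))
        path.append(y)
    return np.array(path)

print(episodic_path()[[0, 99, 199, 299, 399, 499]])
```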
My own view of growth prospects is at this point black swan-style (even if the popularizer of that term called me a “BSer”). A stochastic hyperbolic model generates fat-tailed distributions for future growth and GWP, ones that imply that the expected value of future output is infinite. Leavening a conventional, insider prediction of stagnation with even a tiny bit of that outside view suffices to fatten its tails, send its expectation to infinity, and, as a practical matter, raise the perceived odds of extreme outcomes.
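A stylized numerical version of that last point (this is not my actual model; the distributions and the 1% weight are invented): put a small weight on an outside view whose tail index is below 1, and the sample mean of the mixture never settles down, because its true mean is infinite.

```python
# Mixing a thin-tailed "insider" forecast with a small weight on a fat-tailed
# outside view (Pareto with tail index 0.8 < 1, hence infinite mean). The
# mixture's mean is infinite, so sample means are unstable and tend to grow
# with the sample size. All distributions and weights are invented.
import numpy as np

rng = np.random.default_rng(5)
eps = 0.01                                      # weight on the outside view

def mixture(n):
    inside = rng.lognormal(0.02, 0.05, n)       # placid, thin-tailed forecast
    outside = (1.0 - rng.random(n)) ** (-1.0 / 0.8)   # Pareto, alpha = 0.8
    return np.where(rng.random(n) < eps, outside, inside)

for n in (10_000, 1_000_000, 10_000_000):
    print(n, mixture(n).mean())
```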
I dug waaay into this topic when investigating geomagnetic storms. I found it quite interesting and useful. See
https://www.openphilanthropy.org/research/geomagnetic-storms-using-extreme-value-theory-to-gauge-the-risk
https://www.openphilanthropy.org/research/updating-my-risk-estimate-for-the-geomagnetic-big-one