This passage from David Roodman’s essay Appeal to Me: First Trial of a “Replication Opinion” resonated:

When we draw on research, we vet it in rare depth (as does GiveWell, from which we spun off). I have sometimes spent months replicating and reanalyzing a key study—checking for bugs in the computer code, thinking about how I would run the numbers differently and how I would interpret the results. This interface between research and practice might seem like a picture of harmony, since researchers want their work to guide decision-making for the public good and decision-makers like Open Philanthropy want to receive such guidance.
Yet I have come to see how cultural misunderstandings prevail at this interface. From my side, what the academy does and what I and most of the public think it does are not the same. There are two problems. First, about half the time I reanalyze a study, I find that there are important bugs in the code, or that adding more data makes the mathematical finding go away, or that there’s a compelling alternative explanation for the results. (Caveat: most of my experience is with non-randomized studies.) Second, when I send my critical findings to the journal that peer-reviewed and published the original research, the editors usually don’t seem interested (recent exception). Seeing the ivory tower as a bastion of truth-seeking, I used to be surprised. I understand now that, because of how the academy works, in particular, because of how the individuals within academia respond to incentives beyond their control, we consumers of research are sometimes more truth-seeking than the producers.
I had a similar realisation towards the end of my studies, and it was a key factor in persuading me not to pursue academia. I’ve mentioned this before, but it also surprised me how much more these kinds of details mattered in my experience in industry than they had in academia.
Skipping over to his recap of the specific case he looked into:
To recap:
Two economists performed a quantitative analysis of a clever, novel question.
Another researcher promptly responded that the analysis contains errors (such as computing average daytime temperature with respect to Greenwich time rather than local time), and that it could have been done on a much larger data set (for 1990 to ~2019 instead of 2000–04). These changes make the headline findings go away.
After behind-the-scenes back and forth among the disputants and editors, the journal published the comment and rejoinder.
These new articles confused even an expert.
An outsider (me) delved into the debate and found that it’s actually a pretty easy call.
If you score the journal on whether it successfully illuminated its readership as to the truth, then I think it is kind of 0 for 2. …
That said, AEJ Applied did support dialogue between economists that eventually brought the truth out. In particular, by requiring public posting of data and code (an area where this journal and its siblings have been pioneers), it facilitated rapid scrutiny.
Still, it bears emphasizing: For quality assurance, the data sharing was much more valuable than the peer review. And, whether for lack of time or reluctance to take sides, the journal’s handling of the dispute obscured the truth.
My purpose in examining this example is not to call down a thunderbolt on anyone, from the Olympian heights of a funding body. It is rather to use a concrete story to illustrate the larger patterns I mentioned earlier. Despite having undergone peer review, many published studies in the social sciences and epidemiology do not withstand close scrutiny. When they are challenged, journal editors have a hard time managing the debate in a way that produces more light than heat.
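The Greenwich-versus-local-time slip Roodman describes is worth dwelling on, because it is such an easy bug to write. As a purely hypothetical sketch (made-up data and station, not the authors’ actual code), here is how defining “daytime” on the Greenwich clock quietly averages over the wrong hours for a station far from the prime meridian:

```python
from datetime import datetime, timedelta, timezone

# Made-up hourly temperatures (degC) for one day at a station nine hours east
# of Greenwich. Keys are UTC timestamps; the diurnal peak falls around
# 14:00 *local* time, i.e. 05:00 UTC.
station_tz = timezone(timedelta(hours=9))
temps_by_utc_hour = [28, 30, 32, 33, 34, 35, 34, 33, 32, 30, 28, 27,
                     26, 25, 24, 23, 23, 22, 22, 21, 21, 22, 24, 26]
obs = {
    datetime(2003, 6, 1, h, tzinfo=timezone.utc): t
    for h, t in enumerate(temps_by_utc_hour)
}

def mean_daytime_temp(observations, tz):
    """Average temperature over 'daytime' hours, defined as 06:00-18:00 in tz."""
    daytime = [t for ts, t in observations.items()
               if 6 <= ts.astimezone(tz).hour < 18]
    return sum(daytime) / len(daytime)

# Defining 'daytime' on the Greenwich clock selects mostly local evening and
# night hours here, so the two averages diverge by a few degrees.
print(f"Daytime mean, Greenwich clock: {mean_daytime_temp(obs, timezone.utc):.1f}")
print(f"Daytime mean, local clock:     {mean_daytime_temp(obs, station_tz):.1f}")
```

Nothing about the UTC-based version looks wrong when read in isolation; the problem only surfaces when someone else can rerun the numbers, which is why the public posting of data and code that Roodman credits did more for quality assurance than the peer review itself.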