My view is that it is very likely (>95% chance) that one would find something better than GiveWell’s top charities if one put 4 person-years into the project, which was the claim in my original post. My team put in about 6 months of person-time. Hauke and I spent a few months writing up our post, which wasn’t about trying to find donation opportunities but about making the case that there is reason to think we would find something if we tried. I don’t know how much time Lant has put into actually trying to find donation opportunities, but judging from his research output, that doesn’t seem to be what he spends his time doing. A substantial fraction of the time that Stephen and Aidan spent on the project went on establishing what happens if you analyse things in terms of life satisfaction rather than dollars, as this is what Alex Berger asked for in his response to Hauke’s and my post. I wouldn’t expect two (very smart) people without much prior experience in the area to find a good donation opportunity after about 3-4 person-months of trying. I think that would take ~4 years, which is what I suggested in the original post. No one has yet put that in, so I don’t think it is surprising that no one has found anything.
My claim that we would find something better than deworming is not just based on intuition. There are numerous independent lines of argument suggesting that working on economic policy must be better than directly funding deworming: (a) economic growth explains a substantial fraction of the variance in human welfare in the world today, while variation in RCT-evaluable things explains little; (b) the era of focusing on national development rather than programmatic RCT-evaluable interventions produced greater improvements in human welfare than all prior human history combined, so let’s keep doing that; (c) if you conservatively add up the impact of all development economists and make conservative assumptions about the effect they had on China, it looks better than directly funding RCT-type stuff; (d) similar thoughts apply to the World Bank and IMF. As I say here, I appreciate that the ICRIER numbers may not be right (I haven’t checked), but the value here is illustrative and does not affect the basic arguments. Even if you think any feasible campaign has only a slim chance of success, the size of the rewards is so large that the expected benefits are very large, and must be larger than deworming.
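For what it’s worth, the bare expected-value arithmetic behind this last claim can be sketched as follows. Every number here is a purely illustrative assumption of mine, not an estimate from the post or from GiveWell:

```python
# Toy expected-value comparison (all numbers are illustrative assumptions,
# not estimates from the post or from GiveWell).

def expected_value(p_success: float, benefit_if_success: float) -> float:
    """Expected benefit of a campaign that succeeds with probability p_success."""
    return p_success * benefit_if_success

# Hypothetical growth-policy campaign: small chance of a very large payoff.
growth_ev = expected_value(p_success=0.01, benefit_if_success=1e9)  # ~$10m in expectation

# Hypothetical direct intervention: near-certain but much smaller payoff.
direct_ev = expected_value(p_success=0.95, benefit_if_success=1e6)  # ~$0.95m in expectation

# Under these assumptions, the low-probability/high-payoff option dominates.
print(growth_ev > direct_ev)
```

The whole argument turns, of course, on whether the probability and payoff assumptions are anywhere near right, which is exactly what is disputed below.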
I don’t think it is an apples-to-oranges comparison. I am asking what I would do as a philanthropist with money in, e.g., East Asia in the 1950s, 60s and 70s, or in any developing country at any point. I have the option of funding something that could in principle be tested by an RCT, or something that might increase growth, like privatising agriculture, legalising supermarkets or reducing tariffs. I don’t think it would ever be the correct response to fund the RCT thing. I don’t think the appeal to what is uncontroversial among economists works in this debate: it is also highly controversial whether deworming produces the posited income benefits. I don’t understand why it is not fair to compare a single intervention to the suite of things that might increase growth rates. That seems like a fair comparison to me, and the benefits are measured in dollars, which is not an apples-to-oranges comparison. I use the deworming example because the benefits are provided in monetary terms, which makes the comparison easier. I also think the same applies to GW’s other top charities. Indeed, I think the health benefits of their other charities are overrated, because GiveWell overrates the life satisfaction benefits of saving lives.
I expect that you and I will differ on this, but I do think there should be more criticism of the recommendation of deworming. EA is generally quite concerned about replicability and the various problems with science, yet the deworming recommendation is based on only one RCT. Two meta-analyses have found that “deworming did not have an impact on outcomes such as cognition, height, and school performance”, so it is very unclear how deworming is meant to have produced these economic benefits. There is evidence of data mining and multiple comparisons in the paper. There is strong evidence that the external validity of all RCTs is very low, and this is not accounted for systematically and formally in GW’s research. This is a classic case where we would expect the study finding to not be real.
I don’t think those would be useful RCTs, because they have low value of information. With the road pricing one, for example, I find it hard to see how you would set up a control group in any useful sense. I know it is (very) unfashionable in economics to say this, but I think most of the case for road pricing can be known from the armchair (I’m not saying this is true of all topics!). Economists now tend to form their views based on the median published empirical study on a topic, which we have every reason to believe will be wrong. The approach of complete agnosticism on a topic before an RCT has been run on it seems to me similarly wrong.
I’m not sure your ‘less than 10%’ is right. Analysing the same top 5 journals for the same year (2015), Duflo suggests that a third of development economics papers are RCTs, which is what I based my claim on.
Fair enough that not all of that time (much less all of Lant’s career) was spent trying to come up with a good example here. I still think that if, after a nontrivial amount of cognitive resource spent on this by very smart people (e.g. Lant is in an excellent position to have run across something and at least mentioned it), no one has even come up with a single plausible example worthy of further study, that’s discouraging in a Bayesian sense. I fully agree it’s worth more research; I simply doubt that anything fantastic will appear after four years of work (or quite possibly ever).
I disagree with all of your (a)-(d), not least because economists and the WB/IMF have caused lots of bad things to happen in addition to lots of good things. However, even if we accept those general lines of argument, the conclusion that despite there being only a small chance of success “the size of the rewards is so large that the expected benefits are very large, and must be larger than deworming” is entirely your intuition. I don’t share it, because I think the probability of success is very, very small indeed. I could be wrong, and if you have a quantitative argument regarding the probability of any given high-quality campaign being successful a priori (not the actual ex post GDP change over decades) which isn’t based on your intuition, then we can discuss that.
This is indeed apples to oranges. A philanthropist can cause more deworming or bednets to be implemented. A philanthropist cannot privatize agriculture or legalize supermarkets or reduce tariffs. That’s the whole point. China in the 1950s/60s/70s was led by Mao, so how exactly would you (an external EA philanthropist) have convinced him to alter national policy?
External validity is certainly under-studied and under-appreciated (at least by most economists, in part because of publication incentives), and I hope that GW takes it into account. The Poland paper I linked to is interesting in part because it reaches very different conclusions than earlier RCTs in the same domain, and I have published specifically on this problem. Criticism is also great, and the worm wars are evidence that Miguel & Kremer has received a lot of criticism! However, none of this implies that we “expect the study finding to not be real”; if you have specific criticisms in this case, what are they? The Cochrane report you refer to is highly problematic, in part precisely because it only considers RCTs. For the interested reader, here are some non-RCT papers that all point in the same direction: one, two, three, four.
First you said RCTs were not possible in those cases (which I disputed and you seem to have walked back), now you say they are not necessarily useful. Neither I nor anyone else has ever claimed that an RCT is the right way to answer every question in economics, nor that we should be agnostic before any RCTs, so this is a strawman. I’m a theorist at heart (my PhD was on repeated games), and I fully agree with you that modern econ has gone on average too far toward “let the data speak for itself” (whether RCTs or otherwise), and that progress can sometimes be made from armchairs. I just don’t think theory is going to tell us whether in the real world funding an NGO to lobby for change to Tanzania’s macro policy is in expectation a better bet than funding deworming or bednets.
Your phrase was “a substantial chunk of the economics profession (>30%?)”, which I disputed (and linked to a source showing that it is less than 10% even for development). If your claim is instead that a nontrivial minority of the top development papers are RCTs, then I agree. And that’s an important indicator, although very different than the profession as a whole. What do you think the ‘right’ proportion would be in each case? I would also repeat myself that all of those authors value non-RCTs (and many of them also work on non-RCTs), and that both JPAL and GiveWell have become increasingly interested in broader policy work (as they ought to be, imo).
You seem here to be eliding the claim that we didn’t find any leads with the claim that we didn’t recommend anything. We found many leads. FP found 7 different organisations that we thought might be worth investigating if we had control of money. In Hauke’s appendices to our post, he lists several organisations and interventions that would be worth investigating. So there are many plausible examples of things worthy of further study. If I put 3 months into a climate change report, I wouldn’t argue that the failure to recommend a climate charity proves that there aren’t any good climate charities.
It is true that economists have caused some bad things, but on balance, as Hauke and I discuss in section 3.2, the net gains of growth episodes since 1950 have been extremely large. If economists affected the growth accelerations and decelerations in equal measure, then their net effect would be strongly positive. For what you say to be true, economists would have to have massively disproportionately affected the growth decelerations, which I don’t think happened.

Further to your criticism of intuition, your argument seems to be either that (i) all subjective probability assessments must be rejected because they are based on intuition, or (ii) our subjective probability assessments are too high. Firstly, these two arguments are inconsistent: you cannot both reject our argument because it is based on intuition and then say that according to your intuition the true probability is a lot lower. Secondly, argument (i) applies to all policy work, including the policy work that GiveWell is now going to investigate and that Open Phil already funds; I don’t see why economic policy would be more subjective than air quality work. Thirdly, the case for deworming is also based on subjective probability judgements. For example, the replicability adjustments that individual GiveWell staff members used to provide varied by a factor of 10, and if you read the documentation of the way the replicability discount is now done, a lot of it is based on informal and qualitative reasoning. James Snowden describes GiveWell’s mission as ‘estimating the unknowable’. I don’t think there is a sharp distinction in terms of non-subjectiveness between the things that GW recommends and policy work.
To avoid misunderstanding here, am I right in interpreting you as saying the comparison is apples to oranges because philanthropists can never cause beneficial policy change? Do you also think comparing San Francisco planning reform and AMF is an apples-to-oranges comparison? I do in fact think there were ways to influence Chinese economic policy, namely by funding growth research, which is a global public good. As Hauke and I discuss in section 4 of our post, China under Deng sought extensive advice from foreign economists before making its reforms.
I laid out several specific criticisms of the deworming estimate earlier in the thread. There are many, many obviously important things that could affect the external validity of the study, not just the worm burden, which is what GW adjusts for: e.g. the intervention will now take place 20 years later, in a different country, with a different economy, a different culture, and so on. The external validity problems are quite clear. There is a study showing that deworming some schoolchildren in Kenya increases income (one of many outcomes they were measuring), and we then need to assume that this result will apply in all other contexts, even though we have evidence from lots of other domains that the external validity of all RCTs is pretty minimal. The ‘only considers RCTs’ criticism of the Cochrane review is one that is difficult for GW to make given its epistemic approach, which is the one I am criticising.
It’s a bit tricky to carve out a difference between ‘not practically possible’ and ‘not doable in such a way that you could get value of information’. You could sort of do an RCT where you allowed foreign investment in some companies but not in others, but it just seems like it would be a total mess. The view that we should be agnostic before RCTs is, I think, an important implicit guiding assumption that a lot of people make. For example, it seems like by far the most obvious way to explain the approach GiveWell has taken since its inception. Lant discusses an example of a paper in the AEJ which found that building schools closer to children increases enrollment, and which says: “In this paper, we evaluate a simple intervention entirely focused on access to primary schools. The empirical challenge is the potential endogenous relationship between the school availability and household characteristics. [Footnote 1]. Governments, for example, may place schooling either in areas of high demand for education or in areas with low demand for education, in the hopes of encouraging higher participation levels. Either will bias simple cross-sectional estimates of the relationship between access and enrollment.” But the finding that proximity to school increases enrollment is something economists have known for ages on the basis of theory and non-RCT evidence. Not all possible confounders are actual confounders. I didn’t say or imply that “theory is going to tell us whether in the real world funding an NGO to lobby for change to Tanzania’s macro policy is in expectation a better bet than funding deworming or bednets”. That is so implausible that I have never given the impression of believing it, and I explicitly said in my last comment that I wasn’t making such a claim: “(I’m not saying this is true of all topics!)”
Yes you’re right I’m happy to retract that. I had based my estimate on the Duflo numbers not on the wider field number. In answer to your question, I would have less than 5% of the field work on RCTs, maybe less than 1%. I do also think it is important that top 5 journals disproportionately publish RCTs as this does suggest that the top talent is disproportionately working on RCTs.
I’m going to try to step back first and speculate where we actually disagree, in hopes of getting at what you actually think should be happening differently, if anything. You seem to be arguing to some extent against things that do not exist, and in particular that neither I nor others are saying. I think we agree that (i) 1-5% of work in economics should be RCTs; (ii) RCTs are not the right approach for many, indeed most, questions in social science; (iii) there exists lots of policy-relevant and actionable information from non-RCT sources; (iv) intuitions can be a useful input, as long as one is transparent that that’s what they are; (v) the EA community should be spending resources studying policy interventions, including around growth but also imo health (e.g. lead paint, tobacco); and (vi) economists do more good than harm in the world.
Where I think we disagree is that your intuition is that with another three person-years of effort, the EA community will find growth policy funding opportunities (not based on RCTs...) that are far more effective than the current top GW charities, and my intuition is that we won’t (but that I still think we should look, as I’ve said many times, because the uncertainty is high and we might find them!). Neither of us knows for sure, as this hasn’t been done yet. Is it more than that?
You’re right that finding leads but not making recommendations could cause one to update upward, downward, or not at all (depending on one’s prior) so I shouldn’t have suggested that it was necessarily a bad signal. However I didn’t see anything quantitative (I did look at the appendices, but I believe the only quantitatively worked-out example was the incorrect and retrospective ICRIER one in the main text). This makes it very hard to meaningfully compare against current top interventions. I would love if you would be willing to go on the record, with as many caveats as you want, and pick one or two of the ones Hauke listed and put some back-of-the-envelope numbers behind them just to see how it plays out (you said that you “actually don’t think some of the GiveWell numbers are based on more evidence”—so where is your evidence?). Then we could have a substantive discussion. Or maybe I would be convinced once I saw the concrete reasoning.
To say that the net impact of historic policy economists has been positive (which I agree with) does not begin to tell us that it is “better than directly funding RCT-type stuff”: we need (prospective) numbers, or else we’re basing it on your intuition. I don’t reject your argument because it is based on intuition; I reject the claim that it is based on evidence, when it is in fact based on intuition. No inconsistency from me, and certainly no claim that subjective assessments should all be rejected; indeed I have published such assessments myself, and am devoting a large portion of my current and upcoming research agenda to surveying people about their values in order to use those estimates in policy work. No RCTs whatsoever, except some light randomization around framing!
Thanks for checking: no, you are not right. If I thought that philanthropists could never cause beneficial policy change, why would the very first sentence of the TL;DR in my very first reply have been “I 100% agree that we should be doing more research on effective ways to leverage broad policy initiatives”? The issue is that while I can donate to AMF, I can’t donate to “San Francisco planning reform”. There’s an extra step involved in lobbying, which may be where our intuitions differ (I think it’s possible but low probability; I guess you think it’s higher, but you never seem to talk about that part as much as you talk about how big GDP growth was). That’s the sense in which, every time you compare a specific GW charity to ‘whatever it was that caused growth’, your claim is difficult to parse and evaluate from the perspective of a philanthropist.
If GW is saying to only consider RCTs, then I very much disagree with them (but see below). NB you’re the one who brought up the Cochrane review as support for your position on worms, even though your own arguments about RCTs not being the holy grail (which I fully agree with) show that it is deeply flawed. I’m not sure who you’re arguing with here. By the way, do you have any citations for the claim that “the external validity of all RCTs is pretty minimal”?
GW paid me some years ago specifically to do an informally Bayesian analysis of the deworming results (i.e. what should our prior have been before Baird et al.?), putting them in the context of everything we know about not just deworming (from RCTs and otherwise) but also the impact of early childhood interventions and shocks on adult outcomes, most of which was not from RCTs. So I honestly don’t know where you get the idea that there’s a group of people out there who only believe in RCTs (except that Lant and Angus D sometimes like to give that impression, so that they can argue against it). The fact that Lant can find one imperfect paper in one good-but-not-fantastic journal (if he had found something in a top-5 journal he would have used it instead, so clearly he didn’t) doesn’t prove much. Can you point to any randomista or organization saying that we should be agnostic before doing RCTs? I would be very surprised if so, because literally no one I know believes that. As for Tanzanian macro policy, sorry if I gave the impression that you believed that particular example. The difficulty is that you appear to be unwilling or unable to give any specific example whatsoever, so I had to make something up just to instantiate the point that we (as EAs) don’t implement policy; we fund organizations who indirectly influence policy (whereas we directly implement ‘RCT-type’ interventions). But I really didn’t mean to give the impression that you might have claimed it was always going to be a great approach, so apologies.
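For readers unfamiliar with what an “informally Bayesian” replicability adjustment amounts to, here is a minimal sketch of the underlying idea, using a textbook normal-normal update with made-up numbers. This is my illustration of the general technique, not GiveWell’s or the commenter’s actual model:

```python
# Minimal sketch of the Bayesian shrinkage behind a "replicability adjustment":
# combine a skeptical prior with a noisy study estimate by precision weighting.
# All numbers are hypothetical.

def posterior_mean(prior_mean: float, prior_var: float,
                   study_mean: float, study_var: float) -> float:
    """Precision-weighted average of prior and study (normal-normal model)."""
    w = (1 / study_var) / (1 / study_var + 1 / prior_var)
    return w * study_mean + (1 - w) * prior_mean

# Skeptical prior (expect little effect) meets a noisy study (large effect).
post = posterior_mean(prior_mean=0.0, prior_var=1.0,
                      study_mean=10.0, study_var=9.0)
print(post)  # about 1.0: shrunk heavily toward the skeptical prior
```

The point of the exercise is that a single noisy RCT with a striking result should move a well-calibrated prior only modestly, which is one way to formalise the disagreement about deworming in this thread.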
Thanks, and as I already wrote above I fully agree that prevalence in top-5 journals is [also] an important indicator—for the reasons you mention. Any new technique is going to be popular at first, both because it’s a shiny new toy (academia over-rewards novelty) and also because there will actually be a lot of useful low-hanging fruit that it can be used for at first (if it has any merits whatsoever, which RCTs certainly do). I suspect that over the next 5/10/20 years the proportion in top journals (which is much lower than 30% overall; that’s just for development) will go down; the proportion in other journals will first go up and then down; and it will settle at something like 10% for development and well under 5% for econ as a whole. This seems about right to me; maybe you think it’s still way too high?
Yes, that seems like the main thing we disagree about. It also seems like we disagree about the likely impact of deworming.
I would also like to do that, but unfortunately I don’t have the time to do it properly; my time is now focused on long-term-relevant work.
I think the point here is that the growth decelerations don’t seem big enough to make the back-of-the-envelope calculations on the World Bank, the IMF and all economists in Hauke’s and my post come out lower than the RCT stuff.
Thanks for clarifying. First (and this is not a criticism of you), this is different to the view taken by Open Phil, which does try to compare its rich-world policy work to AMF, and which is what prompted the main post. So at least Open Phil can’t think it is an apples-to-oranges comparison. Second, I don’t really see how it is an apples-to-oranges comparison. You seem to be saying that my judgement of the probability is too high, but that is different from the metric used to compare the options being incomparable. I don’t understand why you think the probability estimate is too high. The back-of-the-envelope on the World Bank and IMF seems like a pretty good guide, and you really have to believe some implausible things to think that the probability of success is low enough. On the ‘all economists’ one, for example, the estimate is extremely conservative because it only counts the economic gains in China and assumes that is the only thing all economists achieved.
The reason I cited the Cochrane review was that GiveWell also cited it in their review of deworming. My main aim is to dispute GiveWell’s reasoning as that is what is driving a lot of EA money, which I think could do much more good elsewhere. On external validity, I’m going off the Vivalt paper.
It’s a natural way to interpret GiveWell’s views, because the only charities they recommend, or (I think) have ever recommended, have been tested by RCTs. All of the charities on their standout list have been tested by RCTs too. This suggests at least that they put extremely high weight on RCT evidence: even if they are not at the extreme of complete agnosticism sans RCT, they are very close to it. To use an example from Lant: Chris Blattman said that the best investment to fight world poverty would be to run an RCT comparing giving people cash with giving people chickens. I find this stance extremely hard to understand without the implicit premise of agnosticism in the absence of RCTs, and I don’t think it would be difficult to find similar claims by Duflo and Banerjee with a bit of extra time. I do think this goes beyond RCTs and extends to an over-reliance on empirical studies, as I argued here.
I’m not sure which direction RCTs will go in in the future; the publication incentives favour that type of work.
Another point of agreement: the economics profession currently focuses too much on empirical work. Meanwhile my own personal view is that people like Esther and Chris B are slightly ‘too far’ in the pro-RCT camp, and that people like Lant (and you) are ‘too far’ in the anti-RCT camp. But I don’t see anyone in this discussion as being extreme (except possibly Lant...); healthy disagreement is to be expected and encouraged. Note that Esther and Abhijit’s most recent book tackles macro issues like migration, trade, climate change, and yes growth—using RCTs when possible / relevant but also plenty of other results (including lots of theory! Abhijit started life as a theorist, like I did). Meanwhile Chris has a forthcoming book on war and peace (macro level! no easy RCTs) for which he uses other approaches like machine learning. You can find all sorts of quotes, but the proof is in the pudding. Final point on this is that one can easily combine RCTs with admin data, ML, etc, and researchers (including me) are doing more and more of that, which imo is great—it’s not always one or the other.
As you say, the efficacy of deworming seems to be a point of disagreement between us. Again pulling back somewhat, you link to Eva’s paper as supporting your claim that RCTs have minimal external validity, but her paper is about all forms of impact evaluation (and she notes in the conclusion that the subset of RCTs aren’t special). So this would be extremely damning for economics if true, but her results don’t support your claim. For instance she notes that bednets and conditional cash transfers seem to do very well on this front. More relevantly, her point (as I read it) is to see how much of the nominal variation in effect sizes can be explained by other contextual variables, and she finds that typically a nontrivial amount of it can be. This is good news for external validity, since it means we can often explain / predict the differences even when they do arise.
I think I haven’t been very clear about ‘apples to oranges’: I agree that these can and should absolutely be compared. I just felt the way you were doing it glossed over an important difference. I can write a check to AMF and feel very confident that something will change in the world; we can then debate the expected magnitude of the impact of that change. But I can’t write a check to “growth reform in the developing world”, so even before we debate the relative benefits of changing immigration policy vs distributing bednets, we have to estimate the probability that the desired policy will get implemented. I realize you’re fully aware of this, but that’s the part I keep coming back to, because that’s the part where I’m pessimistic (partly from having worked for the US government, although for a counterargument I liked this recent forum post) and where I suspect our intuitions disagree; mostly you keep talking about the benefits of more migration and of GDP growth (which are great!) and not so much about how we sit down and estimate the likelihood of bringing those about. I’ll admit that the “pessimistic” estimate of 1% on ICRIER in the original post with Hauke really made me distrust everything afterward, since the pessimistic estimate in that case is a negative number and a plausible median estimate seems to be about 1 in a million.
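The sensitivity being complained about here (that everything hinges on the probability of the policy actually being implemented) can be made concrete with a toy calculation; all figures are hypothetical assumptions of mine, not numbers from either post:

```python
# Sketch of how the expected benefit per dollar of a lobbying campaign
# depends on the probability that the policy gets implemented.
# All figures are hypothetical.

def policy_ev_per_dollar(p_implemented: float, benefit: float, cost: float) -> float:
    """Expected welfare benefit per dollar of campaign spending."""
    return p_implemented * benefit / cost

benefit = 1e10  # hypothetical welfare gain if the policy passes, in $
cost = 1e6      # hypothetical campaign cost, in $

optimistic = policy_ev_per_dollar(0.01, benefit, cost)   # 1% chance of success
pessimistic = policy_ev_per_dollar(1e-6, benefit, cost)  # 1-in-a-million chance

# The two estimates are about four orders of magnitude apart, so the prior
# on implementation dominates the whole comparison against direct charities.
print(optimistic, pessimistic)
```

This is why the disagreement keeps returning to the implementation probability rather than to the size of the gains from growth.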
On China, I suppose my main point is still that it’s simply very, very hard to quantitatively estimate most of this. Just because you (or I, or anyone) think that something is extremely conservative (when you admit you haven’t put in as much time on all this as you’d like, and indeed it’s not your job to do so) doesn’t make it so. In this specific case, if you forced me to take a stand, my best guess is to agree with you that economists have helped push policy in a better direction and that this made a big difference to global welfare. But even if I felt more confident about that, what is the counterfactual you are comparing to? Did some NGO or the WB cause that to happen on the margin, or would economists have tried to learn about the world and influence policy anyway? Are there similar opportunities going forward? The Taliban says they want economics expertise, so perhaps. But I don’t think we know the answers to these questions (yet), even within orders of magnitude, and whether this type of approach will beat RCT-type approaches depends entirely on those particular probabilities.
Hello,
My view is that I think it is very likely (>95% chance) that one would find something better than GiveWell’s top charities if one were to put in 4 person-years of time into the project, which was the claim in my original post. My team put in about 6 months of person time on the project. Hauke and I put in a few months writing up our post, which wasn’t about trying to find donation opportunities but just to make the case that there is reason to think we would find something if we tried. I don’t know how much time Lant has put in to actually trying to find a donation opportunities, but that doesn’t seem to be what he spends his time doing just looking at his research output. A substantial fraction of the time that Stephen and Aidan spent on the project was on establishing what happened if you analysed things in terms of life satisfaction rather than $s, as this is what Alex Berger has asked for in the response to mine and Hauke’s post. I wouldn’t expect to find a good donation opportunity after about 3-4 person months of trying from two (very smart) people without much prior experience in the area. I think that would take ~4 years, which is what I suggested in the original post. No-one has yet put that in, so I don’t think it is surprising that no-one has found anything.
My claim that we would find something better than deworming is not just based on intuition. There are numerous independent lines of argument that suggest that working on economic policy must be better than directly funding deworming: (a) economic growth explains a substantial fraction of the variance in human welfare in the world today, variations in RCT-evaluable things explain little; (b) the era of focusing on national development rather than programmatic RCT-evaluable things produced greater improvements in human welfare than all prior human history combined, so let’s keep doing that; (c) if you conservatively add up the impact of all development economists and make conservative assumptions about the effect they had on China, it looks better than directly funding RCT-type stuff; (d) similar thoughts apply to the World Bank and IMF. As I say here, I appreciate that the ICIER numbers may not be right (I haven’t checked), but the value here is illustrative and does not affect the basic arguments. Even if you think there is only a slim chance of success of any feasible campaign succeeding, the size of the rewards are so large that the expected benefits are very large, and must be larger than deworming.
I don’t think it is an apples to oranges comparison. I am asking what I would do as a philanthropist with money in eg East Asia in the 1950s, 60s and 70s, or in any developing country at any point. I have the option of something that could in principle be tested by an RCT and something that might increase growth, like privatising agriculture, legalising supermarkets or reducing tariffs. I don’t think it would ever be the correct response to fund the RCT thing. I don’t think the appeal to what is uncontroversial among economists works in this debate. It is also highly controversial whether deworming produces the posited income benefits. I don’t understand why it is not fair to compare a single intervention to the suite of things that might increase growth rates. That seems like a fair comparison to me and the benefits are measured in $s, which is not an apples to oranges comparison. I use the deworming example because the benefits are provided in monetary terms, which makes the comparison easier. I also think the same applies for GW’s other top charities. Indeed, I think the health benefits of their other charities is overrated because GiveWell overrates the life satisfaction benefits of saving lives.
I expect that you and I will differ on this, but I do think there should be more criticism of the recommendation of deworming. EA is generally quite concerned about replicability and the various issues with science, but the deworming recommendation is based on only one RCT. Two meta-analyses have found that “deworming did not have an impact on outcomes such as cognition, height, and school performance”, so it is very unclear how deworming is meant to have produced these economic benefits. There is evidence of data mining and multiple comparisons in the paper. There is strong evidence that the external validity of all RCTs is very low, and this is not accounted for systematically and formally in GW’s research. This is a classic case in which we would expect the study finding not to be real.
I don’t think those would be useful RCTs, because they have low value of information. With the road pricing one, for example, I find it hard to see how you would set up a control group in any useful sense. I know it is (very) unfashionable in economics to say this, but I think it is true that most of the case for road pricing can be known from the armchair (I’m not saying this is true of all topics!). Economists now tend to form their views based on the median published empirical study on a topic, which we have every reason to believe will be wrong. The approach of complete agnosticism on a topic before an RCT is produced on it seems to me similarly wrong.
I’m not sure your ‘less than 10%’ is right. Analysing the same top 5 journals for the same year (2015), Duflo suggests that a third of development economics papers are RCTs, which is what I based my claim on.
Fair enough that not all of that time (much less all of Lant’s career) was spent trying to come up with a good example here. I still think that if, after a nontrivial amount of cognitive resource spent on this by very smart people (e.g. Lant is in an excellent position to have run across something and at least mentioned it), no one has even come up with a single plausible example that’s worthy of further study, that’s discouraging in a Bayesian sense. I fully agree it’s worth more research; I simply doubt that anything fantastic will appear after four years of work (or quite possibly ever).
I disagree with all of your (a)-(d), not least because economists and the WB/IMF have caused lots of bad things to happen in addition to lots of good things. However even if we accept those general lines of argument, the conclusion that despite there being only a small chance of success “the size of the rewards are so large that the expected benefits are very large, and must be larger than deworming” is entirely your intuition. I don’t share it, because I think the probability of success is very very small indeed. I could be wrong, and if you have a quantitative argument regarding the probability of any given high-quality campaign being successful a priori (not the actual ex post GDP change over decades) which isn’t based on your intuition, then we can discuss that.
This is indeed apples to oranges. A philanthropist can cause more deworming or bednets to be implemented. A philanthropist cannot privatize agriculture or legalize supermarkets or reduce tariffs. That’s the whole point. China in the 1950s/60s/70s was led by Mao, so how exactly would you (an external EA philanthropist) have convinced him to alter national policy?
External validity is certainly under-studied and -appreciated (at least by most economists, in part because of publication incentives), and I hope that GW takes it into account. The Poland paper I linked to is interesting in part because it reaches very different conclusions than earlier RCTs in the same domain, and I have published specifically on this problem. Criticism is also great, and the worm wars are evidence that Miguel & Kremer has received a lot of criticism! However none of this implies that we “expect the study finding to not be real”; if you have specific criticisms in this case, what are they? The Cochrane report you refer to is highly problematic, in part precisely because it only considers RCTs. For the interested reader, here are some non-RCT papers that all point in the same direction: one two three four
First you said RCTs were not possible in those cases (which I disputed and you seem to have walked back), now you say they are not necessarily useful. Neither I nor anyone else has ever claimed that an RCT is the right way to answer every question in economics, nor that we should be agnostic before any RCTs, so this is a strawman. I’m a theorist at heart (my PhD was on repeated games), and I fully agree with you that modern econ has gone on average too far toward “let the data speak for itself” (whether RCTs or otherwise), and that progress can sometimes be made from armchairs. I just don’t think theory is going to tell us whether in the real world funding an NGO to lobby for change to Tanzania’s macro policy is in expectation a better bet than funding deworming or bednets.
Your phrase was “a substantial chunk of the economics profession (>30%?)”, which I disputed (and linked to a source showing that it is less than 10% even for development). If your claim is instead that a nontrivial minority of the top development papers are RCTs, then I agree. And that’s an important indicator, although very different than the profession as a whole. What do you think the ‘right’ proportion would be in each case? I would also repeat myself that all of those authors value non-RCTs (and many of them also work on non-RCTs), and that both JPAL and GiveWell have become increasingly interested in broader policy work (as they ought to be, imo).
You seem here to be eliding the claim that we didn’t find any leads with the claim that we didn’t recommend anything. We found many leads. FP found 7 different organisations that we thought would be worth investigating if we had control of money. In Hauke’s appendices to our post, he lists several organisations and interventions that would be worth investigating. So, there are many plausible examples of things that are worthy of further study. If I put 3 months into a climate change report, I wouldn’t argue that the failure to recommend a climate charity proves that there aren’t any good climate charities.
It is true that economists have caused some bad things, but on balance, as Hauke and I discuss in section 3.2, the net gains of growth episodes have been extremely large since 1950. If economists affected the growth accelerations and decelerations in equal measure, then the effect would be strongly positive. For what you say to be true, the economists would have to have massively disproportionately affected the growth decelerations, which I don’t think happened. Further to your criticism of intuition, your argument seems to be either that (i) all subjective probability assessments must be rejected because they are based on intuition or (ii) our subjective probability assessments are too high. Firstly, these two arguments are inconsistent. You cannot both say that you reject our argument because it is based on intuition and then say that according to your intuition the true probability is a lot lower. Secondly, argument (i) applies to all policy work, including the policy work that GiveWell are now going to investigate and that Open Phil already funds; I don’t see why economic policy would be more subjective than air quality work. Thirdly, the case for deworming is also based on subjective probability judgements. For example, the various staff replicability adjustments that GiveWell staff members used to provide varied by a factor of 10. If you read the documentation of the way the replicability discount is now done, a lot of it is based on informal and qualitative reasoning. James Snowden describes GiveWell’s mission as ‘estimating the unknowable’. I don’t think there is a sharp distinction in terms of non-subjectiveness between the things that GW recommends and policy stuff.
To avoid misunderstanding here, am I right in interpreting you as saying the comparison is apples to oranges because philanthropists can never cause beneficial policy change? Do you also think comparing San Francisco planning reform and AMF is an apples to oranges comparison? I do in fact think there were ways to influence Chinese economic policy, namely by funding growth research, which is a global public good. As Hauke and I discuss in section 4 of our post, China under Deng sought extensive advice from foreign economists before making its reforms.
I laid out several specific criticisms of the deworming estimate earlier in the thread. There are many, many obviously important things that could affect the external validity of the study, not just the worm burden, which is all that GW adjusts for: e.g. the intervention will now take place 20 years later, in a different country, with a different economy, a different culture, etc. The external validity problems are quite clear. There’s a study showing that deworming some schoolchildren in Kenya increases income (one of many outcomes they were measuring), and we then need to assume that this result will apply in all other contexts, even though we have evidence from lots of other domains that the external validity of all RCTs is pretty minimal. The ‘only considers RCTs’ criticism of the Cochrane review is one that is difficult for GW to make given its epistemic approach, which is the one I am criticising.
It’s a bit tricky to carve out a difference between ‘not practically possible’ and ‘not doable in such a way that you could get value of information’. You could sort of do an RCT where you allowed foreign investment in some companies but not in others, but it just seems like it would be a total mess to do that. The view that we should be agnostic before RCTs is, I think, an important implicit guiding assumption that a lot of people make. For example, it seems like by far the most obvious way to explain the approach GiveWell has taken since its inception. Lant discusses an example of a paper in the AEJ which found that building schools closer to children increases enrollment and says: “In this paper, we evaluate a simple intervention entirely focused on access to primary schools. The empirical challenge is the potential endogenous relationship between the school availability and household characteristics. [Footnote 1]. Governments, for example, may place schooling either in areas of high demand for education or in areas with low demand for education, in the hopes of encouraging higher participation levels. Either will bias simple cross-sectional estimates of the relationship between access and enrollment.” But the finding that proximity to school increases enrollment is something economists have known for ages on the basis of theory and non-RCT evidence. Not all possible confounders are actual confounders. I didn’t say or imply that “theory is going to tell us whether in the real world funding an NGO to lobby for change to Tanzania’s macro policy is in expectation a better bet than funding deworming or bednets”; it is so implausible that I have never given the impression of believing it, and I explicitly said in my last comment that I wasn’t making such a claim: “(I’m not saying this is true of all topics!)”
Yes you’re right I’m happy to retract that. I had based my estimate on the Duflo numbers not on the wider field number. In answer to your question, I would have less than 5% of the field work on RCTs, maybe less than 1%. I do also think it is important that top 5 journals disproportionately publish RCTs as this does suggest that the top talent is disproportionately working on RCTs.
I’m going to try to step back first and speculate where we actually disagree, in hopes of getting at what you actually think should be happening differently, if anything. You seem to be arguing to some extent against things that do not exist, and in particular that neither I nor others are saying. I think we agree that (i) 1-5% of work in economics should be RCTs; (ii) RCTs are not the right approach for many, indeed most, questions in social science; (iii) there exists lots of policy-relevant and actionable information from non-RCT sources; (iv) intuitions can be a useful input, as long as one is transparent that that’s what they are; (v) the EA community should be spending resources studying policy interventions, including around growth but also imo health (e.g. lead paint, tobacco); and (vi) economists do more good than harm in the world.
Where I think we disagree is that your intuition is that with another three person-years of effort, the EA community will find growth policy funding opportunities (not based on RCTs...) that are far more effective than the current top GW charities, and my intuition is that we won’t (but that I still think we should look, as I’ve said many times, because the uncertainty is high and we might find them!). Neither of us knows for sure, as this hasn’t been done yet. Is it more than that?
You’re right that finding leads but not making recommendations could cause one to update upward, downward, or not at all (depending on one’s prior) so I shouldn’t have suggested that it was necessarily a bad signal. However I didn’t see anything quantitative (I did look at the appendices, but I believe the only quantitatively worked-out example was the incorrect and retrospective ICRIER one in the main text). This makes it very hard to meaningfully compare against current top interventions. I would love if you would be willing to go on the record, with as many caveats as you want, and pick one or two of the ones Hauke listed and put some back-of-the-envelope numbers behind them just to see how it plays out (you said that you “actually don’t think some of the GiveWell numbers are based on more evidence”—so where is your evidence?). Then we could have a substantive discussion. Or maybe I would be convinced once I saw the concrete reasoning.
To say that the net impact of historic policy economists has been positive (which I agree with) does not begin to tell us that it is “better than directly funding RCT-type stuff”: we need (prospective) numbers, or else we’re basing it on your intuition. I don’t reject your argument because it is based on intuition; I reject the claim that it is based on evidence, if it is based on intuition. No inconsistency from me, and certainly no claim that subjective assessments should all be rejected; indeed I have published those and am devoting a large portion of my current / upcoming research agenda to surveying people about their values in order to use those estimates in policy work. No RCTs whatsoever, except some light randomization around framing!
Thanks for checking—no, you are not right. If I thought that philanthropists can never cause beneficial policy change, why would the very first sentence of the TL/DR in my very first reply have been “I 100% agree that we should be doing more research on effective ways to leverage broad policy initiatives”?? The issue is that while I can donate to AMF, I can’t donate to “San Francisco planning reform”. There’s an extra step involved in lobbying, which may be where our intuitions differ (I think it’s possible but low probability; I guess you think it’s higher, but you never seem to talk about that part as much as you do about how big GDP growth was). That’s the sense in which, every time you compare a specific GW charity to ‘whatever it was that caused growth’, your claim is difficult to parse and evaluate from the perspective of a philanthropist.
If GW is saying to only consider RCTs, then I very much disagree with them (but see below). NB you’re the one who brought up the Cochrane review as support for your position on worms, even though your own arguments about RCTs not being the holy grail—which I fully agree with—show exactly why it is deeply flawed. I’m not sure who you’re arguing with here. Btw, do you have any citations for the claim that “the external validity of all RCTs is pretty minimal”?
GW paid me some years ago specifically to do an informally Bayesian analysis of the deworming results (i.e. what should our prior have been before Baird et al?), putting it in the context of everything we know about not just deworming (from RCTs and otherwise) but also the impact of early childhood interventions / shocks on adult outcomes, most of which was not from RCTs. So I honestly don’t know where you get the idea that there’s a group of people out there who only believe in RCTs (except that Lant and Angus D sometimes like to give that impression, so that they can argue against it). The fact that Lant can find one imperfect paper in one good-but-not-fantastic journal (if he had found something in a top-5 journal he would have used it instead, so clearly he didn’t) doesn’t prove much. Can you point to any randomista or organization saying that we should be agnostic before doing RCTs? I would be very surprised if so, because literally no one I know believes that. As for Tanzanian macro policy, sorry if I gave the impression that you believed that particular example to be true. The difficulty is that you appear to be unwilling or unable to give any specific example whatsoever, so I had to make something up just to instantiate the point that we (as EAs) don’t implement policy; we fund organizations that indirectly influence policy (whereas we directly implement ‘RCT-type’ interventions). But I really didn’t mean to give the impression that you might have claimed it was somehow always going to be a great approach, so apologies.
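For readers unfamiliar with the idea, an informally Bayesian adjustment of this kind can be sketched as a simple normal–normal update: weight a sceptical prior about the effect size and the study’s estimate by their respective precisions. All numbers below are hypothetical, chosen purely to illustrate the shrinkage logic:

```python
# Normal-normal Bayesian update (illustrative numbers only).
# prior: sceptical belief about the income effect of an intervention
# study: the point estimate and standard error from a single RCT

prior_mean, prior_sd = 0.0, 5.0      # sceptical prior on % income gain
study_mean, study_se = 20.0, 10.0    # hypothetical RCT estimate

prior_prec = 1 / prior_sd**2         # precision = 1 / variance
study_prec = 1 / study_se**2

# Posterior mean is a precision-weighted average: a noisy estimate
# gets shrunk heavily toward the sceptical prior.
post_mean = (prior_prec * prior_mean + study_prec * study_mean) / (
    prior_prec + study_prec
)
print(round(post_mean, 1))  # 4.0
```

Here a headline estimate of a 20% gain shrinks to 4% once the sceptical prior is taken into account, which is the qualitative shape of any such replicability adjustment; the disagreement is over what prior and what discount are warranted, not over the mechanics.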
Thanks, and as I already wrote above I fully agree that prevalence in top-5 journals is [also] an important indicator—for the reasons you mention. Any new technique is going to be popular at first, both because it’s a shiny new toy (academia over-rewards novelty) and also because there will actually be a lot of useful low-hanging fruit that it can be used for at first (if it has any merits whatsoever, which RCTs certainly do). I suspect that over the next 5/10/20 years the proportion in top journals (which is much lower than 30% overall; that’s just for development) will go down; the proportion in other journals will first go up and then down; and it will settle at something like 10% for development and well under 5% for econ as a whole. This seems about right to me; maybe you think it’s still way too high?
Yes, that seems like the main thing we disagree about. It also seems like we disagree about the likely impact of deworming.
I would also like to do that, but unfortunately I don’t have the time to do it properly—my focus is now on longterm-relevant work.
I think the point here is that the growth decelerations don’t seem big enough to make the back-of-the-envelope calculations on the World Bank, the IMF and all economists in Hauke’s and my post come out lower than RCT-type interventions.
Thanks for clarifying. First (this is not a criticism of you), this is different to the view taken by Open Phil, which does try to compare its rich-world policy work to AMF and is what prompted the main post. So at least Open Phil can’t think it is an apples to oranges comparison. Second, I don’t really see how it is an apples to oranges comparison. You seem to be saying that the probability judgement is too high, but that is different from the metric used to compare the options being incomparable. I don’t understand why you think the probability estimate is too high. The back-of-the-envelope calculation on the World Bank and IMF seems like a pretty good guide: you really have to believe some implausible things to think that the probability of success is low enough. On the ‘all economists’ one, for example, the estimate is extremely conservative because it only counts the economic gains in China and assumes that is the only thing all economists achieved.
The reason I cited the Cochrane review was that GiveWell also cited it in their review of deworming. My main aim is to dispute GiveWell’s reasoning as that is what is driving a lot of EA money, which I think could do much more good elsewhere. On external validity, I’m going off the Vivalt paper.
It’s a natural way to interpret GiveWell’s views, because the only charities they recommend, or (I think) have ever recommended, have been tested by RCTs. All of the charities on their standout list have been tested by RCTs. This suggests at least that they put extremely high weight on RCT evidence. Even if they are not at the extreme of complete agnosticism sans RCT, they are very close to it. To use an example from Lant, Chris Blattman said that the best investment to fight world poverty would be to run an RCT comparing giving people cash with giving people chickens. I find this stance extremely hard to understand without the implicit premise of agnosticism in the absence of RCTs, and I don’t think it would be difficult to find similar claims by Duflo and Banerjee with a bit of extra time. I do think that this goes beyond RCTs and extends to an over-reliance on empirical studies, as I argued here.
I’m not sure which direction RCTs will go in in the future. The publication incentives are to do that type of work.
Another point of agreement: the economics profession currently focuses too much on empirical work. Meanwhile my own personal view is that people like Esther and Chris B are slightly ‘too far’ in the pro-RCT camp, and that people like Lant (and you) are ‘too far’ in the anti-RCT camp. But I don’t see anyone in this discussion as being extreme (except possibly Lant...); healthy disagreement is to be expected and encouraged. Note that Esther and Abhijit’s most recent book tackles macro issues like migration, trade, climate change, and yes growth—using RCTs when possible / relevant but also plenty of other results (including lots of theory! Abhijit started life as a theorist, like I did). Meanwhile Chris has a forthcoming book on war and peace (macro level! no easy RCTs) for which he uses other approaches like machine learning. You can find all sorts of quotes, but the proof is in the pudding. Final point on this is that one can easily combine RCTs with admin data, ML, etc, and researchers (including me) are doing more and more of that, which imo is great—it’s not always one or the other.
As you say, the efficacy of deworming seems to be a point of disagreement between us. Again pulling back somewhat, you link to Eva’s paper as supporting your claim that RCTs have minimal external validity, but her paper is about all forms of impact evaluation (and she notes in the conclusion that the subset of RCTs aren’t special). So this would be extremely damning for economics if true, but her results don’t support your claim. For instance she notes that bednets and conditional cash transfers seem to do very well on this front. More relevantly, her point (as I read it) is to see how much of the nominal variation in effect sizes can be explained by other contextual variables, and she finds that typically a nontrivial amount of it can be. This is good news for external validity, since it means we can often explain / predict the differences even when they do arise.
I think I haven’t been very clear about ‘apples to oranges’ - I agree that these can & should absolutely be compared. I just felt like the way you were doing it glossed over an important difference. I can write a check to AMF and feel very confident that something will change in the world; we can then debate the expected magnitude of the impact of that change. But I can’t write a check to “growth reform in the developing world”, so even before we debate the relative benefits of changing immigration policy vs distributing bednets we have to calculate the probability that the desired policy will get implemented. I realize you’re fully aware of this, but that’s the part I keep coming back to because that’s the part where I’m pessimistic (partly having worked for the US government, although for a counterargument I liked this recent forum post) and suspect that our intuitions disagree, and mostly you keep talking about the benefits of more migration and of GDP growth (which are great!) and not so much about how we sit down and estimate the likelihoods of bringing those about. I’ll admit that the “pessimistic” estimate of 1% on ICRIER in the original post with Hauke really made me distrust everything afterward, since the pessimistic estimate in that case is a negative number and a plausible median estimate seems to be about 1 in a million.
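The ‘extra step’ can be written out explicitly. Before comparing benefit sizes, a policy donation has to be discounted by the probability that the lobbying actually changes the policy; the sketch below uses entirely hypothetical numbers to show that this probability, not the size of the GDP figure, does most of the work:

```python
# Expected value of a policy donation vs a direct intervention
# (all numbers hypothetical, for illustration only).

donation = 1_000_000                  # $ given to a lobbying NGO
p_policy_adopted = 1e-4               # chance the donation tips the policy
benefit_if_adopted = 1e10             # $ welfare gain if the reform happens

ev_policy_per_dollar = p_policy_adopted * benefit_if_adopted / donation
ev_direct_per_dollar = 50             # stylised bednet-style benchmark

print(ev_policy_per_dollar)  # 1.0
# At p = 1e-4 the policy route loses to the direct benchmark;
# at p = 1e-2 it wins by a factor of two. The ranking is driven
# almost entirely by the probability estimate.
```

This is why I keep asking for the a priori probability of a given campaign succeeding rather than the ex post GDP change: with a ten-billion-dollar payoff in the numerator, shifting that probability by two orders of magnitude flips the conclusion.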
On China I suppose my main point is still that I think it’s simply very very hard to quantitatively estimate most of this. Just because you (or I, or anyone) thinks that something is extremely conservative (when you admit you haven’t put in as much time on all this as you’d like, and indeed it’s not your job to do so) doesn’t make it so. In this specific case, if you forced me to take a stand, my best guess is to agree with you that economists have helped push policy in a better direction and that that made a big difference to global welfare. Even if I felt more confident about that, what is the counterfactual you are comparing to? Did some NGO or the WB cause that to happen on the margin, or would economists have tried to learn about the world and influence policy anyway? Are there similar opportunities going forward? The Taliban says they want economics expertise, so perhaps. But I don’t think we know the answers to these questions (yet), even within orders of magnitude, and whether or not this type of approach will beat RCT-type approaches depends entirely on those particular probabilities.