Hi Alexey,
I appreciate that you’ve taken the time to consider what I’ve said in the book at such length. However, I do think that there’s quite a lot that’s wrong in your post, and I’ll describe some of that below. Though I think you have noticed a couple of mistakes in the book, I think that most of the alleged errors are not errors.
I’ll just focus on what I take to be the main issues you highlight, and I won’t address the ‘dishonesty’ allegations, as I anticipate it wouldn’t be productive to do so; I’ll leave that charge for others to assess.
tl;dr:
Of the main issues you refer to, I think you’ve identified two mistakes in the book: I left out a caveat in my summary of the Baird et al (2016) paper, and I conflated overhead costs and CEO pay in a way that, on the latter aspect, was unfair to Charity Navigator.
In neither case are these errors egregious in the way you suggest. I think that: (i) claiming that the Baird et al (2016) results should cause us to believe that there is ‘no effect’ on wages is a misrepresentation of that paper; (ii) my core argument against Charity Navigator, regarding their focus on ‘financial efficiency’ metrics like overhead costs, is both successful and accurately depicts Charity Navigator.
I don’t think that the rest of the alleged major errors are errors. In particular: (i) GiveWell were able to review the manuscript before publication and were happy with how I presented their research; the quotes you give generally conflate how to think about GiveWell’s estimates with how to think about DCP2’s estimates; (ii) There are many lines of evidence supporting the 100x multiplier, and I don’t rely at all on the DCP2 estimates, as you imply.
(Also, caveating up front: for reasons of time limitations, I’m going to have to precommit to this being my last comment on this thread.)
(Also, Alexey’s post keeps changing, so if it looks like I’m responding to something that’s no longer there, that’s why.)
1. Deworming
Since the book came out, there has been much more debate about the efficacy of deworming. As I’ve continued to learn about the state and quality of the empirical evidence around deworming, I’ve become less happy with my presentation of the evidence around deworming in Doing Good Better; this fact has been reflected on the errata page on my website for the last two years. On your particular points, however:
Deworming vs textbooks
If textbooks have a positive effect, it’s via how much children learn in school, rather than via giving them an incentive to spend more time in school. So the fact that there doesn’t seem to be good evidence of textbooks increasing test scores is pretty damning for textbooks as an intervention.
If deworming has a positive effect, it could be via a number of mechanisms: increased school attendance, learning more in school, direct health impacts, and so on. If there are big gains on any of these dimensions, then deworming looks promising. I agree, however, that more days in school certainly aren’t good in themselves, so the better evidence concerns the long-run effects.
Deworming’s long-run effects
Here’s how GiveWell describes the study on which I base my discussion of the long-run effects of deworming:
“10-year follow-up: Baird et al. 2016 compared the first two groups of schools to receive deworming (as treatment group) to the final group (as control); the treatment group was assigned 2.41 extra years of deworming on average. The study’s headline effect is that as adults, those in the treatment group worked and earned substantially more, with increased earnings driven largely by a shift into the manufacturing sector.” Then, later: “We have done a variety of analyses to assess the robustness of the core findings from Baird et al. 2016, including reanalyzing the data and code underlying the study, and the results have held up to our scrutiny.”
You are correct that my description of the findings of the Baird et al paper was not fully accurate. When I wrote, “Moreover, when Kremer’s colleagues followed up with the children ten years later, those who had been dewormed were working an extra 3.4 hours per week and earning an extra 20 percent of income compared to those who had not been dewormed,” I should have included the caveat “among non-students with wage employment.” I’m sorry about that, and I’m updating my errata page to reflect this.
As for how much we should update on the basis of the Baird et al paper — that’s a really big discussion, and I’m not going to be able to add anything above what GiveWell have already written (here, here and here). I’ll just note that:
(i) Your gloss on the paper seems misleading to me. If you include people with zero earnings, of course it’s going to be harder to get a statistically significant effect. And the data from those who do have an income but who aren’t in wage employment are noisier, so it’s harder to get a statistically significant effect there too (a toy illustration of this dilution follows after point (ii) below). In particular, see here from the 2015 version of the paper: “The data on [non-agricultural] self-employment profits are likely measured with somewhat more noise. Monthly profits are 22% larger in the treatment group, but the difference is not significant (Table 4, Panel C), in part due to large standard errors created by a few male outliers reporting extremely high profits. In a version of the profit data that trims the top 5% of observations, the difference is 28% (P < 0.10).”
(ii) GiveWell finds the Baird et al paper to be an important part of the evidence behind their support of deworming. If you disagree with that, then you’re engaged in a substantive disagreement with GiveWell’s views; it seems wrong to me to class that as a simple misrepresentation.
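To make the dilution point in (i) concrete, here is a minimal simulation sketch. All of the numbers (sample sizes, the employment rate, the wage distribution, the 20% effect among earners) are hypothetical placeholders rather than figures from Baird et al.; the only point is that pooling in many zero-earnings observations shrinks the mean difference relative to its standard error, even when the effect among wage earners is held fixed.

```python
# Hypothetical illustration: a +20% wage effect among earners is easier to detect
# when comparing earners only than when zero-earnings individuals are pooled in.
# None of these parameters come from Baird et al.; they are placeholders.
import numpy as np

rng = np.random.default_rng(0)
n_per_arm = 600          # individuals per arm (hypothetical)
employment_rate = 0.30   # share with any wage earnings (hypothetical)

def simulate_arm(wage_multiplier):
    employed = rng.random(n_per_arm) < employment_rate
    wages = rng.lognormal(mean=3.0, sigma=0.5, size=n_per_arm) * wage_multiplier
    everyone = np.where(employed, wages, 0.0)   # zeros for the non-employed
    return everyone, wages[employed]

def welch_t(a, b):
    """Welch t-statistic for the difference in means."""
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return (a.mean() - b.mean()) / se

control_all, control_earners = simulate_arm(1.0)
treated_all, treated_earners = simulate_arm(1.2)   # +20% wages for earners only

print("t-stat, earners only:     ", round(welch_t(treated_earners, control_earners), 2))
print("t-stat, everyone (zeros): ", round(welch_t(treated_all, control_all), 2))
# The pooled comparison typically comes out substantially weaker, even though the
# underlying effect on wage earners is identical in both cases.
```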
2. Cost-effectiveness estimates
Given the previous debate that had occurred between us on how to think and talk about cost-effectiveness estimates, and the mistakes I had made in this regard, I wanted to be sure that I was presenting these estimates in a way that those at GiveWell would be happy with. So I asked an employee of GiveWell to look over the relevant parts of the manuscript of DGB before it was published; in the end five employees did so, and they were happy with how I presented GiveWell’s views and research.
How can that fact be reconciled with the quotes you give in your blog post? It’s because, in your discussion, you conflate two quite different issues: (i) how to represent the cost-effectiveness estimates provided by DCP2, or by single studies; (ii) how to represent the (in my view much more rigorous) cost-effectiveness estimates provided by GiveWell. Almost all the quotes from Holden that you give are about (i). But the quotes you criticise me for are about (ii). So, for example, when I say ‘these estimates’ are order of magnitude estimates, that’s referring to (i), not to (ii).
There’s a really big difference between (i) and (ii). I acknowledge that back in 2010 I was badly wrong about the reliability of DCP2 and individual studies, and that GWWC was far too slow to update its web pages after the unreliability of these estimates came to light. But the level of time, care and rigour that has gone into the GiveWell estimates is much greater than for the DCP2 estimates. It’s still the case that there’s a huge amount of uncertainty surrounding the GiveWell estimates, but describing them as “the most rigorous estimates” we have seems reasonable to me.
More broadly: Do I really think that donating $3,500 to AMF does as much good as, or more good than, saving a child’s life, in expectation? Yes. GiveWell’s estimate of the direct benefits might be optimistic or pessimistic (though it has stayed relatively stable over many years now — the median GiveWell estimate for ‘cost for outcome as good as averting the death of an individual under 5’ is currently $1,932), but I really don’t have a view on which is more likely. And, what’s more important, the biggest consideration that’s missing from GiveWell’s analysis is the long-run effect of saving a life. While of course it’s a thorny issue, I personally find it plausible that the long-run expected benefits from a donation to AMF are considerably larger than the short-run benefits — you speed up economic progress just a little bit, in expectation making those in the future just a little bit better off than they would otherwise have been. Because the future is so vast in expectation, that effect is very large. (There’s *plenty* more to discuss on this issue of long-run effects — Might those effects be negative? How should you discount future consumption? etc. — but that would take us too far afield.)
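As a rough, back-of-the-envelope illustration of the expected-value claim above: the $1,932 median figure is GiveWell’s (as quoted), while the surrounding “optimistic” and “pessimistic” figures below are invented to stand in for the uncertainty, not actual GiveWell scenarios.

```python
# Illustrative only: the $1,932 median is GiveWell's figure as quoted above; the
# other two cost estimates are made up to represent uncertainty around it.
donation = 3_500
cost_estimates = [1_000, 1_932, 4_000]   # hypothetical spread around the median

# Expected outcomes per donation, averaging over the uncertain cost per outcome
# (each scenario treated as equally likely).
expected_outcomes = sum(donation / c for c in cost_estimates) / len(cost_estimates)
print(round(expected_outcomes, 2))   # comes out well above 1 for this spread
```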
3. Charity Navigator
Let’s distinguish: (i) the use of the overhead ratio as a metric in assessing charities; (ii) the use of CEO pay as a metric in assessing charities. Evaluating charities on overheads and evaluating them on CEO pay are often run together in public discussion, and both are wrong for similar reasons, so I bundled them together in my discussion.
Regarding (ii): CN-of-2014 did talk a lot about CEO pay: they featured CEO pay, in both absolute terms and as a proportion of expenditure, prominently on their charity evaluation pages (see, e.g., their page on Books for Africa), and they had top-ten lists like “10 highly-rated charities with low-paid CEOs” and “10 highly paid CEOs at low-rated charities” (and no lists of “10 highly-rated charities with highly paid CEOs” or “10 low-rated charities with low-paid CEOs”). However, it is true that CEO pay was not a part of CN’s rating system. And, rereading the relevant passages of DGB, I can see how the reader would have come away with the wrong impression on that score. So I’m sorry about that. (Perhaps I was subconsciously still ornery from their spectacularly hostile hit piece on EA that came out while I was writing DGB, and was therefore less careful than I should have been.) I’ve updated my errata page to make that clear.
Regarding (i): CN’s two key metrics for charities are (a) financial health and (b) accountability and transparency. (a) is in very significant part about the charity’s overhead ratios (in several different forms), where they give a charity a higher score the lower its overheads are, breaking the scores into five broad buckets: see here for more detail. The doughnuts-for-police-officers example shows that a really bad charity could score extremely highly on CN’s metrics, which means those metrics must be wrong (a toy scoring sketch follows below). Similarly for Books for Africa, which gets a near-perfect score from CN, and features in its ‘ten top-notch charities’ list, in significant part because of its very low overheads, despite having no good evidence to support its program.
I represent CN fairly, and make a fair criticism of its approach to assessing charities. In the extended quote you give, they caveat that very low overheads are not make-or-break for a charity. But, on their charity-rating methodology, all other things being equal, they give a charity a higher score the lower its overheads are. If that scoring method is a bad one, which it is, then my criticism is justified.
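To see why a scoring rule of that shape can go wrong, here is the toy scoring sketch referred to above. The bucket cut-offs and example charities are invented for illustration (Charity Navigator’s actual methodology differs in its details); the point is only that a score driven by the overhead ratio can rank a useless programme above an effective one.

```python
# Hypothetical bucketed overhead scoring, loosely in the spirit of an
# overheads-based "financial efficiency" metric. The cut-offs and the 0-10
# scores are invented for illustration; they are not Charity Navigator's.
def overhead_score(overhead_ratio: float) -> int:
    """Higher score for lower overheads, in five broad buckets."""
    if overhead_ratio < 0.05:
        return 10
    if overhead_ratio < 0.10:
        return 8
    if overhead_ratio < 0.20:
        return 6
    if overhead_ratio < 0.30:
        return 4
    return 2

# Two made-up charities: one with a useless programme but tiny overheads,
# one with a highly effective programme but a larger research/admin budget.
doughnuts_for_police = {"overhead_ratio": 0.02, "evidence_of_impact": None}
effective_charity = {"overhead_ratio": 0.25, "evidence_of_impact": "strong"}

print(overhead_score(doughnuts_for_police["overhead_ratio"]))  # 10
print(overhead_score(effective_charity["overhead_ratio"]))     # 4
# The overhead-based score says nothing about whether the programme does any good.
```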
4. Life satisfaction and income and the hundredfold multiplier
The hundredfold multiplier
You make two objections to my 100x multiplier claim: that the DCP2 deworming estimate was off by 100x, and that the Stevenson and Wolfers paper does not support it.
But there are very many lines of evidence in favour of the 100x multiplier, which I reference in Doing Good Better. I mention on p.25, and in the endnotes on pp.261-2 (all references are to the British paperback edition—yellow cover), that there are many independent justifications for thinking that there is a logarithmic (or even more concave) relationship between income and happiness. In addition to the Stevenson and Wolfers life satisfaction approach (which I discuss later), here are some reasons for thinking that the hundredfold multiplier obtains:
The experiential sampling method of assessing happiness. I mention this in the endnote on p.262, pointing out that, on this method, my argument would be stronger, because on this method the relationship between income and wellbeing is more concave than logarithmic, and is in fact bounded above.
Imputed utility functions from the market behaviour of private individuals and the actions of government. It’s absolutely mainstream economic thought that utility varies with the log of income (that is, eta=1 in an isoelastic utility function) or something more concave (eta>1). I reference a paper that takes this approach on p.261, Groom and Maddison (2013); they estimate eta to be 1.5. (A short sketch of this arithmetic follows after this list.)
Estimates of cost to save a life. I discuss this in ch.2; I note that this is another strand of supporting evidence prior to my discussion of Stevenson and Wolfers on p.25: “It’s a basic rule of economics that money is less valuable to you the more you have of it. We should therefore expect $1 to provide a larger benefit for an extremely poor Indian farmer than it would for you or me. But how much larger? Economists have sought to answer this question through a variety of methods. We’ll look at some of these in the next chapter, but for now I’ll just discuss one [the Stevenson and Wolfers approach].” Again, you find a 100x or more discrepancy in the cost to save a life between rich and poor countries.
Estimate of cost to provide one QALY. As with the previous bullet point.
Note, crucially, that the developing world estimates for cost to provide one QALY or cost to save a life come from GiveWell, not — as you imply — from DCP2 or any individual study.
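Here is the short sketch of the isoelastic-utility arithmetic mentioned in the second bullet above. The eta values are the ones from the text (log utility, and Groom and Maddison’s estimate of roughly 1.5); the specific dollar figures are illustrative stand-ins for a hundredfold income gap, not figures from the book.

```python
# A minimal sketch of the isoelastic-utility point above (the 100x income gap and
# the eta values come from the discussion; the dollar figures are illustrative).
# Marginal utility under an isoelastic utility function is u'(c) = c**(-eta),
# so the ratio of marginal utilities at two income levels is (c_rich/c_poor)**eta.

def marginal_utility_ratio(income_poor, income_rich, eta):
    """How much more an extra dollar is worth to the poorer person."""
    return (income_rich / income_poor) ** eta

poor, rich = 1_000, 100_000   # a hundredfold income gap (illustrative levels)
for eta in (1.0, 1.5):        # eta = 1 is log utility; Groom and Maddison estimate ~1.5
    print(eta, round(marginal_utility_ratio(poor, rich, eta)))
# With eta = 1 the ratio is 100; with eta = 1.5 it is 1,000.
```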
Is there a causal relationship from income to wellbeing?
It’s true that Stevenson and Wolfers only show a correlation between income and wellbeing. But that there is a causal relationship, from income to wellbeing, is beyond doubt. It’s perfectly obvious that, over the scales we’re talking about, higher income enables you to have more wellbeing (you can buy analgesics, healthcare, shelter, eat more and better food, etc.).
It’s true that we don’t know exactly the strength of the causal relationship. Understanding this could make my argument stronger or weaker. To illustrate, here’s a quote from another Stevenson and Wolfers paper, with the numerals in square brackets added in by me:
“Although our analysis provides a useful measurement of the bivariate relationship between income and well-being both within and between countries, there are good reasons to doubt that this corresponds to the causal effect of income on well-being. It seems plausible (perhaps even likely) that [i] the within-country well-being-income gradient may be biased upward by reverse causation, as happiness may well be a productive trait in some occupations, raising income. A different perspective, offered by Kahneman et al. (2006), suggests that [ii] within-country comparisons overstate the true relationship between subjective well-being and income because of a “focusing illusion”: the very nature of asking about life satisfaction leads people to assess their life relative to others, and they thus focus on where they fall relative to others in regard to concrete measures such as income. Although these specific biases may have a more important impact on within-country comparisons, it seems likely that [iii] the bivariate well-being-GDP relationship may also reflect the influence of third factors, such as democracy, the quality of national laws or government, health, or even favorable weather conditions, and many of these factors raise both GDP per capita and well-being (Kenny, 1999). [iv] Other factors, such as increased savings, reduced leisure, or even increasingly materialist values may raise GDP per capita at the expense of subjective well-being. At this stage we cannot address these shortcomings in any detail, although, given our reassessment of the stylized facts, we would suggest an urgent need for research identifying these causal parameters.”
To the extent that (i), (ii) or (iv) are true, the case for the 100x multiplier becomes stronger. To the extent that (iii) is true, the case for the 100x multiplier becomes weaker. We don’t know, at the moment, which of these are the most important factors. But, given that the wide variety of different strands of evidence listed in the previous section all point in the same direction, I think that estimating a 100x multiplier as a causal matter is reasonable. (Final point: noting again that all these estimates do not factor in the long-run benefits of donations, which would shift the ratio of benefits to others versus benefits to yourself even further in the direction of benefits to others.)
On the Stevenson and Wolfers data, is the relationship between income and happiness weaker for poor countries than for rich countries?
If it were the case that money does less to buy happiness (for any given income level) in poor countries than in rich countries, then that would be one counterargument to mine.
However, it doesn’t seem to me that this is true of the Stevenson and Wolfers data. In particular, it’s highly cherry-picked to compare Nigeria and the USA as you do, because Nigeria is a clear outlier in terms of how flat the slope is. I’m only eyeballing the graph, but it seems to me that, of the poorest countries represented (PHL, BGD, EGY, CHN, IND, PAK, NGA, ZAF, IDN), only NGA and ZAF have flatter slopes than USA (and even for ZAF, that’s only true for incomes less than $6000 or so); all the rest have slopes that are similar to or steeper than that of USA (IND, PAK, BGD, CHN, EGY, IDN all seem steeper than USA to me). Given that Nigeria is such an outlier, I’m inclined not to give it too much weight. The average trend across countries, rich and poor, is pretty clear.
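For readers who want to go beyond eyeballing the graph, here is a sketch of the check being described: fit a within-country slope of life satisfaction on log income for each country and compare the coefficients. The per-country samples below are placeholders; the real exercise requires the survey microdata underlying the Stevenson and Wolfers figures, which is not reproduced here.

```python
# A rough sketch of the eyeballed comparison above, done with data rather than by
# eye: fit a within-country slope of life satisfaction on log income per country.
# The numbers below are placeholders, not the Stevenson-Wolfers data.
import numpy as np

def satisfaction_slope(incomes, satisfaction):
    """OLS slope of life satisfaction on log10(household income)."""
    slope, _intercept = np.polyfit(np.log10(incomes), satisfaction, deg=1)
    return slope

# Hypothetical per-country samples of (income, 0-10 life satisfaction):
samples = {
    "USA": ([8_000, 20_000, 50_000, 120_000], [5.8, 6.5, 7.1, 7.8]),
    "NGA": ([400, 1_000, 2_500, 6_000],       [4.6, 4.8, 5.0, 5.2]),
    "IND": ([400, 1_000, 2_500, 6_000],       [3.9, 4.6, 5.3, 6.0]),
}
for country, (incomes, satisfaction) in samples.items():
    print(country, round(satisfaction_slope(incomes, satisfaction), 2))
# A flatter slope (as claimed for NGA) shows up as a smaller coefficient.
```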
Regarding your point about cost-effectiveness estimates. Your other objections to my article follow a similar pattern and do not address the substantive points that I raise (I invite the reader to check for themselves).
You write:
2. Cost-effectiveness estimates
Given the previous debate that had occurred between us on how to think and talk about cost-effectiveness estimates, and the mistakes I had made in this regard, I wanted to be sure that I was presenting these estimates in a way that those at GiveWell would be happy with. So I asked an employee of GiveWell to look over the relevant parts of the manuscript of DGB before it was published; in the end five employees did so, and they were happy with how I presented GiveWell’s views and research.
How can that fact be reconciled with the quotes you give in your blog post? It’s because, in your discussion, you conflate two quite different issues: (i) how to represent the cost-effectiveness estimates provided by DCP2, or by single studies; (ii) how to represent the (in my view much more rigorous) cost-effectiveness estimates provided by GiveWell. Almost all the quotes from Holden that you give are about (i). But the quotes you criticise me for are about (ii). So, for example, when I say ‘these estimates’ are order of magnitude estimates, that’s referring to (i), not to (ii).
You do not address my concern here. Here’s the first Web Archive version of the post.
My reasoning regarding cost-effectiveness estimates on that page is as follows (I invite the reader to check it):
1. A quote from DGB showing that you refer to GiveWell’s AMF cost-effectiveness estimates as “the most rigorous” (that does not show much by itself, aside from the fact that it is very strange to write “most rigorous” when GiveWell’s page specifically refers to the “significant uncertainty” around them).
2. Quote from GW that says:
As a general note on the limitations to this kind of cost-effectiveness analysis, we believe that cost-effectiveness estimates such as these should not be taken literally, due to the significant uncertainty around them.
3. Three quotes from DGB, which demonstrate that you interpret the GW AMF cost-effectiveness estimate literally. In the first two you write about “five hundred times” the benefit on the basis of these estimates. The third simply cites the one-hundred-dollars-per-QALY number, which does not show much by itself and which I should not have included. Nonetheless, the first two quotes show that you interpret GW’s AMF cost-effectiveness estimates literally.
4. On the basis of these quotes I conclude that you misquote GiveWell. Then I ask a question: can I be sure that GW and I mean the same thing by “the literal interpretation” of a cost-effectiveness estimate?
5. I provide quotes from Holden that demonstrate that we mean the same thing by it. In one of the quotes, Holden writes that your 100 times argument (based there on the DCP2 deworming estimate) seems to indicate that you interpret cost-effectiveness estimates literally.
These five steps constitute my argument that you misinterpret GW’s AMF cost-effectiveness estimates.
You do not address this argument in your comment.
technical edit: conflation of deworming and AMF estimates
You write:
How can that fact be reconciled with the quotes you give in your blog post? It’s because, in your discussion, you conflate two quite different issues: (i) how to represent the cost-effectiveness estimates provided by DCP2, or by single studies; (ii) how to represent the (in my view much more rigorous) cost-effectiveness estimates provided by GiveWell. Almost all the quotes from Holden that you give are about (i). But the quotes you criticise me for are about (ii). So, for example, when I say ‘these estimates’ are order of magnitude estimates, that’s referring to (i), not to (ii).
If the reader takes the time to look at the Web Archive link I provided, they will see that I do not conflate these estimates. However, it is true that I did conflate them previously: in a confidential draft of the post, which I sent to one of CEA’s employees asking them to look at it prior to publication, and which I requested not be shared with anyone besides that specific employee, I did conflate them (in the end that employee declined to review my draft). I jumped from deworming estimates to AMF estimates in that draft. This was pointed out to me by one of my friends, and I fixed it prior to publication.
Edit: besides that CEA employee, I also shared the draft with several of my friends (also asking them not to share it with anybody), so I cannot be sure exactly which version of the post you are replying to.
In your comment you write:
But the quotes you criticise me for are about (ii). So, for example, when I say ‘these estimates’ are order of magnitude estimates, that’s referring to (i), not to (ii).
As if I quoted you saying something about order of magnitude estimates. I did—in that confidential draft. Again, I invite the reader to check the first public version of my essay archived by Internet Archive and to check whether I provided any quotes where William talks about order of magnitude estimates.
You write:
(Also, Alexey’s post keeps changing, so if it looks like I’m responding to something that’s no longer there, that’s why.)
I did update the essay after the first publication. However, the points you’re responding to here were removed before my publication of the essay. I am not sure why you are responding to the confidential draft.
Edit2: Here is the draft I’m referring to. Please note its status as a draft and that I did not intend it to be seen by the public. It contains strong language and a variety of mistakes.
If you CTRL+F “orders of magnitude” in this draft, you will find the quote William refers to.
I wonder why my reply has so many downvotes (a score of -8) and no replies. This could of course indicate that my arguments are so bad that they’re not worth engaging with, but given that many members of the community find my criticism accurate and valuable, this seems unlikely.
As a datapoint, I thought that your reply was so bad that it was not worth engaging with, although I think you did find a couple of inaccuracies in DGB, and I appreciate the effort you went to. I’ll briefly explain my position.
I thought MacAskill’s explanations were convincing and your counter-argument missed his points completely, to the extent that you seem to have an axe to grind with him. E.g. if GiveWell is happy with how their research was presented in DGB (as MacAskill mentioned), then I really don’t see how you, as an outsider and non-GW representative, can complain that their research is misquoted without having extremely strong evidence. You do not have extremely strong evidence. Even if you did, there’s still the matter that GW’s interpretation of their numbers is not necessarily the only reasonable one (as Jan_Kulveit points out below).
You completely ignored MacAskill’s convincing counter-arguments while simultaneously accusing him of ignoring the substance of your argument, so it seemed to me that there was little point in debating it further with you.
I guess this is a valid point of view. Just in case, I emailed GiveWell about this issue.
Hi William,
Thank you for your response. I apologize for the stronger language that I used in the first public version of this post. I believe that here you do not address most of the points I made, either in the first public version or in the version that was up at the time of your comment.
I will not change the post here without explicitly noting it, now that you have replied.
I’m in the process of preparing a longer reply to you.
In particular, the version of the essay that I initially posted here did not discuss the strength of the relationship between income and happiness in rich and poor countries—I agree that this was a weak argument.
A technical comment: neither the Web Archive nor archive.fo archives the comments to this post, so I archived this page manually: PDF from my site, captured at 2018-11-17 16:48 GMT.
Edit: a reddit user suggested this archive of this page: http://archive.fo/jUkMB