I appreciate that you’ve taken the time to consider what I’ve said in the book at such length. However, I do think that there’s quite a lot that’s wrong in your post, and I’ll describe some of that below. Though I think you have noticed a couple of mistakes in the book, I think that most of the alleged errors are not errors.
I’ll just focus on what I take to be the main issues you highlight, and I won’t address the ‘dishonesty’ allegations, as I anticipate it wouldn’t be productive to do so; I’ll leave that charge for others to assess.
tl;dr:
Of the main issues you refer to, I think you’ve identified two mistakes in the book: I left out a caveat in my summary of the Baird et al (2016) paper, and I conflated overhead costs and CEO pay in a way that, on the latter aspect, was unfair to Charity Navigator.
In neither case are these errors egregious in the way you suggest. I think that: (i) claiming that the Baird et al (2016) paper should cause us to believe that there is ‘no effect’ on wages is a misrepresentation of that paper; (ii) my core argument against Charity Navigator, regarding their focus on ‘financial efficiency’ metrics like overhead costs, is both successful and accurately depicts Charity Navigator.
I don’t think that the rest of the alleged major errors are errors. In particular: (i) GiveWell were able to review the manuscript before publication and were happy with how I presented their research; the quotes you give generally conflate how to think about GiveWell’s estimates with how to think about DCP2’s estimates; (ii) There are many lines of evidence supporting the 100x multiplier, and I don’t rely at all on the DCP2 estimates, as you imply.
(Also, caveating up front: for reasons of time limitations, I’m going to have to precommit to this being my last comment on this thread.)
(Also, Alexey’s post keeps changing, so if it looks like I’m responding to something that’s no longer there, that’s why.)
1. Deworming
Since the book came out, there has been much more debate about the efficacy of deworming. As I’ve continued to learn about the state and quality of the empirical evidence around deworming, I’ve become less happy with my presentation of that evidence in Doing Good Better; this fact has been reflected on the errata page on my website for the last two years. On your particular points, however:
Deworming vs textbooks
If textbooks have a positive effect, it’s via how much children learn in school, rather than via an incentive for them to spend more time in school. So the fact that there doesn’t seem to be good evidence for textbooks increasing test scores is pretty bad.
If deworming has a positive effect, it could be via a number of mechanisms, including increased school attendance, learning more in school, direct health impacts, etc. If there are big gains on any of these dimensions, then deworming looks promising. I agree, however, that more days in school certainly aren’t good in themselves, so the better evidence concerns the long-run effects.
Deworming’s long-run effects
Here’s how GiveWell describes the study on which I base my discussion of the long-run effects of deworming:
“10-year follow-up: Baird et al. 2016 compared the first two groups of schools to receive deworming (as treatment group) to the final group (as control); the treatment group was assigned 2.41 extra years of deworming on average. The study’s headline effect is that as adults, those in the treatment group worked and earned substantially more, with increased earnings driven largely by a shift into the manufacturing sector.” Then, later: “We have done a variety of analyses to assess the robustness of the core findings from Baird et al. 2016, including reanalyzing the data and code underlying the study, and the results have held up to our scrutiny.”
You are correct that my description of the findings of the Baird et al paper was not fully accurate. When I wrote, “Moreover, when Kremer’s colleagues followed up with the children ten years later, those who had been dewormed were working an extra 3.4 hours per week and earning an extra 20 percent of income compared to those who had not been dewormed,” I should have included the caveat “among non-students with wage employment.” I’m sorry about that, and I’m updating my errata page to reflect this.
As for how much we should update on the basis of the Baird et al paper — that’s a really big discussion, and I’m not going to be able to add anything above what GiveWell have already written (here, here and here). I’ll just note that:
(i) Your gloss on the paper seems misleading to me. If you include people with zero earnings, of course it’s going to be harder to get a statistically significant effect. And the data from those who do have an income but who aren’t in wage employment are noisier, so it’s harder to get a statistically significant effect there too. In particular, see here from the 2015 version of the paper: “The data on [non-agricultural] self-employment profits are likely measured with somewhat more noise. Monthly profits are 22% larger in the treatment group, but the difference is not significant (Table 4, Panel C), in part due to large standard errors created by a few male outliers reporting extremely high profits. In a version of the profit data that trims the top 5% of observations, the difference is 28% (P < 0.10).”
(ii) GiveWell finds the Baird et al paper to be an important part of the evidence behind their support of deworming. If you disagree with that, then you’re engaged in a substantive disagreement with GiveWell’s views; it seems wrong to me to class that as a simple misrepresentation.
2. Cost-effectiveness estimates
Given the previous debate that had occurred between us on how to think and talk about cost-effectiveness estimates, and the mistakes I had made in this regard, I wanted to be sure that I was presenting these estimates in a way that those at GiveWell would be happy with. So I asked an employee of GiveWell to look over the relevant parts of the manuscript of DGB before it was published; in the end five employees did so, and they were happy with how I presented GiveWell’s views and research.
How can that fact be reconciled with the quotes you give in your blog post? It’s because, in your discussion, you conflate two quite different issues: (i) how to represent the cost-effectiveness estimates provided by DCP2, or by single studies; (ii) how to represent the (in my view much more rigorous) cost-effectiveness estimates provided by GiveWell. Almost all the quotes from Holden that you give are about (i). But the quotes you criticise me for are about (ii). So, for example, when I say ‘these estimates’ are order of magnitude estimates, that’s referring to (i), not to (ii).
There’s a really big difference between (i) and (ii). I acknowledge that back in 2010 I was badly wrong about the reliability of DCP2 and individual studies, and that GWWC was far too slow to update its web pages after the unreliability of these estimates came to light. But the level of time, care and rigour that has gone into the GiveWell estimates is much greater than that which has gone into the DCP2 estimates. It’s still the case that there’s a huge amount of uncertainty surrounding the GiveWell estimates, but describing them as “the most rigorous estimates” we have seems reasonable to me.
More broadly: Do I really think that you do as much good or more in expectation from donating $3500 to AMF as saving a child’s life? Yes. GiveWell’s estimate of the direct benefits might be optimistic or pessimistic (though it has stayed relatively stable over many years now — the median GiveWell estimate for ‘cost for outcome as good as averting the death of an individual under 5’ is currently $1932), but I really don’t have a view on which is more likely. And, what’s more important, the biggest consideration that’s missing from GiveWell’s analysis is the long-run effects of saving a life. While of course it’s a thorny issue, I personally find it plausible that the long-run expected benefits from a donation to AMF are considerably larger than the short-run benefits — you speed up economic progress just a little bit, in expectation making those in the future just a little bit better off than they would have otherwise been. Because the future is so vast in expectation, that effect is very large. (There’s *plenty* more to discuss on this issue of long-run effects — Might those effects be negative? How should you discount future consumption? etc — but that would take us too far afield.)
3. Charity Navigator
Let’s distinguish: (i) the use of overhead ratio as a metric in assessing charities; (ii) the use of CEO pay as a metric in assessing charities. The ideas of evaluating charities on the basis of overheads and on the basis of CEO pay are often run together in public discussion, and are both wrong for similar reasons, so I bundled them together in my discussion.
Regarding (ii): CN-of-2014 did talk a lot about CEO pay: they featured CEO pay, in both absolute terms and as a proportion of expenditure, prominently on their charity evaluation pages (see, e.g. their page on Books for Africa), they had top-ten lists like, “10 highly-rated charities with low paid CEOs”, and “10 highly paid CEOs at low-rated charities” (and no lists of “10 highly-rated charities with high paid CEOs” or “10 low-rated charities with low paid CEOs”). However, it is true that CEO pay was not a part of CN’s rating system. And, rereading the relevant passages of DGB, I can see how the reader would have come away with the wrong impression on that score. So I’m sorry about that. (Perhaps I was subconsciously still ornery from their spectacularly hostile hit piece on EA that came out while I was writing DGB, and was therefore less careful than I should have been.) I’ve updated my errata page to make that clear.
Regarding (i): CN’s two key metrics for charities are (a) financial health and (b) accountability and transparency. (a) is in very significant part about the charities’ overhead ratios (in several different forms), where they give a charity a higher score the lower its overheads are, breaking the scores into five broad buckets: see here for more detail. The doughnuts for police officers example shows that a really bad charity could score extremely highly on CN’s metrics, which means that CN’s metrics must be wrong. Similarly for Books for Africa, which gets a near-perfect score from CN, and features in its ‘ten top-notch charities’ list, in significant part because of its very low overheads, despite having no good evidence to support its program.
I represent CN fairly, and make a fair criticism of its approach to assessing charities. In the extended quote you give, they caveat that very low overheads are not make-or-break for a charity. But, on their charity rating methodology, all other things being equal they give a charity a higher score the lower the charity’s overheads. If that scoring method is a bad one, which it is, then my criticism is justified.
4. Life satisfaction and income and the hundredfold multiplier
The hundredfold multiplier
You make two objections to my 100x multiplier claim: that the DCP2 deworming estimate was off by 100x, and that the Stevenson and Wolfers paper does not support it.
But there are very many lines of evidence in favour of the 100x multiplier, which I reference in Doing Good Better. I mention that there are many independent justifications for thinking that there is a logarithmic (or even more concave) relationship between income and happiness on p.25, and in the endnotes on p.261-2 (all references are to the British paperback edition—yellow cover). In addition to the Stevenson and Wolfers lifetime satisfaction approach (which I discuss later), here are some reasons for thinking that the hundredfold multiplier obtains:
The experiential sampling method of assessing happiness. I mention this in the endnote on p.262, pointing out that my argument would be stronger on this method, because the relationship between income and wellbeing is then more concave than logarithmic, and is in fact bounded above.
Imputed utility functions from the market behaviour of private individuals and the actions of government. It’s absolutely mainstream economic thought that utility varies with the log of income (that is, eta=1 in an isoelastic utility function) or something more concave (eta>1). I reference a paper that takes this approach on p.261, Groom and Maddison (2013); they estimate eta to be 1.5. (See the short sketch after this list for the calculation this implies.)
Estimates of cost to save a life. I discuss this in ch.2; I note that this is another strand of supporting evidence prior to my discussion of Stevenson and Wolfers on p.25: “It’s a basic rule of economics that money is less valuable to you the more you have of it. We should therefore expect $1 to provide a larger benefit for an extremely poor Indian farmer than it would for you or me. But how much larger? Economists have sought to answer this question through a variety of methods. We’ll look at some of these in the next chapter, but for now I’ll just discuss one [the Stevenson and Wolfers approach].” Again, you find a 100x or more discrepancy in the cost to save a life in rich versus poor countries.
Estimate of cost to provide one QALY. As with the previous bullet point.
Note, crucially, that the developing world estimates for cost to provide one QALY or cost to save a life come from GiveWell, not — as you imply — from DCP2 or any individual study.
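As a minimal sketch of the calculation behind the isoelastic-utility point above (the notation is mine; the eta values are the ones cited):

$$
u(c) = \begin{cases} \dfrac{c^{1-\eta}}{1-\eta}, & \eta \neq 1 \\ \ln c, & \eta = 1 \end{cases}
\qquad\Longrightarrow\qquad
\frac{u'(c)}{u'(100c)} = \frac{c^{-\eta}}{(100c)^{-\eta}} = 100^{\eta}.
$$

So with eta = 1 (log utility) a marginal dollar is worth roughly 100 times as much to someone on one-hundredth of your income, and with eta = 1.5 (the Groom and Maddison estimate) roughly 1,000 times as much.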
Is there a causal relationship from income to wellbeing?
It’s true that Stevenson and Wolfers only show a correlation between income and wellbeing. But that there is a causal relationship, from income to wellbeing, is beyond doubt. It’s perfectly obvious that, over the scales we’re talking about, higher income enables you to have more wellbeing (you can buy analgesics, healthcare, shelter, eat more and better food, etc).
It’s true that we don’t know exactly the strength of the causal relationship. Understanding this could make my argument stronger or weaker. To illustrate, here’s a quote from another Stevenson and Wolfers paper, with the numerals in square brackets added in by me:
“Although our analysis provides a useful measurement of the bivariate relationship between income and well-being both within and between countries, there are good reasons to doubt that this corresponds to the causal effect of income on well-being. It seems plausible (perhaps even likely) that [i] the within-country well-being-income gradient may be biased upward by reverse causation, as happiness may well be a productive trait in some occupations, raising income. A different perspective, offered by Kahneman, et al. (2006), suggests that [ii] within-country comparisons overstate the true relationship between subjective well-being and income because of a “focusing illusion”: the very nature of asking about life satisfaction leads people to assess their life relative to others, and they thus focus on where they fall relative to others in regard to concrete measures such as income. Although these specific biases may have a more important impact on within-country comparisons, it seems likely that [iii] the bivariate well-being-GDP relationship may also reflect the influence of third factors, such as democracy, the quality of national laws or government, health, or even favorable weather conditions, and many of these factors raise both GDP per capita and well-being (Kenny, 1999). [iv] Other factors, such as increased savings, reduced leisure, or even increasingly materialist values may raise GDP per capita at the expense of subjective well-being. At this stage we cannot address these shortcomings in any detail, although, given our reassessment of the stylized facts, we would suggest an urgent need for research identifying these causal parameters.”
To the extent to which (i), (ii) or (iv) are true, the case for the 100x multiplier becomes stronger. To the extent to which (iii) is true, the case for the 100x multiplier becomes weaker. We don’t know, at the moment, which of these are the most important factors. But, given that the wide variety of different strands of evidence listed in the previous section all point in the same direction, I think that estimating a 100x multiplier as a causal matter is reasonable. (Final point: noting again that all these estimates do not factor in the long-run benefits of donations, which would increase the ratio of benefits to others to benefits to yourself even further in the direction of benefits to others.)
On the Stevenson and Wolfers data, is the relationship between income and happiness weaker for poor countries than for rich countries?
If it were the case that money does less to buy happiness (for any given income level) in poor countries than in rich countries, then that would be one counterargument to mine.
However, it doesn’t seem to me that this is true of the Stevenson and Wolfers data. In particular, it’s highly cherry-picked to compare Nigeria and the USA as you do, because Nigeria is a clear outlier in terms of how flat the slope is. I’m only eyeballing the graph, but it seems to me that, of the poorest countries represented (PHL, BGD, EGY, CHN, IND, PAK, NGA, ZAF, IDN), only NGA and ZAF have flatter slopes than USA (and even for ZAF, that’s only true for incomes less than $6000 or so); all the rest have slopes that are similar to or steeper than that of USA (IND, PAK, BGD, CHN, EGY, IDN all seem steeper than USA to me). Given that Nigeria is such an outlier, I’m inclined not to give it too much weight. The average trend across countries, rich and poor, is pretty clear.
Regarding your point about cost-effectiveness estimates. Your other objections to my article follow a similar pattern and do not address the substantive points that I raise (I invite the reader to check for themselves).
2. Cost-effectiveness estimates
Given the previous debate that had occurred between us on how to think and talk about cost-effectiveness estimates, and the mistakes I had made in this regard, I wanted to be sure that I was presenting these estimates in a way that those at GiveWell would be happy with. So I asked an employee of GiveWell to look over the relevant parts of the manuscript of DGB before it was published; in the end five employees did so, and they were happy with how I presented GiveWell’s views and research.
How can that fact be reconciled with the quotes you give in your blog post? It’s because, in your discussion, you conflate two quite different issues: (i) how to represent the cost-effectiveness estimates provided by DCP2, or by single studies; (ii) how to represent the (in my view much more rigorous) cost-effectiveness estimates provided by GiveWell. Almost all the quotes from Holden that you give are about (i). But the quotes you criticise me for are about (ii). So, for example, when I say ‘these estimates’ are order of magnitude estimates, that’s referring to (i), not to (ii).
My reasoning regarding cost-effectiveness estimates on that page is as follows (I invite the reader to check it):
1. Quote from DGB that shows that you refer to GiveWell’s AMF cost-effectiveness estimates as “most rigorous” (that does not show much by itself, aside from the fact that it is very strange to write “most rigorous” when GiveWell’s page specifically refers to the “significant uncertainty”)
2. Quote from GW that says:
As a general note on the limitations to this kind of cost-effectiveness analysis, we believe that cost-effectiveness estimates such as these should not be taken literally, due to the significant uncertainty around them.
3. Three quotes from DGB, which demonstrate that you interpret the GW AMF cost-effectiveness estimate literally. In the first two you write about “five hundred times” the benefit on the basis of these estimates. In the third quote you simply cite the one hundred dollars per QALY number, which does not show much by itself, and which I should not have included. Nonetheless, in the first two quotes I show that you interpret GW AMF cost-effectiveness estimates literally.
4. On the basis of these quotes I conclude that you misquote GiveWell. Then I ask a question: can I be sure that GW and I mean the same thing by “the literal interpretation” of a cost-effectiveness estimate?
5. I provide quotes from Holden that demonstrate that we mean the same thing by it. In one of the quotes, Holden writes that your 100 times argument (based there on the DCP2 deworming estimate) seems to mean that you interpret cost-effectiveness estimates literally.
These 5 steps constitute my argument for your misinterpretation of GW AMF cost-effectiveness estimates.
You do not address this argument in your comment.
technical edit: conflation of deworming and AMF estimates
You write:
How can that fact be reconciled with the quotes you give in your blog post? It’s because, in your discussion, you conflate two quite different issues: (i) how to represent the cost-effectiveness estimates provided by DCP2, or by single studies; (ii) how to represent the (in my view much more rigorous) cost-effectiveness estimates provided by GiveWell. Almost all the quotes from Holden that you give are about (i). But the quotes you criticise me for are about (ii). So, for example, when I say ‘these estimates’ are order of magnitude estimates, that’s referring to (i), not to (ii).
If the reader takes their time and looks at the Web Archive link I provided, they will see that I do not conflate these estimates. However, it is true that I did conflate them previously: in a confidential draft of the post, which I sent to one of CEA’s employees asking them to look at it prior to publication, and which I requested not be shared with anyone besides that specific employee, I jumped from deworming estimates to AMF estimates (in the end that employee declined to review my draft). This was pointed out to me by one of my friends and I fixed it prior to publication.
Edit: besides that CEA employee, I also shared the draft with several of my friends (also asking them not to share it with anybody), so I cannot be sure exactly which version of the post you are replying to.
In your comment you write:
But the quotes you criticise me for are about (ii). So, for example, when I say ‘these estimates’ are order of magnitude estimates, that’s referring to (i), not to (ii).
As if I quoted you saying something about order of magnitude estimates. I did—in that confidential draft. Again, I invite the reader to check the first public version of my essay archived by Internet Archive and to check whether I provided any quotes where William talks about order of magnitude estimates.
You write:
(Also, Alexey’s post keeps changing, so if it looks like I’m responding to something that’s no longer there, that’s why.)
I did update the essay after the first publication. However, the points you’re responding to here were removed before my publication of the essay. I am not sure why you are responding to the confidential draft.
Edit2: Here is the draft I’m referring to. Please note its status as a draft and that I did not intend it to be seen by public. It contains strong language and a variety of mistakes.
If you CTRL+F “orders of magnitude” in this draft, you will find the quote William refers to.
I wonder why my reply has so many downvotes (-8 score) and no replies. This could of course indicate that my arguments are so bad that they’re not worth engaging with, but given that many members of the community find my criticism accurate and valuable, this seems unlikely.
As a datapoint, I thought that your reply was so bad that it was not worth engaging with, although I think you did find a couple of inaccuracies in DGB and I appreciate the effort you went to. I’ll briefly explain my position.
I thought MacAskill’s explanations were convincing and your counter-argument missed his points completely, to the extent that you seem to have an axe to grind with him. E.g. if GiveWell is happy with how their research was presented in DGB (as MacAskill mentioned), then I really don’t see how you, as an outsider and non-GW representative, can complain that their research is misquoted without having extremely strong evidence. You do not have extremely strong evidence. Even if you did, there’s still the matter that GW’s interpretation of their numbers is not necessarily the only reasonable one (as Jan_Kulveit points out below).
You completely ignored MacAskill’s convincing counter-arguments while simultaneously accusing him of ignoring the substance of your argument, so it seemed to me that there was little point in debating it further with you.
Thank you for your response. I apologize for the stronger language that I used in the first public version of this post. I believe that here you do not address most of the points I made either in the first public version or in the version that was up here at the moment of your comment.
I will not change the post here without explicitly noting it, now that you have replied.
I’m in the process of preparing a longer reply to you.
In particular, the version of the essay that I initially posted here did not discuss the strength of the relationship between income and happiness in rich and poor countries—I agree that this was a weak argument.
A technical comment: neither Web Archive nor archive.fo archives the comments to this post, so I archived this page manually. PDF from my site captured at 2018-11-17 16-48 GMT.
[comment I’m likely to regret writing; still seems right]
It seems a lot of people are reacting by voting, but the karma of the post is 0. It seems to me up-votes and down-votes are really not expressive enough, so I want to add a more complex reaction.
It is really very unfortunate that the post is framed around the question whether Will MacAskill is or is not honest. This is wrong, and makes any subsequent discussion difficult. (strong down-vote) (Also the conclusion (“he is not”) is not really supported by the evidence.)
It is (and was even more in the blog version) over-zealous, interpreting things uncharitably, and suggesting extreme actions. (downvote)
At the same time, it seems really important to have an open and critical discussion, and culture where people can challenge ‘canonical’ EA books and movement leaders. (upvote)
Carefully going through the sources and checking if papers are not cherry-picked and represented truthfully is commendable. (upvote)
Having really good epistemics is really important, in particular with the focus on the long term. Vigilance in this direction seems good. (upvote)
So it really seems a pity the post was not framed as a question along the lines of “do you think this is epistemically good?”
If I try to imagine something like a “steel-manned version of the post”, without questioning honesty, and without making uncharitable inferences, the reaction could have been some useful discussion.
It seems to me
“Doing Good Better” is sometimes more on the “explaining & advocacy of ideas” side than “dispassionate representation of research”.
Given the genre, I would bet the book is in the top quartile on the metric of representing research correctly.
In some of the examples, it seems adding more caveats and reporting in more detail would have been better for readers interested in precision. Likely at the cost of making the text more dry.
Some emotions sometimes creep in: in the case of the somewhat uncharitable part about Charity Navigator, I remembered their much more uncharitable / misrepresenting text attacking effective altruism and GiveWell. Also, while they talk about the importance of other things, what they actually measure is wrong, and is criticized correctly. In the case of the whole topic… well, a lot of evidence points toward things like the 100x multiplier being true, meaning that yes, actually it is possible to save many more people. It seems hard not to have some passion.
Given that several books about the long-term future are now being written, the update I would take from that is that books mostly about the long term should err more on the side of caveating, describing disagreements, and explaining uncertainty, but my feeling is that a shift in this direction already happened between 2014 and 2018.
I agree with all the points you make here, including on the suggested upvote/downvote distribution, and on the nature of DGB. FWIW, my (current, defeasible) plan for any future trade books I write is that they’d be more highbrow (and more caveated, and therefore drier) than DGB.
I think that’s the right approach for me, at the moment. But presumably at some point the best thing to do (for some people) will be wider advocacy (wider than DGB), which will inevitably involve simplification of ideas. So we’ll have to figure out what epistemic standards are appropriate in that context (given that GiveWell-level detail is off the table).
Some preliminary thoughts on heuristics for this (these are suggestions only):
Standards we’d want to keep as high as ever:
Is the broad brush strokes picture of what is being conveyed accurate? Is there any easy way the broad brush of what is conveyed could have been made more accurate?
Are the sentences being used to support this broad brush strokes picture warranted by the evidence?
Is this the way of communicating the core message about as caveated and detailed as one can reasonably manage?
Standards we’d need to relax:
Does this communicate as much detail as possible with respect to the relevant claims?
Does this communicate all the strongest possible counterarguments to the key claim?
Thanks. I think the criteria you propose for which standards to keep and which to relax are reasonable.
It seems an important question. I would like someone to try studying it more formally, using for example “value of information” or “rational inattention” frameworks. I can imagine experiments like giving people a longer list of arguments, trying to gather feedback on what the value was for them, and then making decisions based on that. (Right now this seems to be done mainly based on the author’s intuitions.)
In some of the examples, it seems adding more caveats and reporting in more detail would have been better for readers interested in precision.
I should point out that in the post I show not just a lack of caveats and details. William misrepresents the evidence. Among other things, he:
cherry picks the variables from a deworming paper he cites
interprets GW’s AMF estimate in a way they specifically asked it not to be interpreted (the “five hundred times” more effective claim — Holden wrote specifically about such arguments that they seem to require taking cost-effectiveness estimates literally)
quotes two sentences from Charity Navigator’s site when the very next sentence shows that the interpretation of the previous sentences is wrong
In a long response William posted here, he did not address any of these points:
he doesn’t mention cherry picking (and neither does his errata page)
he doesn’t mention the fact that GiveWell asked not to interpret their AMF estimate literally
and he writes “I represent CN fairly, and make a fair criticism of its approach to assessing charities.”, which may be true about CN’s general position, but which has nothing to do with misquoting Charity Navigator.
If the issue were just a lack of detail, of course I would not have written the post in such a tone. Initially, I considered simply emailing him a list of mistakes that I found, but as I mentioned in the post, the volume and egregiousness of the misrepresentations led me to conclude that he argued in bad faith.
edit: I will email GiveWell to clarify what they think about William making claims about 500 times more benefit on the basis of their AMF estimate.
I think I understand how you gradually became upset, but it seems that in the process you started to miss the more favorable interpretations.
For example, with the “interpretation of the GiveWell estimates”: based on reading a bunch of old discussions on archive, my _impression_ is there was, at least at some point in time, a genuine disagreement about how to interpret the numbers between Will, Tobi, Holden and possibly others (there was much less disagreement about the numeric values). So if this is the case, it is plausible Will was using his interpretation of the numbers, which was in some sense “bolder” than the GW interpretation. My sense of good epistemic standards is that you certainly can do this, but should add a caveat warning that the authors of the numbers have a different interpretation of them (so it is a missing caveat). At the same time I can imagine how you could fail to do this without any bad faith—for example, if you are at some point in the discussion confused about whether some object-level disagreement continues or not (especially if you ask the other party in the disagreement to check the text). Also, if my impression is correct and the core of the object-level disagreement was a quite technical question regarding the proper use of Bayesian statistics and EV calculations, it does not seem obvious how to report the disagreement to the general public.
In general: switching to the assumption that someone is deliberately misleading is a highly slippery slope: with this sort of assumption you can kind of explain everything, often easily, and if you can’t e.g. speak to people in person it may be quite difficult to find anything which would make you update in the opposite direction.
About cost-effectiveness estimates: I don’t think your interpretation is plausible. The GiveWell page that gives the $3400 estimate, specifically asks not to interpret it literally.
About me deciding that MacAskill is deliberately misleading. Please see my comment in /r/slatestarcodex in response to /u/scottalexander about it. Would love to know what you think.
[because of time constraints, I will focus on just one example now]
Yes, but GiveWell is not some sort of ultimate authority on how their numbers should be interpreted. Take an ad absurdum example: the NRA publishes some numbers about guns and gun-related violence, along with their interpretation that there are not enough guns in the US and that gun violence is low. If you basically agree with the numbers, but disagree with their interpretation, surely you can use the numbers and interpret them in a different way.
GiveWell’s reasoning is explained in this article. Technically speaking you _can_ use the numbers directly as EV estimates if you have a very broad prior, and the prior is the same across all the actions you are comparing (I sketch the maths at the end of this comment). (You can argue this is technically not the right thing to do, or you can argue that GiveWell advises people not to do it.) As I stated in my original comment, I’d appreciate it if such disagreements were reported. At the same time it seems difficult to do this properly in a popular text. I can imagine something like this
According to the most rigorous estimates by GiveWell, the cost to save a life in the developing world is about $3,400 (or $100 for one QALY [Quality-adjusted life year]). However, this depends on a literal interpretation of the numbers, which GiveWell does not recommend. But if you start with a very broad prior distribution over action impacts, uniform across actions, even if you use the correct Bayesian statistics, the expected value of the cost will be the number we use (we can see that from the estimate being unbiased). About $3,400 is a small enough amount that most of us in affluent countries could donate that amount every year while maintaining about the same quality of life. …
being more precise, but you can probably see it is a very different book now. I’d be quite interested in how you would write the paragraph if you wanted to use the number, wanted to give a numerical estimate of the cost per life saved, and did not want to explain Bayesian estimation to the reader.
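To spell out the ‘broad prior’ point in the simplest model I can think of (a sketch only, with my own symbols, assuming a normal prior over an action’s true cost-effectiveness and a normally distributed estimate):

$$
\theta \sim N(\mu_0, \sigma_0^2), \qquad X \mid \theta \sim N(\theta, \sigma^2)
\quad\Longrightarrow\quad
\mathbb{E}[\theta \mid X] = \frac{\sigma_0^2 X + \sigma^2 \mu_0}{\sigma_0^2 + \sigma^2}.
$$

The posterior mean shrinks the published estimate X toward the prior mean, which is essentially GiveWell’s reason for not taking the numbers literally; but as the prior becomes very broad (sigma_0 going to infinity) and identical across actions, the posterior mean approaches X itself, which is the sense in which you can use the numbers directly.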
Guzey, would you consider rewriting this post, framing it not as questioning MacAskill’s honesty but rather just as pointing out some flaws in the representation of research? I fully buy some of your criticisms (it was an epistemic failure to not report that deworming has no effect on test scores, to misrepresent Charity Navigator’s views, and to misrepresent the “ethical employer” poll). And I think Jan’s views accurately reflect the community’s views: we want to be able to have open discussion and criticism, even of the EA “canon.” But it’s absolutely correct that the personal attacks on MacAskill’s integrity make it near impossible to have this open discussion.
Even if you’re still convinced that MacAskill is dishonest, wouldn’t the best way to prove it to the community be to have a thorough, open debate over these factual questions? Then, if it becomes clear that your criticisms are correct, people will be able to judge the honesty issue themselves. I think you’re limiting your own potential here by making people not want to engage with your ideas.
I’d be happy to engage with the individual criticisms here and have some back and forth, if only this was written in a less ad hominem way.
Separately, does anyone have thoughts on the John Bunker DALY estimate? MacAskill claims that a developed world doctor only creates 7 DALYs, Bunker’s paper doesn’t seem to say anything like this, and this 80,000 Hours blog estimates instead that a developed world doctor creates 600 QALYs. Was MacAskill wrong on the effectiveness of becoming a doctor?
I do wonder if I should’ve written this post in a less personal tone. I will consider writing a follow up to it.
About me deciding that MacAskill is deliberately misleading, please see my comment in /r/slatestarcodex in response to /u/scottalexander about it. Would love to know what you think.
I’ll headline this by saying that I completely believe you’re doing this in good faith, I agree with several of your criticisms, and I think this deserves to be openly discussed. But I also strongly disagree with your conclusion about MacAskill’s honesty, and, even if I thought it was plausible, it would still be an unnecessary breach of etiquette that makes open conversation near impossible. I really think you should stop making this an argument about MacAskill’s personal honesty. Have the debate on the facts, leave ad hominem aside so everyone can fully engage, and if you’re proven right on the facts, then raise your honesty concerns.
First I’d like to address your individual points, then your claims about MacAskill.
Misreporting the deworming study. I think this is your best point. It seems entirely correct that if textbooks fail because they don’t improve test scores, then deworming should fail by the same metric. But I agree with /u/ScottAlexander that, in popular writing, you often don’t have the space to specifically go through all the literature on why deworming is better. MacAskill’s deworming claims were misleading on one level, in that the specific argument he provided is not a good one, but also fair on another level: MacAskill/GiveWell has looked tons into deworming, concluded that it’s better than textbooks, and this is the easiest way to illustrate why in a single sentence. Nobody reading this is looking for a survey of the evidence base on deworming; they’re reading it as an introduction to thinking critically about interventions. Bottom line: MacAskill probably should’ve found a better example/line of defense that was literally true, but even this literally false claim serves its purpose in making a broader, true point.
Interpreting GiveWell literally. Jan’s comment was perfect: GiveWell is not the supreme authority on how to interpret their numbers. Holden prefers to give extra weight to expected values with low uncertainty, MacAskill doesn’t, and that’s a legitimate disagreement. In any case, if you think people shouldn’t ever interpret GiveWell’s estimates literally when pitching EA, that’s not a problem with MacAskill, it’s a problem with >90% of the EA community. Bottom line: I think you should drop this argument, I just don’t think it’s correct.
Misrepresenting Charity Navigator. As MacAskill admits, it’s inaccurate to conflate overhead costs and CEO pay. Good find, the specific criticism was correct. But after thinking it through, I think MacAskill’s argument, while botching that single detail, is still a fair criticism of an accurate overall characterization of Charity Navigator. Let’s focus on the donut example. MacAskill says that if a donut charity had a low-paid CEO, CN would rate them highly. You correctly identify that CN cares about things other than CEO pay, and is willing to give good ratings to charities with highly paid CEOs if they do well on other metrics, namely financial stability, accountability, and transparency. But MacAskill’s point, I believe, would be that none of those other CN metrics have to do with the effectiveness of the intervention or the cause area. CN will let financial stability and low employee costs outweigh a highly-paid CEO, but they won’t let a terrible cause bring down your rating. So if you had a highly efficient, financially well-managed donut charity, CN really would give them a good rating. Bottom line: MacAskill mistakenly conflates CEO pay with overhead costs. But that’s incredibly minor, and no reader is going to be annoyed by it. His fundamental point is correct: CN doesn’t care about cause area or intervention effectiveness, and that’s silly to the point of absurdity.
Further, even if you still think MacAskill unfairly represented CN’s position, I’m willing to cut him a bit of slack on it. Do check out their hit piece on effective altruism. It’s aggressive, demeaning, and rude. Yes, it would’ve been better if MacAskill took the perfect high road, but if the inaccuracy really is minor, I think we can excuse it.
Exaggerating PlayPump’s failures. At first, I bought what you said in your comment. Everyone can read what you have to say themselves, but basically, it seems like MacAskill may have exaggerated the reports he cites discussing the failures of the PlayPump. But after a quick Google, it seems like this is another example of a specific line of argumentation that really isn’t rigorous, but that tries to make a fair point in a single sentence. PlayPump was a disaster, everyone agrees, and MacAskill was absolutely not the first to say so. So although MacAskill could’ve better explained specifically why it was a failure, without exaggerating reports, his conclusion is completely fair. I absolutely agree with the importance of honesty, and that bad arguments for a good conclusion are not justified. But this is popular writing, and he really doesn’t have space to fully review all the ins and outs of PlayPumps. Bottom line: I wish MacAskill more accurately justified his view, but nobody who looks into this should feel misled about the overall point of the failure of PlayPumps.
Conclusion: I think you correctly identify several inaccuracies in DGB. But after looking into them myself, I think you really overestimated the importance of these inaccuracies. Except perhaps the deworming example, none of these inaccuracies, if corrected, would change anything important about the conclusions of the book.
Even if you think I’m underestimating the level of inaccuracy, it seems near impossible that this is a sign of malice. If you go into a Barnes and Noble and pick out the popular nonfiction sitting to the left and right of DGB, I think you’d find dozens of inaccuracies far more important than these. Popular writing needs to oversimplify complex debates. DGB does an admirable job of preserving truth while simplifying.
I’ll reiterate that I really do believe in your good faith. You found inaccuracies, and you began worrying about MacAskill’s honesty, which drove you to find more inaccuracies. I think if you step back and consider the charitable interpretation of these flaws, though, you’ll realize that there are good reasons why they’re minor, and that it’s highly unlikely that this is the result of malice.
But finally, regardless of your conclusions on MacAskill’s honesty, I’ll say again that it’s absolutely destructive to open discourse and everyone’s goals to headline your post calling MacAskill a liar. If you want the community to engage this conversation, you have to stick to the substantive disagreements. If consensus concludes that MacAskill importantly and repeatedly fails, people will question his honesty on their own. But I think if the open debate is had, you’ll eventually come around to thinking that these inaccuracies are minor, inconsequential, and accidental.
2. GiveWell. This seems like a good argument. I will think about it.
3. CN. If you read my post, rather than just William’s response to it, you will see that I never accuse him of conflating CEO pay and overhead. He deflects my argument by writing about this. This is indeed a minor point.
I specifically accuse him of misquoting CN. As I wrote in other comments here, yes, this might indeed be CN’s position and in the end they would judge the doughnuts charity highly. I do not contend this point and never did. I only wrote that MacAskill (1) quotes CN, (2) makes conclusions based on this quote about CN, (3) the very page that MacAskill takes the quote from says that their position does not lead to these conclusions. And maybe CN is being completely hypocritical! That is not the point. It is still dishonest to misquote them.
4. PlayPumps: I feel like you’re kind of missing the point, and I’m wondering if it might be some sort of fundamental disagreement about unstated assumptions. I think that making dishonest arguments that lead to the right conclusions is still dishonest. It seems that you (and many other EAs) feel that if the conclusion is correct, then the fact that the argument was dishonest is not so important (same as with CN). Here’s what you say:
But this is popular writing, and he really doesn’t have space to fully review all the ins and outs of PlayPumps.
And here’s what I wrote in that comment specifically about this argument:
All of what you say seems reasonable. If Doing Good Better was just a popular book—I would not care about all of this stuff. But this book serves as an introduction to Effective Altruism and the whole premise of the book is that it’s objective and uses evidence to arrive at conclusions, etc, and advocates an evidence-based approach to philanthropy. And, although I don’t consider myself EA, a lot of my friends do, and I care about the movement. …
So we cannot judge the book as we would any other popular book where the author has a narrative and peppers it with random studies they found. I’m not so bothered by the misrepresentations per se but by the hypocrisy. …
just in the Introduction, William first trashes PlayPumps (not saying a single good word about them and very liberally exaggerating his sources) and then praises deworming almost as a salvation. And again, this is entirely natural for a popular book—but not for a book that introduces Effective Altruism and evidence-based approach to philanthropy. …
3. MacAskill:
According to the UNICEF report, children sometimes fell off and broke limbs, and some vomited from the spinning. [emphasis mine]
UNICEF report:
Some users reported that children had fallen off and been injured with bruises and cuts, and in one case a child fractured their arm. [emphasis mine]
This is a very good example of a point I’m making—of course a popular book will exaggerate things like that. But again—not a book that advocates an even-handed, evidence-based approach to philanthropy.
And in your conclusion you write:
Except perhaps the deworming example, none of these inaccuracies, if corrected, would change anything important about the conclusions of the book.
Yes! I mostly agree with this! But (1) these are not just inaccuracies. I point out misrepresentations. (2) I believe that making dishonest arguments that advance the right conclusions is dishonest.
Do I understand you correctly that you disagree with me on point (2)?
First, on honesty. As I said above, I completely agree with you on honesty: “bad arguments for a good conclusion are not justified.” This is one of my (and I’d say the EA community’s) strongest values. Arguments are not soldiers; their only value is in their own truth. SSC’s In Favor of Niceness, Community, and Civilization sums up my views very well. I’m glad we’re after the same goal.
That said, in popular writing, it’s impossible to reflect the true complexity of what’s being described. So the goal is to simplify as much as possible, while losing as little truth as possible. If someone simplifies in a way that’s importantly misleading, that’s an important failure and should be condemned. But the more I dig into each of these arguments, the more I’m convinced MacAskill is doing a very good job maintaining truth while simplifying.
Charity Navigator. MacAskill says “One popular way of evaluating a charity is to look at financial information regarding how the charity spends its money.” He says that CN takes this approach, and then quotes CN saying that many of the best charities spend 25% or less on overhead. You say this is a misquote, because CN later says that high overhead can be OK if balanced by other indicators of financial health. CN says they like to see charities “that are able to grow their revenue at least at the rate of inflation, that continue to invest in their programs and that have some money saved for a rainy day.”
I see absolutely no misrepresentation here. MacAskill says CN evaluates based on financials such as overhead, and quotes CN saying that. He never says that CN only looks at overhead, neglecting other financials. In fact, his quote of CN says that overhead is a “strong indicator” for “most” charities, which nobody would interpret as claiming that CN literally only evaluates overhead. The fact that CN does in fact care about financials other than overhead is abundantly clear when reading MacAskill’s summary. MacAskill perfectly represents their view. I doubt someone from CN would ever take issue with that first paragraph.
Playpumps. Charge by charge: 1. After checking out both the UN and SKAT reports, I agree with MacAskill: they’re “damning”. 2. MacAskill says “But in order to pump water, PlayPumps need constant force, and children playing on them would quickly get exhausted.” You quote UNICEF saying “Some primary school children complained of becoming tired very quickly after pushing the pump, particularly as additional torque is required with each rotation to commence the upstroke of the piston.” Look at a video of one in motion: it’s clear that it spins easily for a little while but also constantly requires new force. No misrepresentation. 3. “Children sometimes fell off and broke limbs” is an exaggeration. One child fractured their arm, not multiple. MacAskill misrepresented the number of injuries. 4. The reporter said that PlayPump requires 27 hours of pumping a day in order to meet its ambition of supplying 15 liters a day to 10 million people using 4000 PlayPumps. Assuming one PlayPump per village, that means a village of 2500 would require 27 hours a day of PlayPump use to meet their water needs (see the arithmetic sketch below). The only editorializing MacAskill does is call a village of 2500 “typical”. No misrepresentation. 5. MacAskill says that PlayPumps often replaced old pumps. You correctly point out that in most countries, that did not happen. Bottom line: You’re right that (i) MacAskill exaggerates the number of children who broke bones; it was one reported case, not multiple; and (ii) MacAskill incorrectly implies that PlayPumps often replaced old pumps, when in fact they rarely did.
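For what it’s worth, here is the arithmetic behind the reporter’s 27-hour figure in point 4 (a sketch; the pump capacity of roughly 1,400 litres per hour is the figure commonly cited in coverage of PlayPumps, and is my assumption rather than something stated above):

$$
\frac{10{,}000{,}000 \text{ people}}{4{,}000 \text{ pumps}} = 2{,}500 \text{ people per pump},
\qquad
\frac{2{,}500 \times 15 \text{ litres/day}}{1{,}400 \text{ litres/hour}} \approx 27 \text{ hours of pumping per day}.
$$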
Again, thank you for continuing to engage in this in a fair and receptive way. But after spending a lot of time looking into this, I’m less convinced than I ever was of your argument. You have four good points: (i) MacAskill should’ve used other deworming evidence; (ii) MacAskill exaggerated the number of children who broke bones on PlayPumps; (iii) MacAskill incorrectly implies that PlayPumps often replaced old pumps, when in fact they rarely did; (iv) MacAskill incorrectly reported the question asked by a survey on ethical companies. You might have a good point with the John Bunker DALY estimates, but I haven’t looked into it enough.
Framed in the right way, these four points would be helpful, useful feedback for MacAskill. Four slips in 200 pages seems impressively good, but MacAskill surely would have promptly updated his Errata page, and that would be that. Nothing significant whatsoever about the book would’ve changed. But because they were framed as “William MacAskill is a liar”, nobody else has been willing to engage your points, lest they legitimize clearly unfair criticism. Yes, he didn’t make the best response to your points, but to be frank, they were quite unorganized and hard to follow—it’s taken me upwards of 5 hours in sum to get to the bottom of your claims.
At this point, I really don’t think you can justifiably continue to hold either of your positions: that DGB is significantly inaccurate, or that MacAskill is dishonest. I really do believe that you’re in this in good faith, and that your main error (save the ad hominem attack, likely a judgement error) was in not getting to the bottom of these questions. But now the questions feel very well resolved. Unless the four issues listed above constitute systemic inaccuracy, I really don’t see an argument for it.
Sincerely, thank you for engaging, and if you find these arguments correct, I hope you’ll uphold our value of honesty and apologize to MacAskill for the ad hominem attacks, as well as give him a kinder, more accurate explanation of his inaccuracies. I hope I’ve helped.
Thank you a ton for the time and effort you put into this. I find myself disagreeing with you, but this may reflect my investment in my arguments. I will write to you later, once I reflect on this further.
PlayPumps: I don’t agree with your assessment of points 1, 2, 4.
At this point, I really don’t think you can justifiably continue to hold either of your positions: that DGB is significantly inaccurate, or that MacAskill is dishonest. I really do believe that you’re in this in good faith, and that your main error (save the ad hominem attack, likely a judgement error) was in not getting to the bottom of these questions. But now the questions feel very well resolved. Unless the four issues listed above constitute systemic inaccuracy, I really don’t see an argument for it.
Sincerely, thank you for engaging, and if you find these arguments correct, I hope you’ll uphold our value of honesty and apologize to MacAskill for the ad hominem attacks, as well as give him a kinder, more accurate explanation of his inaccuracies. I hope I’ve helped.
I have already apologized to MacAskill for the first, even harsher, version of the post. I will certainly apologize to him, if I conclude that the arguments he made were not made in bad faith, but at this point I find that my central point stands.
As I wrote in another comment, thank you for your time and I will let you know later about my conclusions. I will likely rewrite the post after this.
There, I point out that MacAskill responds not to any of the published versions of the essay but to a confidential draft (since he says that I’m quoting him on something that I only quoted him about in a draft).
What do you think about it? Is my interpretation here plausible? What are the other plausible explanations for this? Maybe I fail to see charitable interpretations of how that happened.
I’m not sure how EA Forum displays drafts. It seems very plausible that, on this sometimes confusing platform, you’re mistaken as to which draft was available where and when. If you’re implying that the CEA employee sent MacAskill the draft, then yes, they should not have done that, but MacAskill played no part in that. Further, it seems basic courtesy to let someone respond to your arguments before you publicly call them a liar—you should’ve allowed MacAskill a chance to respond without immediate time pressure.
I’m sorry, this was my fault. You sent me a draft and asked me not to share it, and a few days later in rereading the email and deciding what to do with it, I wasn’t careful and failed to read the part where you asked me not to share it. I shared it with Will at that point, and I apologize for my carelessness.
Well, it happens. Although if you forwarded it to Will, then he probably read the part of the email where I asked not to share it with anybody, but proceeded to read and respond to the confidential draft anyway.
I’ve defended MacAskill extensively here, but why are people downvoting to hide this legitimate criticism? MacAskill acknowledged that he did this and apologized.
If there’s a reason please say so, I might be missing something. But downvoting a comment until it disappears without explaining why seems harsh. Thanks!
I didn’t downvote the comment, but it did seem a little harsh to me. I can easily imagine being forwarded a draft article, and reading the text the person forwarding wrote, then looking at the draft, without reading the text in the email they were originally sent. (Hence missing text saying the draft was supposed to be confidential.) Assuming that Will read the part saying it was confidential seemed uncharitable to me (though it turns out to be correct). That seemed in surprising contrast to the understanding attitude taken to Julia’s mistake.
I should note that now we know that William did in fact know that the draft was confidential. Quoting a comment of his above:
In hindsight, once I’d seen that you didn’t want the post shared I should have simply ignored it, and ensured you knew that it had been accidentally shared with me.
I second Julia in her apology. In hindsight, once I’d seen that you didn’t want the post shared I should have simply ignored it, and ensured you knew that it had been accidentally shared with me.
When it was shared with me, the damage had already been done, so I thought it made sense to start prepping a response. I didn’t think your post would change significantly, and at the time I thought it would be good for me to start going through your critique to see if there were indeed grave mistakes in DGB, and offer a speedy response for a more fruitful discussion. I’m sorry that I therefore misrepresented you. As you know, the draft you sent to Julia was quite a bit more hostile than the published version; I can only say that as a result of this I felt under attack, and that clouded my judgment.
As you know, the draft you sent to Julia was quite a bit more hostile than the published version
And the first draft that I sent to my friends was much more hostile than that. Every draft gets toned down and corrected a lot. This is precisely why I ask everybody not to share them.
As I write this nomination, Holden Karnofsky has recently written about “Minimal Trust Investigations” (124 upvotes), which are similar to Epistemic Spot Checks. This post is an example of such a minimal trust investigation.
The reasons I am nominating this post are:
It seems to me that Guzey was right on several object-level points
The EA community failed both Guzey and itself in a variety of ways, but chiefly by not rewarding good criticism that bites.
That said, as other commenters point out, the post could perhaps use a re-write. Perhaps this decade review would be a good time.
This post is an example of such a minimal trust investigation.
It’s an example of a minimal trust investigation done badly.
In order to get smarter from minimal trust investigations, you need to (a) have a sense of proportion about how to update based on your conclusions, and (b) engage with counterarguments productively. Guzey has demonstrated a wild lack of (a), and while I respect his willingness to engage at length with these counterarguments on this Forum and elsewhere, the apparent lack of any updating (and his continued fixation on the leak screwup years later) speaks pretty badly.
To be clear, I do think this post provided some value, and that versions of this post quite similar to the one that actually exists would have provided much more value. But Guzey’s actual behaviour here is not something we should emulate in the community, beyond the very basic idea of epistemic spot checks on EA books (which I support).
The EA community failed both Guzey and itself in a variety of ways
CEA’s screwup with the essay draft was pretty bad (I’ve said before I think it was sufficiently bad that it should be on their mistakes page). But I was actually quite proud of the way the rest of the community (at least on the Forum) responded to this. Lots of people responded thoughtfully and at length, did the requisite background reading, and acknowledged the basic validity of some of his points. The fact that people didn’t agree with his wildly-out-of-proportion conclusions doesn’t mean they failed him.
Ahhh, but it is not clear to me that this is that disproportionate. In particular, I think this is a problem of EA people having more positive priors about MacAskill. Guzey then starts with more neutral priors, and then correctly updates downwards with his review, and then even more downwards when a promise of confidentiality was breached.
With regards to the contents of the book, I think the size of the downward updates exhibited in the essay dramatically exceeds the actual badness of what was found. Identifying errors is only the first step in an exercise like this – you then have to accurately update based on what those errors tell you. I think e.g. David Roodman’s discussion of this here is a much better example of the kind of work we want to see more of on the Forum.
With regards to the confidentiality screw-up, sure, it’s rational to update downwards in some general sense, but given that the actual consequences were so minor and that the alternative hypothesis (that it was just a mistake) is so plausible, I don’t respect Guzey’s presentation of this incident in his more recent writings (e.g. here).
Do you believe that the following representation of the incident is unfair?
Yes, at present I do.
I haven’t yet seen evidence to support the strong claims you are making about Julia Wise’s knowledge and intentions at various stages in this process. If your depiction of events is true (i.e. Wise both knowingly concealed the leak from you after realising what had happened, and explicitly lied about it somewhere) that seems very bad, but I haven’t seen evidence for that. Her own explanation of what happened seems quite plausible to me.
(Conversely, we do have evidence that MacAskill read your draft, and realised it was confidential, but didn’t tell you he’d seen it. That does seem bad to me, but much less bad than the leak itself – and Will has apologised for it pretty thoroughly.)
Your initial response to Julia’s apology seemed quite reasonable, so I was surprised to see you revert so strongly in your LessWrong comment a few months back. What new evidence did you get that hardened your views here so much?
And that since “the actual consequences were so minor and that the alternative hypothesis (that it was just a mistake) is so plausible” this doesn’t really matter?
It matters – it was a serious error and breach of Wise’s duty of confidentiality, and she has acknowledged it as such (it is now listed on CEA’s mistakes page). But I do think it is important to point out that, other than having your expectation of confidentiality breached per se, nothing bad happened to you.
One reason I think this is important is because it makes the strong “conspiracy” interpretation of these events much less plausible. You present these events as though the intent of these actions was to in some way undermine or discredit your criticisms (you’ve used the word “sabotage”) in order to protect MacAskill’s reputation. But nobody did this, and it’s not clear to me what they plausibly could have done – so what’s the motive?
What sharing the draft with MacAskill did enable was a prepared response – but that’s normal in EA and generally considered good practice when posting public criticism. Said norm is likely a big part of the reason this screw-up happened.
I’m commenting because you are really good in every sense, and your comment is upvoted; together, these are a sign that I may be wrong. I want to understand more.
Also, the consequent discussion would, as you suggest, give attention to Guzey’s ideas (although, as my comment indicates, I don’t find much content in them).
This is technically true but seems to be a nitpick.
What is going on is that MacAskill is probably pointing out that the focus on expenses, rather than on theory of change or effectiveness, is a massive hurdle that contributes to the culture of scarcity and “winning two games”. This undermines the effectiveness of charities.
I guess the truth is that Charity Navigator has little ability or interest in examining the uses of overhead or in understanding charities’ theories of change.
It seems that Guzey objects to MacAskill using 0.1% as an exaggeration, pointing out that Charity Navigator is “ok” with 25%. This isn’t that substantive (and I’m skeptical this is the complete truth of Charity Navigator’s position).
2. MacAskill cites different sets of evidence to support deworming compared to other interventions, and deworming fails some metrics:
The truth is that leading scientists often point to different sets of evidence when comparing to different interventions (and there can be bitter disputes between two respected scientists). What probably happened is that MacAskill believed this was a good intervention and cited the current evidence at the time.
To be clear, it’s sometimes the truth and correct to cite different sets of evidence when comparing different interventions.
Even if this was wrong, and this occurred more than once, it’s not clear this is a defect in science or epistemics.
There’s a lot of talk around GiveWell and I’m not an expert, but sometimes I hear some people say GiveWell is conservative. Maybe it’s because of the intense politics that it has to deal with while carrying the flag of rigor. If this is true, maybe focusing on episodes like this is counterproductive.
Also, there seem to be some misfires in criticism and other things that are ungenerous or wrong in the content of Guzey’s essay. It’s impractical to list them all.
Leaking
The vast majority of Guzey’s comment is focused on this episode where his essay was “leaked”. It seems this leak really happened. Also, it seems pretty plausible it was an accident, even reading the quotes that Guzey cites to suggest something nefarious is going on.
Guzey suggests this was really bad and hints at retaliation, insinuating many times how critical or harmful this was to him (e.g. “What happened around the publication of the essay, however, completely dwarfs the issues in the book itself”). But he doesn’t describe any issues besides the leaking of the document itself (which seems like it would have reached MacAskill soon enough anyway).
Even rounding down all of the other things that are negative signals to me, this fixation on this episode after these years seems like a strong sign to me, and most people I know, of the low value of the ideas from this person.
Another way of looking at this is that sometimes, there can be a large supply of people with critical ideas and many of these turn out to be wrong, and vexatious, really. There would be no work done if we didn’t use these heuristics before engaging with their ideas.
For me, I think a crux is that I suspect Guzey faced no retaliation, and that really undermines his apparent fixation. My opinion would be changed if he faced actual repercussions because of the leak or because of ideas in his book.
Even rounding down all of the other things that are negative signals to me, this fixation on this episode after these years seems like a strong sign to me, and most people I know, of the low value of the ideas from this person.
This part sounds deeply wrong to me, but that’s probably because I’ve read more of Guzey’s work besides this one piece.
I occasionally encounter people who would have been happy to burn their youth and spend their efforts on projects that would be good bets from an EA perspective. When they don’t get to for one reason or another (maybe they were being too status-seeking too soon, maybe they would have been a great chief of staff or a co-founder together with someone who was epistemically stronger, maybe they didn’t have the right support networks, etc.), it strikes me as regrettable.
I think that some of Guzey’s later work is valuable, and in particular his New Science organization looks like a good bet to take. I think he got some funding from the Survival and Flourishing Fund, which is EA adjacent.
Below is my reply to this comment and your other one.
I’m not sure it is valuable or wise for me to write all this, but it seems better to communicate.
I was sincere when I said I didn’t understand and wanted to learn why you rated Guzey’s criticism highly. I think I have learned a lot more now.
_________________________
You said “This part sounds deeply wrong to me, but that’s probably because I’ve read more of Guzey’s work besides this one piece”:
Note that I made a writing error in the relevant paragraph. It is possible this changed the meaning of my comment and this rightfully offended you. When I said:
Even rounding down all of the other things that are negative signals to me, this fixation on this episode after these years seems like a strong sign to me, and most people I know, of the low value of the ideas from this person.
I meant:
Even rounding down all of the other things that are negative signals to me, this fixation on this episode after these years would be a strong signal to me, and would be a strong signal to most people I know, of the low value of the ideas from this person.
The first version could imply there is some “establishment” (yuck) and that these people share my negative opinion. This is incorrect; I had no prior knowledge of others’ opinions about Guzey and knew nothing about him.
_________________________
You said here:
I occasionally encounter people who would have been happy to burn their youth and spend their efforts on projects that would be good bets from an EA perspective. When they don’t get to for one reason or another (maybe they were being too status-seeking too soon, maybe they would have been a great chief of staff or a co-founder together with someone who was epistemically stronger, maybe they didn’t have the right support networks, etc.), it strikes me as regrettable.
I think that some of Guzey’s later work is valuable, and in particular his New Science organization looks like a good bet to take. I think he got some funding from the Survival and Flourishing Fund, which is EA adjacent.
This seems wise and thoughtful. I didn’t know about this.
_________________________
You made another comment:
I find it surprising that your comment only provides one-sided considerations. As an intuition pump, consider reading this unrelated review, also by Guzey, and checking if you think it is also low quality.
I skimmed this, in the spirit of what you suggested. The truth is that I find reviews like this often on the internet, and I use reviews of this quality for a lot of beliefs.
If I didn’t know anything about Guzey, I would use his review to update in favor of his ideas. But at the same time, I find many of Guzey’s choices in content and style different from those of people who successfully advance scientific arguments.
_________________________
I think that, combined with virtue and good judgement, being loyal to someone is good.
In non-EA contexts, I have tried to back up friends who need support in hostile situations.
In these situations, the truth is that I can become strategic. When being strategic, I try to minimize or even rewrite the content where they were wrong. This can lead to compromise and closure, but this needs coordination and maturity.
I hadn’t realised that your comment on LessWrong was your first public comment on the incident for 3 years. That is an update for me.
But also, I do find it quite strange to say nothing about the incident for years, then come back with a very long and personal (and to me, bitter-seeming) comment, deep in the middle of a lengthy and mostly-unrelated conversation about a completely different organisation.
Commenting on this post after it got nominated for review is, I agree, completely reasonable and expected. That said, your review isn’t exactly very reflective – it reads more as just another chance to rehash the same grievance in great detail. I’d expect a review of a post that generated so much in-depth discussion and argument to mention and incorporate some of that discussion and argument; yours gives the impression that the post was simply ignored, a lone voice in the wilderness. If 72 comments represents deafening silence, I don’t know what noise would look like.
I find it surprising that your comment only provides one-sided considerations. As an intuition pump, consider reading this unrelated review, also by Guzey, and checking if you think it is also low quality.
The low quality of Guzey’s arguments around Doing Good Better (and his unwillingness to update in the face of strong counterarguments) substantially reduced my credence in his (similarly strong) claims about Why We Sleep, and I was confused about why so many people I know put so much credence in the latter after the former.
I think Guzey is very honest in these discussions (and subsequently), and is trying to engage with pushback from the community, which is laudable.
But I don’t think he’s actually changed his views to nearly the degree I would expect a well-meaning rational actor to do so, and I don’t think his views about MacAskill being a bad actor are remotely in proportion to the evidence he’s presented.
For example, relating to your first link, he still makes a big deal of the “interpretation of GiveWell cost-effectiveness estimates” angle, even though everyone (even GiveWell!) thinks he’s off base here.
On the second link, he has removed most PlayPump material from the current version of his essay, which suggests he has genuinely updated there. So that’s good. That said, if I found out I was as wrong about something as he originally was about PlayPumps, I hope I’d be much more willing to believe that other people might make honest errors of similar magnitude without condemning them as bad actors.
I don’t think unsuccessful applications at organizations that are distantly related to the content you’re criticizing constitute a conflict of interest.
If everybody listed their unsuccessful applications at the start of every EA Forum post, it would take up a lot of reader attention.
I heavily criticize one of the founders of CEA and heavily use the words of the founder of Open Phil in my post, which led me to believe that I need to disclose that I applied to both organizations.
[EDIT: this was not a very careful comment and multiple claims were stated more strongly than I believed them, as well that my beliefs might have been not so well-supported]
I admire the amount of effort that has gone into this post and its level of rigor. I think it’s very important for an epistemically healthy movement that high-status people can be criticised successfully.
I think your premises do not fully support the conclusion that MacAskill is completely untrustworthy. However, I agree that the book misrepresents sources structurally, and this is a convincing sign it is written in bad faith.
I hope that MacAskill has already realized the book was not up to the standards he now promotes. Writing an introduction to effective altruism was and remains a very difficult task, and at the time there was still a mindset of “push EA even if it’s at the cost of some epistemic honesty”. I think the community has been moving away from this mindset since, and this post is a good addition to that.
We need a better introductory book. (Also because it’s outdated.)
I think the piece is very over the top. Even if all the points were correct, it wouldn’t support the damning conclusion. Some of the points seem fair, some are wrong, and some are extremely uncharitable. If you are going to level accusations of dishonesty and deliberate misrepresentation, then you need to have very strong arguments. This post falls very far short of that.
To be clear: Did you downvote Siebe’s post because you disagree, the main post because you disagree, Siebe’s post because you think it’s unhelpful, or the main post because you think it’s unhelpful?
When I saw this comment, it was at “0” points. I’m surprised, because it seems like it is helpful and written in good faith. If someone down-voted it, could you explain why?
I don’t take “[DGB] misrepresents sources structurally, and this is a convincing sign it is written in bad faith” to be either:
True. The OP strikes me as tendentiously uncharitable and ‘out for blood’ (given the earlier versions were calling for Will to be disavowed by EA per Gleb Tsipursky, trust in Will down to 0, etc.), and the very worst that should be inferred, even if we grant all the matters under dispute in its favour—which we shouldn’t—would be something like “sloppy, and perhaps with a subconscious finger on the scale tilting the errors to be favourable to the thesis of the book” rather than deceit, malice, or other ‘bad faith’.
Helpful. False accusations of bad faith are obviously toxic. But even true ones should be made with care. I was one of the co-authors on the Intentional Insights document, and in that case (with much stronger evidence suggestive of ‘structural misrepresentation’ or ‘writing things in bad faith’) we refrained as far as practicable from making these adverse inferences. We were criticised for this at the time (perhaps rightly), but I think this is the better direction to err in.
Kind. Self explanatory.
I’m sure Siebe makes their comment in good faith, and I agree some parts of the comment are worthwhile (e.g. I agree it is important that folks in EA can be criticised). But not overall.
I agree with this take on the comment as it’s literally written. I think there’s a chance that Siebe meant ‘written in bad faith’ as something more like ‘written with less attention to detail than it could have been’, which seems like a very reasonable conclusion to come to.
(I just wanted to add a possibly more charitable interpretation, since otherwise the description of why the comment is unhelpful might seem a little harsh)
That seems like taking charitableness too far. I’m alright with finding different interpretations based on the words written, but ultimately, Siebe wrote what they wrote, and it cannot be interpreted as you suggest. It’s quite a big accusation, so caution is required when making it.
Okay, points taken. I should have been much more careful given the strength of the accusation, and the accusation that DGB was written “in bad faith” seems (far) too strong.
I guess I have a tendency to support efforts that challenge common beliefs that might not be held for the right reasons (in this case “DGB is a rigorously written book, and a good introduction to effective altruism”). This seemed to outweigh the costs of criticism, likely because my intuition often underestimates the costs of criticism. However, the OP challenged a much stronger common belief (“Will MacAskill is not an untrustworthy person”) and I should have better distinguished those (both in my mind and in writing).
When I was writing it, I was very doubtful about whether I was phrasing it correctly, and I don’t think I succeeded. I think my intention for “written in bad faith” was meant less strongly, but a bit more than ‘written with less attention to detail than it could have been’: i.e. that less attention was given to details that wouldn’t pan out in favour of EA. More along the lines of this:
“sloppy, and perhaps with a subconscious finger on the scale tilting the errors to be favourable to the thesis of the book” rather than deceit, malice, or other ‘bad faith’.
I also have a lower credence in this now. I should add that my use of “convincing” was also too strong a term, as it might be interpreted as >95% credence, instead of the >60% credence I observed at the time of writing.
the essay posted to the Effective Altruism Forum never contained the bit about disavowing Will. I did write this in the version that I posted on my site, and I removed it after much feedback elsewhere, and wrote:
I updated this post significantly, based on feedback from the community. Several of my points were wrong and my tone and conclusions were sometimes inappropriate. I believe that my central point stands, but I apologize to William MacAskill for the first versions of the essay. For previous versions please see Web Archive.
As I wrote in a comment above responding to Will, prior to the publication of my essay I reached out to one of the employees of the CEA and asked them to review my draft. They first agreed, but after I sent the draft, they declined to review it.
As it happens, I found numerous cases of truly egregious cherry-picking, demonstrably false statements, and (no, I’m not kidding) out-of-context mined quotes in just a few pages of Pinker’s “Enlightenment Now.” Take a look for yourself. The terrible scholarship is shocking. https://docs.wixstatic.com/ugd/d9aaad_8b76c6c86f314d0288161ae8a47a9821.pdf
Would this be a good top-level post for the forum? I imagine lots of EAs have read Enlightenment Now or are planning to read it. It seems relevant to highlight the flaws that an influential book might have relating to its treatment of existential risks.
Hi Alexey,
I appreciate that you’ve taken the time to consider what I’ve said in the book at such length. However, I do think that there’s quite a lot that’s wrong in your post, and I’ll describe some of that below. Though I think you have noticed a couple of mistakes in the book, I think that most of the alleged errors are not errors.
I’ll just focus on what I take to be the main issues you highlight, and I won’t address the ‘dishonesty’ allegations, as I anticipate it wouldn’t be productive to do so; I’ll leave that charge for others to assess.
tl;dr:
Of the main issues you refer to, I think you’ve identified two mistakes in the book: I left out a caveat in my summary of the Baird et al (2016) paper, and I conflated overheads costs and CEO pay in a way that, on the latter aspect, was unfair to Charity Navigator.
In neither case are these errors egregious in the way you suggest. I think that: (i) claiming that the Baird et al (2016) should cause us to believe that there is ‘no effect’ on wages is a misrepresentation of that paper; (ii) my core argument against Charity Navigator, regarding their focus on ‘financial efficiency’ metrics like overhead costs, is both successful and accurately depicts Charity Navigator.
I don’t think that the rest of the alleged major errors are errors. In particular: (i) GiveWell were able to review the manuscript before publication and were happy with how I presented their research; the quotes you give generally conflate how to think about GiveWell’s estimates with how to think about DCP2’s estimates; (ii) There are many lines of evidence supporting the 100x multiplier, and I don’t rely at all on the DCP2 estimates, as you imply.
(Also, caveating up front: for reasons of time limitations, I’m going to have to precommit to this being my last comment on this thread.)
(Also, Alexey’s post keeps changing, so if it looks like I’m responding to something that’s no longer there, that’s why.)
1. Deworming
Since the book came out, there has been much more debate about the efficacy of deworming. As I’ve continued to learn about the state and quality of the empirical evidence around deworming, I’ve become less happy with my presentation of the evidence around deworming in Doing Good Better; this fact has been reflected on the errata page on my website for the last two years. On your particular points, however:
Deworming vs textbooks
If textbooks have a positive effect, it’s via how much children learn in school, rather than an incentive for them to spend more time in school. So the fact that there doesn’t seem to be good evidence for textbooks increasing test scores is pretty bad.
If deworming has a positive effect, it could be via a number of mechanisms, including increased school attendance or via learning more in school, or direct health impacts, etc. If there are big gains on any of these dimensions, then deworming looks promising. I agree that more days in school certainly aren’t good in themselves, however, so the better evidence is about the long-run effects.
Deworming’s long-run effects
Here’s how GiveWell describes the study on which I base my discussion of the long-run effects of deworming:
“10-year follow-up: Baird et al. 2016 compared the first two groups of schools to receive deworming (as treatment group) to the final group (as control); the treatment group was assigned 2.41 extra years of deworming on average. The study’s headline effect is that as adults, those in the treatment group worked and earned substantially more, with increased earnings driven largely by a shift into the manufacturing sector.” Then, later: “We have done a variety of analyses to assess the robustness of the core findings from Baird et al. 2016, including reanalyzing the data and code underlying the study, and the results have held up to our scrutiny.”
You are correct that my description of the findings of the Baird et al paper was not fully accurate. When I wrote, “Moreover, when Kremer’s colleagues followed up with the children ten years later, those who had been dewormed were working an extra 3.4 hours per week and earning an extra 20 percent of income compared to those who had not been dewormed,” I should have included the caveat “among non-students with wage employment.” I’m sorry about that, and I’m updating my errata page to reflect this.
As for how much we should update on the basis of the Baird et al paper — that’s a really big discussion, and I’m not going to be able to add anything above what GiveWell have already written (here, here and here). I’ll just note that:
(i) Your gloss on the paper seems misleading to me. If you include people with zero earnings, of course it’s going to be harder to get a statistically significant effect. And the data from those who do have an income but who aren’t in wage employment are noisier, so it’s harder to get a statistically significant effect there too. In particular, see here from the 2015 version of the paper: “The data on [non-agricultural] self-employment profits are likely measured with somewhat more noise. Monthly profits are 22% larger in the treatment group, but the difference is not significant (Table 4, Panel C), in part due to large standard errors created by a few male outliers reporting extremely high profits. In a version of the profit data that trims the top 5% of observations, the difference is 28% (P < 0.10).” (A small synthetic numerical sketch of this outlier point follows after point (ii) below.)
(ii) GiveWell finds the Baird et al paper to be an important part of the evidence behind their support of deworming. If you disagree with that, then you’re engaged in a substantive disagreement with GiveWell’s views; it seems wrong to me to class that as a simple misrepresentation.
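To make the outlier point in (i) concrete, here is a minimal synthetic sketch (my own illustration with made-up numbers, not the Baird et al data): a real ~20% difference in typical values can be statistically invisible when a handful of extreme observations inflate the standard errors, and can reappear once the top 5% of observations are trimmed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic "monthly profits": the treated group's typical values are ~20% higher,
# but both groups contain a small number of extremely large outliers.
control = np.concatenate([rng.lognormal(3.0, 0.6, 950), rng.lognormal(7.5, 0.5, 50)])
treated = np.concatenate([rng.lognormal(3.0, 0.6, 950) * 1.2, rng.lognormal(7.5, 0.5, 50)])

def compare(c, t, label):
    _, p = stats.ttest_ind(t, c, equal_var=False)
    print(f"{label}: control mean {c.mean():.1f}, treated mean {t.mean():.1f}, p = {p:.3f}")

compare(control, treated, "untrimmed")  # outliers dominate the variance; difference not significant

# Trim the top 5% of each group, analogous to the robustness check quoted above.
compare(control[control < np.quantile(control, 0.95)],
        treated[treated < np.quantile(treated, 0.95)],
        "top 5% trimmed")               # the underlying ~20% gap is now detectable
```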
2. Cost-effectiveness estimates
Given the previous debate that had occurred between us on how to think and talk about cost-effectiveness estimates, and the mistakes I had made in this regard, I wanted to be sure that I was presenting these estimates in a way that those at GiveWell would be happy with. So I asked an employee of GiveWell to look over the relevant parts of the manuscript of DGB before it was published; in the end five employees did so, and they were happy with how I presented GiveWell’s views and research.
How can that fact be reconciled with the quotes you give in your blog post? It’s because, in your discussion, you conflate two quite different issues: (i) how to represent the cost-effectiveness estimates provided by DCP2, or by single studies; (ii) how to represent the (in my view much more rigorous) cost-effectiveness estimates provided by GiveWell. Almost all the quotes from Holden that you give are about (i). But the quotes you criticise me for are about (ii). So, for example, when I say ‘these estimates’ are order-of-magnitude estimates, that’s referring to (i), not to (ii).
There’s a really big difference between (i) and (ii). I acknowledge that back in 2010 I was badly wrong about the reliability of DCP2 and individual studies, and that GWWC was far too slow to update its web pages after the unreliability of these estimates came to light. But the level of time, care and rigour that has gone into the GiveWell estimates is much greater than for the DCP2 estimates. It’s still the case that there’s a huge amount of uncertainty surrounding the GiveWell estimates, but describing them as “the most rigorous estimates” we have seems reasonable to me.
More broadly: Do I really think that you do as much good or more in expectation from donating $3500 to AMF as saving a child’s life? Yes. GiveWell’s estimate of the direct benefits might be optimistic or pessimistic (though it has stayed relatively stable over many years now — the median GiveWell estimate for ‘cost for outcome as good as averting the death of an individual under 5’ is currently $1932), but I really don’t have a view on which is more likely. And, what’s more important, the biggest consideration that’s missing from GiveWell’s analysis is the long-run effects of saving a life. While of course it’s a thorny issue, I personally find it plausible that the long-run expected benefits from a donation to AMF are considerably larger than the short-run benefits — you speed up economic progress just a little bit, in expectation making those in the future just a little bit better off than they would have otherwise been. Because the future is so vast in expectation, that effect is very large. (There’s *plenty* more to discuss on this issue of long-run effects — Might those effects be negative? How should you discount future consumption? etc — but that would take us too far afield.)
3. Charity Navigator
Let’s distinguish: (i) the use of overhead ratio as a metric in assessing charities; (ii) the use of CEO pay as a metric in assessing charities. The ideas of evaluating charities on the basis of overheads and of CEO pay are often run together in public discussion, and are both wrong for similar reasons, so I bundled them together in my discussion.
Regarding (ii): CN-of-2014 did talk a lot about CEO pay: they featured CEO pay, in both absolute terms and as a proportion of expenditure, prominently on their charity evaluation pages (see, e.g. their page on Books for Africa), they had top-ten lists like, “10 highly-rated charities with low paid CEOs”, and “10 highly paid CEOs at low-rated charities” (and no lists of “10 highly-rated charities with high paid CEOs” or “10 low-rated charities with low paid CEOs”). However, it is true that CEO pay was not a part of CN’s rating system. And, rereading the relevant passages of DGB, I can see how the reader would have come away with the wrong impression on that score. So I’m sorry about that. (Perhaps I was subconsciously still ornery from their spectacularly hostile hit piece on EA that came out while I was writing DGB, and was therefore less careful than I should have been.) I’ve updated my errata page to make that clear.
Regarding (i): CN’s two key metrics for charities are (a) financial health and (b) accountability and transparency. (a) is in very significant part about the charities’ overheads ratios (in several different forms), where they give a charity a higher score the lower its overheads are, breaking the scores into five broad buckets: see here for more detail. The doughnuts for police officers example shows that a really bad charity could score extremely highly on CN’s metrics, which shows that CN’s metrics must be wrong. Similarly for Books for Africa, which gets a near-perfect score from CN, and features in its ‘ten top-notch charities’ list, in significant part because of its very low overheads, despite having no good evidence to support its program.
I represent CN fairly, and make a fair criticism of its approach to assessing charities. In the extended quote you give, they caveat that very low overheads are not make-or-break for a charity. But, on their charity rating methodology, all other things being equal they give a charity a higher score the lower the charity’s overheads. If that scoring method is a bad one, which it is, then my criticism is justified.
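As a toy illustration of why a low overhead ratio and a low cost per outcome can come apart (both charities and all numbers below are invented for the example, not CN’s actual ratings data):

```python
# Two hypothetical charities with the same budget. "B" would look better on an
# overhead-based rating, but "A" buys far more of the outcome we care about per dollar.
charities = {
    "A": {"budget": 1_000_000, "overhead": 0.30, "outcomes_delivered": 140_000},
    "B": {"budget": 1_000_000, "overhead": 0.05, "outcomes_delivered": 9_000},
}

for name, c in charities.items():
    cost_per_outcome = c["budget"] / c["outcomes_delivered"]
    print(f"{name}: overhead {c['overhead']:.0%}, cost per outcome ${cost_per_outcome:.2f}")

# A: overhead 30%, cost per outcome $7.14
# B: overhead 5%,  cost per outcome $111.11
```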
4. Life satisfaction and income and the hundredfold multiplier
The hundredfold multiplier
You make two objections to my 100x multiplier claim: that the DCP2 deworming estimate was off by 100x, and that the Stevenson and Wolfers paper does not support it.
But there are very many lines of evidence in favour of the 100x multiplier, which I reference in Doing Good Better. I mention that there are many independent justifications for thinking that there is a logarithmic (or even more concave) relationship between income and happiness on p.25, and in the endnotes on p.261-2 (all references are to the British paperback edition—yellow cover). In addition to the Stevenson and Wolfers lifetime satisfaction approach (which I discuss later), here are some reasons for thinking that the hundredfold multiplier obtains:
The experiential sampling method of assessing happiness. I mention this in the endnote on p.262, pointing out that, on this method, my argument would be stronger, because on this method the relationship between income and wellbeing is more concave than logarithmic, and is in fact bounded above.
Imputed utility functions from the market behaviour of private individuals and the actions of government. It’s absolutely mainstream economic thought that utility varies with log of income (that is, eta=1 in an isoelastic utility function) or something more concave (eta>1). I reference a paper that takes this approach on p.261, Groom and Maddison (2013). They estimate eta to be 1.5. (A small numerical sketch of what this implies follows after this list.)
Estimates of cost to save a life. I discuss this in ch.2; I note that this is another strand of supporting evidence prior to my discussion of Stevenson and Wolfers on p.25: “It’s a basic rule of economics that money is less valuable to you the more you have of it. We should therefore expect $1 to provide a larger benefit for an extremely poor Indian farmer than it would for you or me. But how much larger? Economists have sought to answer this question through a variety of methods. We’ll look at some of these in the next chapter, but for now I’ll just discuss one [the Stevenson and Wolfers approach].” Again, you find 100x or more discrepancy in the cost to save a life in rich or poor countries.
Estimate of cost to provide one QALY. As with the previous bullet point.
Note, crucially, that the developing world estimates for cost to provide one QALY or cost to save a life come from GiveWell, not — as you imply — from DCP2 or any individual study.
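Here is the numerical sketch referred to above (the income figures are round illustrative numbers of my own, not figures from the book): with an isoelastic utility function, the marginal value of a dollar at an income 100x lower is 100x higher under log utility (eta = 1), and higher still under the more concave eta = 1.5 that Groom and Maddison estimate.

```python
# Isoelastic (CRRA) marginal utility: u'(c) = c ** (-eta); eta = 1 corresponds to log utility.
def marginal_utility(consumption, eta):
    return consumption ** (-eta)

rich_income = 30_000   # illustrative annual consumption in a rich country
poor_income = 300      # illustrative annual consumption roughly 100x lower

for eta in (1.0, 1.5):
    ratio = marginal_utility(poor_income, eta) / marginal_utility(rich_income, eta)
    print(f"eta = {eta}: a marginal dollar is worth about {ratio:,.0f}x more at the lower income")

# eta = 1.0: 100x   (log utility: the ratio equals the income ratio)
# eta = 1.5: 1,000x (more concave utility implies an even larger multiplier)
```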
Is there a causal relationship from income to wellbeing?
It’s true that Stevenson and Wolfers only show a correlation between income and wellbeing. But that there is a causal relationship, from income to wellbeing, is beyond doubt. It’s perfectly obvious that, over the scales we’re talking about, higher income enables you to have more wellbeing (you can buy analgesics, healthcare, shelter, eat more and better food, etc).
It’s true that we don’t know exactly the strength of the causal relationship. Understanding this could make my argument stronger or weaker. To illustrate, here’s a quote from another Stevenson and Wolfers paper, with the numerals in square brackets added in by me:
“Although our analysis provides a useful measurement of the bivariate relationship between income and well-being both within and between countries, there are good reasons to doubt that this corresponds to the causal effect of income on well-being. It seems plausible (perhaps even likely) that [i] the within-country well-being-income gradient may be biased upward by reverse causation, as happiness may well be a productive trait in some occupations, raising income. A different perspective, offered by Kahneman, et al. (2006), suggests that [ii] within-country comparisons overstate the true relationship between subjective well-being and income because of a “focusing illusion”: the very nature of asking about life satisfaction leads people to assess their life relative to others, and they thus focus on where they fall relative to others in regard to concrete measures such as income. Although these specific biases may have a more important impact on within-country comparisons, it seems likely that [iii] the bivariate well-being-GDP relationship may also reflect the influence of third factors, such as democracy, the quality of national laws or government, health, or even favorable weather conditions, and many of these factors raise both GDP per capita and well-being (Kenny, 1999). [iv] Other factors, such as increased savings, reduced leisure, or even increasingly materialist values may raise GDP per capita at the expense of subjective well-being. At this stage we cannot address these shortcomings in any detail, although, given our reassessment of the stylized facts, we would suggest an urgent need for research identifying these causal parameters.”
To the extent to which (i), (ii) or (iv) are true, the case for the 100x multiplier becomes stronger. To the extent to which (iii) is true, the case for the 100x multiplier becomes weaker. We don’t know, at the moment, which of these are the most important factors. But, given that the wide variety of different strands of evidence listed in the previous section all point in the same direction, I think that estimating a 100x multiplier as a causal matter is reasonable. (Final point: noting again that all these estimates do not factor in the long-run benefits of donations, which would shift the ratio of benefits to others versus benefits to yourself even further in the direction of benefits to others.)
On the Stevenson and Wolfers data, is the relationship between income and happiness weaker for poor countries than for rich countries?
If it were the case that money does less to buy happiness (for any given income level) in poor countries than in rich countries, then that would be one counterargument to mine.
However, it doesn’t seem to me that this is true of the Stevenson and Wolfers data. In particular, it’s highly cherry-picked to compare Nigeria and the USA as you do, because Nigeria is a clear outlier in terms of how flat the slope is. I’m only eyeballing the graph, but it seems to me that, of the poorest countries represented (PHL, BGD, EGY, CHN, IND, PAK, NGA, ZAF, IDN), only NGA and ZAF have flatter slopes than USA (and even for ZAF, that’s only true for incomes less than $6000 or so); all the rest have slopes that are similar to or steeper than that of USA (IND, PAK, BGD, CHN, EGY, IDN all seem steeper than USA to me). Given that Nigeria is such an outlier, I’m inclined not to give it too much weight. The average trend across countries, rich and poor, is pretty clear.
Regarding your point about cost-effectiveness estimates: your other objections to my article follow a similar pattern and do not address the substantive points that I raise (I invite the reader to check for themselves).
You do not address my concern here. Here’s the first Web Archive version of the post:
My reasoning regarding cost-effectiveness estimates on that page is as follows (I invite the reader to check it):
1. A quote from DGB that shows that you refer to GiveWell’s AMF cost-effectiveness estimates as “most rigorous” (this does not show much by itself, aside from the fact that it is very strange to write “most rigorous” when GiveWell’s page specifically refers to the “significant uncertainty”)
2. Quote from GW that says:
3. Three quotes from DGB, which demonstrate that you interpret the GW AMF cost-effectiveness estimate literally. In the first two you write about “five hundred times” the benefit on the basis of these estimates. In the third quote you simply cite the one hundred dollars per QALY number, which does not show much by itself, and which I should not have included. Nonetheless, in the first two quotes I show that you interpret GW AMF cost-effectiveness estimates literally.
4. On the basis of these quotes I conclude that you misquote GiveWell. Then I ask a question: can I be sure that GW and I mean the same thing by “the literal interpretation” of a cost-effectiveness estimate?
5. I provide quotes from Holden that demonstrate that we mean the same thing by it. In one of the quotes, Holden writes that your 100 times argument (based there on DCP2 deworming estimate) seems to mean that you interpret cost-effectiveness estimates literally.
These 5 steps constitute my argument for your misinterpretation of GW AMF cost-effectiveness estimates.
You do not address this argument in your comment.
technical edit: conflation of deworming and AMF estimates
You write:
If the reader takes their time and looks at the Web Archive link I provided, they will see that I do not conflate these estimates. However, it is true that I did conflate them previously: in a confidential draft of the post, which I sent to one of CEA’s employees asking them to look at it prior to publication and requesting that it not be shared with anyone besides that specific employee, I did jump from deworming estimates to AMF estimates (in the end, that employee declined to review my draft). This fact was pointed out to me by one of my friends and I fixed it prior to publication.
Edit: besides that CEA employee, I also shared the draft with several of my friends (also asking them not to share it with anybody), so I cannot be sure exactly which version of the post you are replying to.
In your comment you write:
As if I quoted you saying something about order of magnitude estimates. I did—in that confidential draft. Again, I invite the reader to check the first public version of my essay archived by Internet Archive and to check whether I provided any quotes where William talks about order of magnitude estimates.
You write:
I did update the essay after the first publication. However, the points you’re responding to here were removed before my publication of the essay. I am not sure why you are responding to the confidential draft.
Edit2: Here is the draft I’m referring to. Please note its status as a draft and that I did not intend it to be seen by the public. It contains strong language and a variety of mistakes.
If you CTRL+F “orders of magnitude” in this draft, you will find the quote William refers to.
I wonder why my reply has so many downvotes (-8 score) and no replies. This could of course indicate that my arguments are so bad that they’re not worth engaging with, but given that many members of the community find my criticism accurate and valuable, this seems unlikely.
As a datapoint, I thought that your reply was so bad that it was not worth engaging with, although I do think you found a couple of inaccuracies in DGB and I appreciate the effort you went to. I’ll briefly explain my position.
I thought MacAskill’s explanations were convincing and your counter-argument missed his points completely, to the extent that you seem to have an axe to grind with him. E.g. if GiveWell is happy with how their research was presented in DGB (as MacAskill mentioned), then I really don’t see how you, as an outsider and non-GW representative, can complain that their research is misquoted without having extremely strong evidence. You do not have extremely strong evidence. Even if you did, there’s still the matter that GW’s interpretation of their numbers is not necessarily the only reasonable one (as Jan_Kulveit points out below).
You completely ignored MacAskill’s convincing counter-arguments while simultaneously accusing him of ignoring the substance of your argument, so it seemed to me that there was little point in debating it further with you.
I guess this is a valid point of view. Just in case, I emailed GiveWell about this issue.
see edit above
see edit above
Hi William,
Thank you for your response. I apologize for the stronger language that I used in the first public version of this post. I believe that here you do not address most of the points I made either in the first public version or in the version that was up here at the moment of your comment.
I will not change the post here without explicitly noting it, now that you have replied.
I’m in the process of preparing a longer reply to you.
In particular, the version of the essay that I initially posted here did not discuss the strength of the relationship between income and happiness in rich and poor countries—I agree that this was a weak argument.
A technical comment: neither Web Archive nor archive.fo archives the comments to this post, so I archived this page manually. PDF from my site captured at 2018-11-17 16-48 GMT
edit: a reddit user suggested this archive of this page: http://archive.fo/jUkMB
[comment I’m likely to regret writing; still seems right]
It seems a lot of people are reacting by voting, but the karma of the post is 0. It seems to me that up-votes and down-votes are really not expressive enough, so I want to add a more complex reaction.
It is really very unfortunate that the post is framed around the question whether Will MacAskill is or is not honest. This is wrong, and makes any subsequent discussion difficult. (strong down-vote) (Also the conclusion (“he is not”) is not really supported by the evidence.)
It is (and was even more so in the blog version) over-zealous, interpreting things uncharitably, and suggesting extreme actions. (downvote)
At the same time, it seems really important to have an open and critical discussion, and culture where people can challenge ‘canonical’ EA books and movement leaders. (upvote)
Carefully going through the sources and checking if papers are not cherry-picked and represented truthfully is commendable. (upvote)
Having really good epistemics is really important, in particular with the focus on long-term. Vigilance in this direction seems good. (upvote)
So it really seems a pity the post was not framed as a question somewhere in the direction of “do you think this is epistemically good?”
If I try to imagine something like a “steel-manned version of the post”, without questioning honesty and without making uncharitable inferences, the reaction could have been some useful discussion.
It seems to me
“Doing Good Better” is sometimes more on the “explaining & advocacy of ideas” side than “dispassionate representation of research”.
Given the genre, I would bet the book is in the top quartile on the metric of representing research correctly.
In some of the examples, it seems adding more caveats and reporting in more detail would have been better for readers interested in precision. Likely at the cost of making the text more dry.
Some emotions sometimes creep in: in the case of the somewhat uncharitable part about Charity Navigator, I remembered their much more uncharitable / misrepresenting text attacking effective altruism and GiveWell. Also, while they talk about the importance of other things, what they actually measure is wrong, and is criticized correctly. In the case of the whole topic… well, a lot of evidence points toward things like the 100x multiplier being true, meaning that yes, it actually is possible to save many more people. It seems hard not to have some passion.
Given that several books about the long-term future are now being written, the update I would take from this is that books mostly about the long term should err more on the side of caveating, describing disagreements, and explaining uncertainty, but my feeling is that a shift in this direction already happened between 2014 and 2018.
I agree with all the points you make here, including on the suggested upvote/downvote distribution, and on the nature of DGB. FWIW, my (current, defeasible) plan for any future trade books I write is that they’d be more highbrow (and more caveated, and therefore drier) than DGB.
I think that’s the right approach for me, at the moment. But presumably at some point the best thing to do (for some people) will be wider advocacy (wider than DGB), which will inevitably involve simplification of ideas. So we’ll have to figure out what epistemic standards are appropriate in that context (given that GiveWell-level detail is off the table).
Some preliminary thoughts on heuristics for this (these are suggestions only):
Standards we’d want to keep as high as ever:
Is the broad brush strokes picture of what is being conveyed accurate? Is there any easy way the broad brush of what is conveyed could have been made more accurate?
Are the sentences being used to support this broad brush strokes picture warranted by the evidence?
Is this the way of communicating the core message about as caveated and detailed as one can reasonably manage?
Standards we’d need to relax:
Does this communicate as much detail as possible with respect to the relevant claims?
Does this communicate all the strongest possible counterarguments to the key claim?
Does this include every reasonable caveat?
I think that a blogpost that does very well with respect to the above, without compromising on the clarity of the core message, is Max Roser’s recent post: ‘The world is much better; The world is awful; The world can be much better’.
Thanks. I think the criteria you propose for which standards to keep and which to relax are reasonable.
It seems an important question. I would like someone to try to study it more formally, using for example “value of information” or “rational inattention” frameworks. I can imagine experiments like giving people a longer list of arguments, trying to gather feedback on what the value was for them, and then making decisions based on that. (Right now this seems to be done mainly based on the author’s intuitions.)
I agree Max’s post is doing a really good job!
Hi Jan,
Thanks for the feedback.
You write:
I should point out that in the post I show not just a lack of caveats and details. William misrepresents the evidence. Among other things, he:
cherry picks the variables from a deworming paper he cites
interprets GW’s AMF estimate in a way they specifically asked not to interpret them (“five hundred times” more effective thing — Holden wrote specifically about such arguments that they seem to require taking cost-effectiveness estimates literally)
quotes two sentences from Charity Navigator’s site when the very next sentence shows that the interpretation of the previous sentences is wrong
In a long response William posted here, he did not address any of these points:
he doesn’t mention cherry picking (and neither does his errata page)
he doesn’t mention the fact that GiveWell asked not to interpret their AMF estimate literally
and he writes “I represent CN fairly, and make a fair criticism of its approach to assessing charities.”, which may be true of CN’s general position, but which has nothing to do with misquoting Charity Navigator.
If the issue were just a lack of detail, of course I would not have written the post in such a tone. Initially, I considered simply emailing him a list of the mistakes that I found, but as I mentioned in the post, the volume and egregiousness of the misrepresentations led me to conclude that he argued in bad faith.
edit: I will email GiveWell to clarify what they think about William making claims about 500 times more benefit on the basis of their AMF estimate.
I think I understand how you gradually became upset, but it seems that in the process you started to miss the more favorable interpretations.
For example, with the “interpretation of the GiveWell estimates”: based on reading a bunch of old discussions on archive, my _impression_ is that there was, at least at some point in time, a genuine disagreement about how to interpret the numbers between Will, Tobi, Holden and possibly others (there was much less disagreement about the numeric values). If this is the case, it is plausible Will was using his interpretation of the numbers, which was in some sense “bolder” than the GW interpretation. My sense of good epistemic standards is that you certainly can do this, but you should add a caveat warning that the authors of the numbers have a different interpretation of them (so it is a missing caveat). At the same time, I can imagine failing to do this without any bad faith—for example, if you are at some point of the discussion confused about whether some object-level disagreement continues or not (especially if you ask the other party in the disagreement to check the text). Also, if my impression is correct and the core of the object-level disagreement was a quite technical question regarding the proper use of Bayesian statistics and EV calculations, it does not seem obvious how to report the disagreement to the general public.
In general, switching to the assumption that someone is deliberately misleading is a highly slippery slope: with this sort of assumption you can explain almost everything, often easily, and if you can’t e.g. speak to people in person, it may be quite difficult to find anything that would make you update in the opposite direction.
About cost-effectiveness estimates: I don’t think your interpretation is plausible. The GiveWell page that gives the $3400 estimate, specifically asks not to interpret it literally.
About me deciding that MacAskill is deliberately misleading. Please see my comment in /r/slatestarcodex in response to /u/scottalexander about it. Would love to know what you think.
[because of time constraints, I will focus on just one example now]
Yes, but GiveWell is not some sort of ultimate authority on how their numbers should be interpreted. Take a reductio ad absurdum example: the NRA publishes some numbers about guns and gun-related violence, along with their interpretation that there are not enough guns in the US and that gun violence is low. If you basically agree with the numbers but disagree with their interpretation, surely you can use the numbers and interpret them in a different way.
GiveWell’s reasoning is explained in this article. Technically speaking, you _can_ use the numbers directly as EV estimates if you have a very broad prior, and the prior is the same across all the actions you are comparing. (You can argue this is technically not the right thing to do, or you can argue that GiveWell advises people not to do it.) As I stated in my original comment, I’d appreciate it if such disagreements were reported. At the same time, it seems difficult to do this properly in a popular text. I can imagine something like this
being more precise, but you can probably see it is a very different book now. I’d be quite interested in how you would write the paragraph if you wanted to use the number, wanted to give a numerical estimate of the cost per life saved, and did not want to explain Bayesian estimation to the reader.
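To make the technical point concrete, here is a toy sketch (made-up numbers; this is not GiveWell’s actual model) of the kind of Bayesian adjustment Holden describes: with a very broad prior, an explicit cost-effectiveness estimate passes through almost unchanged, while an informative prior shrinks it heavily toward what is typical.

```python
# Toy normal-normal Bayesian update (illustrative only; numbers are made up).

def posterior_mean(prior_mean, prior_sd, estimate, estimate_sd):
    """Precision-weighted average of a prior and a noisy cost-effectiveness estimate."""
    w_prior = 1 / prior_sd**2
    w_est = 1 / estimate_sd**2
    return (w_prior * prior_mean + w_est * estimate) / (w_prior + w_est)

# Hypothetical: an explicit estimate of a life saved per $3400, i.e. ~29 lives per $100k, but noisy.
estimate, estimate_sd = 29.0, 15.0   # lives per $100k

# With a very broad prior, the estimate passes through almost unchanged (~28.4)...
print(posterior_mean(prior_mean=1.0, prior_sd=100.0, estimate=estimate, estimate_sd=estimate_sd))
# ...with an informative prior centred on "typical" charities, it shrinks a lot (~3.8).
print(posterior_mean(prior_mean=1.0, prior_sd=5.0, estimate=estimate, estimate_sd=estimate_sd))
```

How you would then report the $3400-style figure depends mostly on how informative you think the right prior is.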
This seems like a good argument. Thank you. I will think about it.
Guzey, would you consider rewriting this post, framing it not as questioning MacAskill’s honesty but rather just as pointing out some flaws in the representation of research? I fully buy some of your criticisms (it was an epistemic failure not to report that deworming has no effect on test scores, to misrepresent Charity Navigator’s views, and to misrepresent the “ethical employer” poll). And I think Jan’s views accurately reflect the community’s views: we want to be able to have open discussion and criticism, even of the EA “canon.” But it’s absolutely correct that the personal attacks on MacAskill’s integrity make it near impossible to have this open discussion.
Even if you’re still convinced that MacAskill is dishonest, wouldn’t the best way to prove it to the community be to have a thorough, open debate over these factual questions? Then, if it becomes clear that your criticisms are correct, people will be able to judge the honesty issue themselves. I think you’re limiting your own potential here by making people not want to engage with your ideas.
I’d be happy to engage with the individual criticisms here and have some back and forth, if only this was written in a less ad hominem way.
Separately, does anyone have thoughts on the John Bunker DALY estimate? MacAskill claims that a developed-world doctor only creates 7 DALYs, Bunker’s paper doesn’t seem to say anything like this, and this 80,000 Hours blog post instead estimates that a developed-world doctor creates 600 QALYs. Was MacAskill wrong on the effectiveness of becoming a doctor?
Hi smithee,
I do wonder if I should’ve written this post in a less personal tone. I will consider writing a follow-up to it.
About me deciding that MacAskill is deliberately misleading, please see my comment in /r/slatestarcodex in response to /u/scottalexander about it. Would love to know what you think.
I’ll headline this by saying that I completely believe you’re doing this in good faith, I agree with several of your criticisms, and I think this deserves to be openly discussed. But I also strongly disagree with your conclusion about MacAskill’s honesty, and, even if I thought it was plausible, it would still be an unnecessary breach of etiquette that makes open conversation near impossible. I really think you should stop making this an argument about MacAskill’s personal honesty. Have the debate on the facts, leave the ad hominem aside so everyone can fully engage, and if you’re proven right on the facts, then raise your honesty concerns.
First I’d like to address your individual points, then your claims about MacAskill.
Misreporting the deworming study. I think this is your best point. It seems entirely correct that if textbooks fail because they don’t improve test scores, then deworming should fail by the same metric. But I agree with /u/ScottAlexander that, in popular writing, you often don’t have the space to go through all the literature on why deworming is better. MacAskill’s deworming claims were misleading on one level, in that the specific argument he provided is not a good one, but fair on another level: MacAskill/GiveWell have looked into deworming extensively, concluded that it’s better than textbooks, and this is the easiest way to illustrate why in a single sentence. Nobody reading this is looking for a survey of the evidence base on deworming; they’re reading it as an introduction to thinking critically about interventions. Bottom line: MacAskill probably should’ve found a better example/line of defense that was literally true, but even this literally false claim serves its purpose in making a broader, true point.
Interpreting GiveWell literally. Jan’s comment was perfect: GiveWell is not the supreme authority on how to interpret their numbers. Holden prefers to give extra weight to expected values with low uncertainty, MacAskill doesn’t, and that’s a legitimate disagreement. In any case, if you think people shouldn’t ever interpret GiveWell’s estimates literally when pitching EA, that’s not a problem with MacAskill, it’s a problem with >90% of the EA community. Bottom line: I think you should drop this argument, I just don’t think it’s correct.
Misrepresenting Charity Navigator. As MacAskill admits, it’s inaccurate to conflate overhead costs and CEO pay. Good find; the specific criticism was correct. But after thinking it through, I think MacAskill’s argument, while botching that single detail, is still a fair criticism of an accurate overall characterization of Charity Navigator. Let’s focus on the donut example. MacAskill says that if a donut charity had a low-paid CEO, CN would rate it highly. You correctly identify that CN cares about things other than CEO pay, and is willing to give good ratings to charities with highly paid CEOs if they do well on other metrics, namely financial stability, accountability, and transparency. BUT, MacAskill’s point, I believe, is that none of those other CN metrics have anything to do with the effectiveness of the intervention or the cause area. CN will let financial stability and low employee costs outweigh a highly paid CEO, but they won’t let a terrible cause bring down your rating. So if you had a highly efficient, financially well-managed donut charity, CN really would give it a good rating. Bottom line: MacAskill mistakenly conflates CEO pay with overhead costs. But that’s incredibly minor, and no reader is going to be misled by it. His fundamental point is correct: CN doesn’t care about cause area or intervention effectiveness, and that’s silly to the point of absurdity.
Further, even if you still think MacAskill unfairly represented CN’s position, I’m willing to cut him a bit of slack on it. Do check out their hit piece on effective altruism. It’s aggressive, demeaning, and rude. Yes, it would’ve been better if MacAskill had taken the perfect high road, but if the inaccuracy really is minor, I think we can excuse it.
Exaggerating PlayPump’s failures. At first, I bought what you said in your comment. Everyone can read what you have to say themselves, but basically, it seems like MacAskill may have exaggerated the reports he cites discussing the failures of the PlayPump. But after a quick Google, it seems like this is another example of a specific line of argumentation that really isn’t rigorous, but that tries to make a fair point in a single sentence. PlayPump was a disaster, everyone agrees, and MacAskill was absolutely not the first to say so. So although MacAskill could’ve better explained specifically why it was a failure, without exaggerating reports, his conclusion is completely fair. I absolutely agree with the importance of honesty, and that bad arguments for a good conclusion are not justified. But this is popular writing, and he really doesn’t have space to fully review all the ins and outs of PlayPumps. Bottom line: I wish MacAskill more accurately justified his view, but nobody who looks into this should feel misled about the overall point of the failure of PlayPumps.
Conclusion: I think you correctly identify several inaccuracies in DGB. But after looking into them myself, I think you have really overestimated the importance of these inaccuracies. Except perhaps for the deworming example, none of these inaccuracies, if corrected, would change anything important about the conclusions of the book.
Even if you think I’m underestimating the level of inaccuracy, it seems near impossible that this is a sign of malice. If you go into a Barnes and Noble and pick out the popular nonfiction sitting to the left and right of DGB, I think you’d find dozens of inaccuracies far more important than these. Popular writing needs to oversimplify complex debates. DGB does an admirable job of preserving truth while simplifying.
I’ll reiterate that I really do believe in your good faith. You found inaccuracies, and you began worrying about MacAskill’s honesty, which drove you to find more inaccuracies. I think if you step back and consider the charitable interpretation of these flaws, though, you’ll realize that there are good reasons why they’re minor, and that it’s highly unlikely that this is the result of malice.
But finally, regardless of your conclusions on MacAskill’s honesty, I’ll say again that it’s absolutely destructive to open discourse and everyone’s goals to headline your post by calling MacAskill a liar. If you want the community to engage in this conversation, you have to stick to the substantive disagreements. If consensus concludes that MacAskill importantly and repeatedly fails, people will question his honesty on their own. But I think if the open debate is had, you’ll eventually come around to thinking that these inaccuracies are minor, inconsequential, and accidental.
Thank you for a thoughtful response.
1. Deworming. Seems fair.
2. GiveWell. This seems like a good argument. I will think about it.
3. CN. If you read my post and not William’s response to it, I never accuse him of conflating CEO pay and overhead. He deflects my argument by writing about this. This is indeed a minor point.
I specifically accuse him of misquoting CN. As I wrote in other comments here, yes, this might indeed be CN’s position, and in the end they would judge the doughnut charity highly. I do not contest this point and never did. I only wrote that MacAskill (1) quotes CN, (2) draws conclusions about CN based on this quote, and (3) the very page that MacAskill takes the quote from says that their position does not lead to these conclusions. And maybe CN is being completely hypocritical! But that is not the point. It is still dishonest to misquote them.
4. PlayPumps: I feel like you’re kind of missing the point, and I’m wondering if it might be some sort of fundamental disagreement about unstated assumptions. I think that making dishonest arguments that lead to the right conclusions is still dishonest. It seems that you (and many other EAs) feel that if the conclusion is correct, then the fact that the argument was dishonest is not so important (same as with CN). Here’s what you say:
And here’s what I wrote in that comment specifically about this argument:
And in your conclusion you write:
Yes! I mostly agree with this! But (1) these are not just inaccuracies. I point out misrepresentations. (2) I believe that making dishonest arguments that advance the right conclusions is dishonest.
Do I understand you correctly that you disagree with me on point (2)?
First, on honesty. As I said above, I completely agree with you on honesty: “bad arguments for a good conclusion are not justified.” This is one of my (and I’d say the EA community as a whole) strongest values. Arguments are not soldiers, their only value is in their own truth. SSC’s In Favor of Niceness, Community, and Civilization sums up my views very well. I’m glad we’re after the same goal.
That said, in popular writing, it’s impossible to reflect the true complexity of what’s being described. So the goal is to simplify as much as possible, while losing as little truth as possible. If someone simplifies in a way that’s importantly misleading, that’s an important failure and should be condemned. But the more I dig into each of these arguments, the more I’m convinced MacAskill is doing a very good job maintaining truth while simplifying.
Charity Navigator. MacAskill says “One popular way of evaluating a charity is to look at financial information regarding how the charity spends its money.” He says that CN takes this approach, and then quotes CN saying that many of the best charities spend 25% or less on overhead. You say this is a misquote, because CN later says that high overhead can be OK if balanced by other indicators of financial health. CN says they like to see charities “that are able to grow their revenue at least at the rate of inflation, that continue to invest in their programs and that have some money saved for a rainy day.”
I see absolutely no misrepresentation here. MacAskill says CN evaluates based on financials such as overhead, and quotes CN saying that. He never says that CN only looks at overhead, neglecting other financials. In fact, his quote of CN says that the overhead indicator is a “strong indicator” in “most” charities, which nobody would interpret as claiming that CN literally only evaluates overhead. The fact that CN does in fact care about financials other than overhead is abundantly clear when reading MacAskill’s summary. MacAskill perfectly represents their view. I doubt someone from CN would ever take issue with that first paragraph.
Playpumps. Charge by charge: 1. After checking out both the UN and SKAT reports, I agree with MacAskill: they’re “damning”. 2. MacAskill says “But in order to pump water, PlayPumps need constant force, and children playing on them would quickly get exhausted.” You quote UNICEF saying “Some primary school children complained of becoming tired very quickly after pushing the pump, particularly as additional torque is required with each rotation to commence the upstroke of the piston.” Look at a video of one in motion: it’s clear that it spins easily for a little while but also constantly requires new force. No misrepresentation. 3. “Children sometimes fell off and broke limbs” is an exaggeration. One child fractured their arm, not multiple. MacAskill misrepresented the number of injuries. 4. The reporter said that PlayPump would require 27 hours of pumping a day in order to meet its ambition of supplying 15 liters a day to 10 million people using 4000 PlayPumps. Assuming one PlayPump per village, that means a village of 2500 would require 27 hours a day of pumping to meet its water needs. The only editorializing MacAskill does is call a village of 2500 “typical”. No misrepresentation. 5. MacAskill says that PlayPumps often replaced old pumps. You correctly point out that in most countries, that did not happen. Bottom line: You’re right that (i) MacAskill exaggerates the number of children who broke bones; it was one reported case, not multiple; and (ii) MacAskill incorrectly implies that PlayPumps often replaced old pumps, when in fact they rarely did.
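For what it’s worth, the reporter’s arithmetic on point 4 checks out. Below is a rough sanity check; the ~1,400 litres/hour pump rate is an assumption on my part (the capacity figure commonly cited in coverage of PlayPumps), not something taken from DGB or the reports themselves.

```python
# Rough sanity check of the "27 hours of pumping a day" figure.
# Assumption: pump capacity of ~1,400 litres/hour (commonly cited; not from DGB).

people = 10_000_000          # stated ambition: reach 10 million people
litres_per_person = 15       # 15 litres per person per day
pumps = 4000                 # planned number of PlayPumps
pump_rate_lph = 1400         # litres per hour (assumed capacity)

litres_per_pump_per_day = people * litres_per_person / pumps   # 37,500 litres/day per pump
hours_needed = litres_per_pump_per_day / pump_rate_lph          # ~26.8 hours of pumping
people_per_pump = people / pumps                                # 2,500 people, the "typical village"

print(round(hours_needed, 1), int(people_per_pump))             # 26.8 2500
```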
Again, thank you for continuing to engage in this in a fair and receptive way. But after spending a lot of time looking into this, I’m less convinced than I ever was of your argument. You have four good points: (i) MacAskill should’ve used other deworming evidence; (ii) MacAskill exaggerated the number of children who broke bones on PlayPumps; (iii) MacAskill incorrectly implies that PlayPumps often replaced old pumps, when in fact they rarely did; (iv) MacAskill incorrectly reported the question asked by a survey on ethical companies. You might have a good point with the John Bunker DALY estimates, but I haven’t looked into it enough.
Framed in the right way, these four points would be helpful, useful feedback for MacAskill. Four slips in 200 pages seems impressively good, but MacAskill surely would have promptly updated his Errata page, and that would be that. Nothing significant about the book would have changed. But because they were framed as “William MacAskill is a liar”, nobody else has been willing to engage with your points, lest they legitimize clearly unfair criticism. Yes, he didn’t make the best response to your points, but to be frank, they were quite disorganized and hard to follow; it’s taken me upwards of 5 hours in sum to get to the bottom of your claims.
At this point, I really don’t think you can justifiably continue to hold either of your positions: that DGB is significantly inaccurate, or that MacAskill is dishonest. I really do believe that you’re in this in good faith, and that your main error (aside from the ad hominem attack, likely a judgement error) was in not getting to the bottom of these questions. But now the questions feel very well resolved. Unless the four issues listed above constitute systemic inaccuracy, I really don’t see an argument for it.
Sincerely, thank you for engaging, and if you find these arguments correct, I hope you’ll uphold our value of honesty and apologize to MacAskill for the ad hominem attacks, as well as give him a kinder, more accurate explanation of his inaccuracies. I hope I’ve helped.
Thank you a ton for the time and effort you put into this. I find myself disagreeing with you, but this may reflect my investment in my arguments. I will write to you later, once I reflect on this further.
CN: I don’t agree with you
PlayPumps: I don’t agree with your assessment of points 1, 2, 4.
I have already apologized to MacAskill for the first, even harsher, version of the post. I will certainly apologize to him if I conclude that the arguments he made were not made in bad faith, but at this point I find that my central point stands.
As I wrote in another comment, thank you for your time and I will let you know later about my conclusions. I will likely rewrite the post after this.
Also, I wonder what you think about the second half of this comment of mine in this thread.
There, I point out that MacAskill responds not to any of the published versions of the essay but to a confidential draft (since he says that I’m quoting him on something that I only quoted him about in a draft).
What do you think about it? Is my interpretation here plausible? What are the other plausible explanations for this? Maybe I fail to see charitable interpretations of how that happened.
I’m not sure how EA Forum displays drafts. It seems very plausible that, on this sometimes confusing platform, you’re mistaken as to which draft was available where and when. If you’re implying that the CEA employee sent MacAskill the draft, then yes, they should not have done that, but MacAskill played no part in that. Further, it seems basic courtesy to let someone respond to your arguments before you publicly call them a liar—you should’ve allowed MacAskill a chance to respond without immediate time pressure.
I never posted the draft that had this quote on EA Forum. Further, I clearly asked everyone I sent the drafts not to share them with anybody.
I’m sorry, this was my fault. You sent me a draft and asked me not to share it, and a few days later in rereading the email and deciding what to do with it, I wasn’t careful and failed to read the part where you asked me not to share it. I shared it with Will at that point, and I apologize for my carelessness.
Well, it happens. Although if you forwarded it to Will, then he probably read the part of the email where I asked not to share it with anybody, but proceeded to read the confidential draft and respond to it anyway.
I’ve defended MacAskill extensively here, but why are people downvoting to hide this legitimate criticism? MacAskill acknowledged that he did this and apologized.
If there’s a reason please say so, I might be missing something. But downvoting a comment until it disappears without explaining why seems harsh. Thanks!
I didn’t downvote the comment, but it did seem a little harsh to me. I can easily imagine being forwarded a draft article, and reading the text the person forwarding wrote, then looking at the draft, without reading the text in the email they were originally sent. (Hence missing text saying the draft was supposed to be confidential.) Assuming that Will read the part saying it was confidential seemed uncharitable to me (though it turns out to be correct). That seemed in surprising contrast to the understanding attitude taken to Julia’s mistake.
I should note that now we know that William did in fact know that the draft was confidential. Quoting a comment of his above:
That’s what I meant by ‘though it turns out to be correct’. Sorry for being unclear.
[The comment above has 3 votes, a score of −7, and 0 replies.]
I second Julia in her apology. In hindsight, once I’d seen that you didn’t want the post shared I should have simply ignored it, and ensured you knew that it had been accidentally shared with me.
When it was shared with me, the damage had already been done, so I thought it made sense to start prepping a response. I didn’t think your post would change significantly, and at the time I thought it would be good for me to start going through your critique to see if there were indeed grave mistakes in DGB, and offer a speedy response for a more fruitful discussion. I’m sorry that I therefore misrepresented you. As you know, the draft you sent to Julia was quite a bit more hostile than the published version; I can only say that as a result of this I felt under attack, and that clouded my judgment.
And the first draft that I sent to my friends was much more hostile than that. Every draft gets toned down and corrected a lot. This is precisely why I ask everybody not to share them.
Just wanted to note that now we know that MacAskill knew that the draft was confidential.
(deleted)
As I nominate this, I’ll note that Holden Karnofsky recently wrote about “Minimal Trust Investigations” (124 upvotes), which are similar to Epistemic Spot Checks. This post is an example of such a minimal trust investigation.
The reasons I am nominating this post are that:
It seems to me that Guzey was right on several object-level points
The EA community failed both Guzey and itself in a variety of ways, but chiefly by not rewarding good criticism that bites.
That said, as other commenters point out, the post could perhaps use a re-write. Perhaps this decade review would be a good time.
It’s an example of a minimal trust investigation done badly.
In order to get smarter from minimal trust investigations, you need to (a) have a sense of proportion about how to update based on your conclusions, and (b) engage with counterarguments productively. Guzey has demonstrated a wild lack of (a), and while I respect his willingness to engage at length with these counterarguments on this Forum and elsewhere, the apparent lack of any updating (and his continued fixation on the leak screwup years later) speaks pretty badly.
To be clear, I do think this post provided some value, and that versions of this post quite similar to the one that actually exists would have provided much more value. But Guzey’s actual behaviour here is not something we should emulate in the community, beyond the very basic idea of epistemic spot checks on EA books (which I support).
CEA’s screwup with the essay draft was pretty bad (I’ve said before I think it was sufficiently bad that it should be on their mistakes page). But I was actually quite proud of the way the rest of the community (at least on the Forum) responded to this. Lots of people responded thoughtfully and at length, did the requisite background reading, and acknowledged the basic validity of some of his points. The fact that people didn’t agree with his wildly-out-of-proportion conclusions doesn’t mean they failed him.
Ahhh, but it is not clear to me that this is that disproportionate. In particular, I think this is a problem of EA people having more positive priors about MacAskill. Guzey then starts with more neutral priors, and then correctly updates downwards with his review, and then even more downwards when a promise of confidentiality was breached.
Am I missing something here?
With regards to the contents of the book, I think the size of the downward updates exhibited in the essay dramatically exceeds the actual badness of what was found. Identifying errors is only the first step in an exercise like this – you then have to accurately update based on what those errors tell you. I think e.g. David Roodman’s discussion of this here is a much better example of the kind of work we want to see more of on the Forum.
With regards to the confidentiality screw-up, sure, it’s rational to update downwards in some general sense, but given that the actual consequences were so minor and that the alternative hypothesis (that it was just a mistake) is so plausible, I don’t respect Guzey’s presentation of this incident in his more recent writings (e.g. here).
(deleted)
Yes, at present I do.
I haven’t yet seen evidence to support the strong claims you are making about Julia Wise’s knowledge and intentions at various stages in this process. If your depiction of events is true (i.e. Wise both knowingly concealed the leak from you after realising what had happened, and explicitly lied about it somewhere) that seems very bad, but I haven’t seen evidence for that. Her own explanation of what happened seems quite plausible to me.
(Conversely, we do have evidence that MacAskill read your draft, and realised it was confidential, but didn’t tell you he’d seen it. That does seem bad to me, but much less bad than the leak itself – and Will has apologised for it pretty thoroughly.)
Your initial response to Julia’s apology seemed quite reasonable, so I was surprised to see you revert so strongly in your LessWrong comment a few months back. What new evidence did you get that hardened your views here so much?
It matters – it was a serious error and breach of Wise’s duty of confidentiality, and she has acknowledged it as such (it is now listed on CEA’s mistakes page). But I do think it is important to point out that, other than having your expectation of confidentiality breached per se, nothing bad happened to you.
One reason I think this is important is because it makes the strong “conspiracy” interpretation of these events much less plausible. You present these events as though the intent of these actions was to in some way undermine or discredit your criticisms (you’ve used the word “sabotage”) in order to protect MacAskill’s reputation. But nobody did this, and it’s not clear to me what they plausibly could have done – so what’s the motive?
What sharing the draft with MacAskill did enable was a prepared response – but that’s normal in EA and generally considered good practice when posting public criticism. Said norm is likely a big part of the reason this screw-up happened.
(deleted)
I don’t agree with this review at all.
I’m commenting because you are really good in every sense, and your comment is upvoted; together these are a sign that I may be wrong. I want to understand more.
Also, the ensuing discussion would do as you suggest and give attention to Guzey’s ideas (although, as my comment indicates, I don’t find much content in them).
Here are comments on object-level points in Guzey’s recent reply:
The book misrepresented Charity Navigator’s emphasis on reducing overhead per https://guzey.com/books/doing-good-better/#charity-navigator
This is technically true but seems to be a nitpick.
What is going on is that MacAskill is probably pointing out that the focus on expenses rather than theory of change/effectiveness is a massive hurdle that contributes to the culture of scarcity and “winning two games”. This undermines the effectiveness of charities.
I guess that the truth is Charity Navigator has little ability or interest in examining the uses of overhead or understanding the theory of change of charities.
It seems that Guzey objects to MacAskill using 0.1% as an exaggeration, and points out that Charity Navigator is “ok” with 25%. This isn’t that substantive (and I’m skeptical this is the complete truth of Charity Navigator’s position).
2. MacAskill cites different sets of evidence to support deworming compared to other interventions, and deworming fails some metrics:
https://guzey.com/books/doing-good-better/#educational-benefits-of-distributing-textbooks-and-deworming
The truth is that leading scientists often point to different sets of evidence when comparing different interventions (and there can be bitter disputes between two respected scientists). What probably happened is that MacAskill believed this was a good intervention and cited the evidence that was current at the time.
To be clear, it’s sometimes true and correct to cite different sets of evidence when comparing different interventions.
Even if this was wrong, and this occurred more than once, it’s not clear this is a defect in science or epistemics.
There’s a lot of talk around GiveWell and I’m not an expert, but sometimes I hear some people say GiveWell is conservative. Maybe it’s because of the intense politics that it has to deal with while carrying the flag of rigor. If this is true, maybe focusing on episodes like this is counterproductive.
Also, there seem to be some misfires in criticism and other things that are ungenerous or wrong in the content of Guzey’s essay. It’s impractical to list them all.
Leaking
The vast majority of Guzey’s comment is focused on the episode where his draft essay was “leaked”. It seems this leak really happened. It also seems pretty plausible it was an accident, even reading the quotes that Guzey cites to suggest something nefarious was going on.
Guzey suggests this was really bad and hints at retaliation, insinuating many times how critical or harmful this was to him (e.g. “What happened around the publication of the essay, however, completely dwarfs the issues in the book itself”). But he doesn’t describe any issues besides the leaking of the document itself (which seems like it would have gone to MacAskill soon enough anyway).
Even rounding down all of the other things that are negative signals to me, this fixation on this episode after all these years would be a strong signal to me, and to most people I know, of the low value of the ideas from this person. Another way of looking at this is that sometimes there can be a large supply of people with critical ideas, many of which turn out to be wrong, and vexatious, really. There would be no work done if we didn’t use these heuristics before engaging with their ideas.
For me, I think a crux is that I suspect Guzey faced no retaliation, and that really undermines his apparent fixation. My opinion would be changed if he faced actual repercussions because of the leak or because of ideas in his book.
This part sounds deeply wrong to me, but that’s probably because I’ve read more of Guzey’s work besides this one piece.
I occasionally encounter people who would have been happy to burn their youth and spend their efforts on projects that would be good bets from an EA perspective. When they don’t get to, for one reason or another (maybe they were being too status-seeking too soon, maybe they would have been a great chief of staff or a co-founder together with someone who was epistemically stronger, maybe they didn’t have the right support networks, etc.), it strikes me as regrettable.
I think that some of Guzey’s later work is valuable, and in particular his New Science organization looks like a good bet to take. I think he got some funding from the Survival and Flourishing Fund, which is EA adjacent.
Below is my reply to this comment and your other one.
I’m not sure it is valuable or wise for me to write all this, but it seems better to communicate.
I was sincere when I said I didn’t understand and wanted to learn why you rated Guzey’s criticism highly. I think I have learned a lot more now.
_________________________
You said “This part sounds deeply wrong to me, but that’s probably because I’ve read more of Guzey’s work besides this one piece”:
Note that I made a writing error in the relevant paragraph. It is possible this changed the meaning of my comment and this rightfully offended you. When I said:
I meant:
The first version could imply there is some “establishment” (yuck) and that these people share my negative opinion. This is incorrect; I had no prior knowledge of opinions about Guzey and knew nothing about him.
_________________________
You said here:
This seems wise and thoughtful. I didn’t know about this.
_________________________
You made another comment:
I skimmed this, in the spirit of what you suggested. The truth is that I find reviews like this often on the internet, and I use reviews of this quality for a lot of beliefs.
If I didn’t know anything about Guzey, I would use his review to update in favor of his ideas. But at the same time, I find many of Guzey’s choices in content and style different from those of people who successfully advance scientific arguments.
_________________________
I think that, combined with virtue and good judgement, being loyal to someone is good.
In non-EA contexts, I have tried to back up friends who need support in hostile situations.
In these situations, the truth is that I can become strategic. When being strategic, I try to minimize or even rewrite the content where they were wrong. This can lead to compromise and closure, but this needs coordination and maturity.
(deleted)
I hadn’t realised that your comment on LessWrong was your first public comment on the incident in three years. That is an update for me.
But also, I do find it quite strange to say nothing about the incident for years, then come back with a very long and personal (and to me, bitter-seeming) comment, deep in the middle of a lengthy and mostly-unrelated conversation about a completely different organisation.
Commenting on this post after it got nominated for review is, I agree, completely reasonable and expected. That said, your review isn’t exactly very reflective – it reads more as just another chance to rehash the same grievance in great detail. I’d expect a review of a post that generated so much in-depth discussion and argument to mention and incorporate some of that discussion and argument; yours gives the impression that the post was simply ignored, a lone voice in the wilderness. If 72 comments represents deafening silence, I don’t know what noise would look like.
[Edited to soften language.]
I find it surprising that your comment only provides one-sided considerations. As an intuition pump, consider reading this unrelated review, also by Guzey, and checking if you think it is also low quality.
The low quality of Guzey’s arguments around Doing Good Better (and his unwillingness to update in the face of strong counterarguments) substantially reduced my credence in his (similarly strong) claims about Why We Sleep, and I was confused about why so many people I know put so much credence in the latter after the former.
Here are two examples of Guzey updating in response to specific points:
https://forum.effectivealtruism.org/posts/7aqGFHirEvHTMD5w5/william-macaskill-misrepresents-much-of-the-evidence?commentId=mHnp8t97EfwrRA3vg
https://www.reddit.com/r/slatestarcodex/comments/9xluu2/william_macaskill_misrepresents_much_of_the/eby9cwz/
It depends what you mean by “updating”.
I think Guzey is very honest in these discussions (and subsequently), and is trying to engage with pushback from the community, which is laudable.
But I don’t think he’s actually changed his views to nearly the degree I would expect a well-meaning rational actor to do so, and I don’t think his views about MacAskill being a bad actor are remotely in proportion to the evidence he’s presented.
For example, relating to your first link, he still makes a big deal of the “interpretation of GiveWell cost-effectiveness estimates” angle, even though everyone (even GiveWell!) thinks he’s off base here.
On the second link, he has removed most PlayPump material from the current version of his essay, which suggests he has genuinely updated there. So that’s good. That said, if I found out I was as wrong about something as he originally was about PlayPumps, I hope I’d be much more willing to believe that other people might make honest errors of similar magnitude without condemning them as bad actors.
(deleted)
Is it so hard to believe reasonable people can disagree with you, for reasons other than corruption or conspiracy?
What is your credence that you’re wrong about this?
(deleted)
I don’t think unsuccessful applications at organizations that are distantly related to the content you’re criticizing constitute a conflict of interest.
If everybody listed their unsuccessful applications at the start of every EA Forum post, it would take up a lot of reader attention.
I heavily criticize one of the founders of CEA and make heavy use of the words of the founder of Open Phil in my post, which led me to believe that I needed to disclose that I applied to both organizations.
[EDIT: this was not a very careful comment; multiple claims were stated more strongly than I believe them, and my beliefs might not have been so well supported]
I admire the amount of effort that has gone into this post and its level of rigor. I think it’s very important for an epistemically healthy movement that high-status people can be criticised successfully.
I think your premises do not fully support the conclusion that MacAskill is completely untrustworthy. However, I agree that the book misrepresents sources structurally, and this is a convincing sign it is written in bad faith.
I hope that MacAskill has already realized the book was not up to the standards he now promotes. Writing an introduction to effective altruism was and remains a very difficult task, and at the time there was still a mindset of “push EA even if it’s at the cost of some epistemic honesty”. I think the community has been moving away from this mindset since, and this post is a good addition to that.
We need a better introductory book. (Also because it’s outdated.)
I think the piece is very over the top. Even if all the points were correct, it wouldn’t support the damning conclusion. Some of the points seem fair, some are wrong, and some are extremely uncharitable. If you are going to level accusations of dishonesty and deliberate misrepresentation, then you need to have very strong arguments. This post falls very far short of that.
To be clear: Did you downvote Siebe’s post because you disagree, the main post because you disagree, Siebe’s post because you think it’s unhelpful, or the main post because you think it’s unhelpful?
Thanks. I agree with you that it does not show complete untrustworthiness. Adjusted the language a little bit.
When I saw this comment, it was at “0” points. I’m surprised, because it seems like it is helpful and written in good faith. If someone down-voted it, could you explain why?
Sure.
I don’t take “[DGB] misrepresents sources structurally, and this is a convincing sign it is written in bad faith” to be either:
True. The OP strikes me as tendentiously uncharitable and ‘out for blood’ (given that earlier versions were calling for Will to be disavowed by EA per Gleb Tsipursky, trust in Will down to 0, etc.), and the very worst that should be inferred, even if we grant all the matters under dispute in its favour—which we shouldn’t—would be something like “sloppy, and perhaps with a subconscious finger on the scale tilting the errors to be favourable to the thesis of the book” rather than deceit, malice, or other ‘bad faith’.
Helpful. False accusations of bad faith are obviously toxic. But even true ones should be made with care. I was one of the co-authors on the Intentional Insights document, and in that case (with much stronger evidence suggestive of ‘structural misrepresentation’ or ‘writing things in bad faith’) we refrained as far as practicable from making these adverse inferences. We were criticised for this at the time (perhaps rightly), but I think this is the better direction to err in.
Kind. Self explanatory.
I’m sure Siebe makes their comment in good faith, and I agree some parts of the comment are worthwhile (e.g. I agree it is important that folks in EA can be criticised). But not overall.
I agree with this take on the comment as it’s literally written. I think there’s a chance that Siebe meant ‘written in bad faith’ as something more like ‘written with less attention to detail than it could have been’, which seems like a very reasonable conclusion to come to.
(I just wanted to add a possibly more charitable interpretation, since otherwise the description of why the comment is unhelpful might seem a little harsh)
That seems like taking charitableness too far. I’m alright with finding different interpretations based on the words written, but ultimately, Siebe wrote what they wrote, and it cannot be interpreted as you suggest. It’s quite a big accusation, so caution is required when making it.
Okay, points taken. I should have been much more careful given the strength of the accusation, and the accusation that DGB was written “in bad faith” seems (far) too strong.
I guess I have a tendency to support efforts that challenge common beliefs that might not be held for the right reasons (in this case, “DGB is a rigorously written book, and a good introduction to effective altruism”). This seemed to outweigh the costs of criticism, likely because my intuition often underestimates the costs of criticism. However, the OP challenged a much stronger common belief (“Will MacAskill is not an untrustworthy person”), and I should have better distinguished those (both in my mind and in writing).
When I was writing it, I was very doubtful about whether I was phrasing it correctly, and I don’t think I succeeded. I think my intention for “written in bad faith” was meant less strongly, but a bit more than ‘written with less attention to detail than it could have been’: i.e. that less attention was given to details that wouldn’t pan out in favour of EA. More along the lines of this:
I also have a lower credence in this now. I should add that my use of “convincing” was also too strong a term, as it might be interpreted as >95% credence, instead of the >60% credence I actually had at the time of writing.
Thanks!
Hi Gregory,
I should point out that
the essay posted to the Effective Altruism Forum never contained the bit about disavowing Will. I did write this in the version that I posted on my site, and I removed it after much feedback elsewhere, and wrote:
As I wrote in a comment above responding to Will, prior to the publication of my essay I reached out to one of the employees of the CEA and asked them to review my draft. They first agreed, but after I sent the draft, they declined to review it.
As it happens, I found numerous cases of truly egregious cherry-picking, demonstrably false statements, and (no, I’m not kidding) out-of-context mined quotes in just a few pages of Pinker’s “Enlightenment Now.” Take a look for yourself. The terrible scholarship is shocking. https://docs.wixstatic.com/ugd/d9aaad_8b76c6c86f314d0288161ae8a47a9821.pdf
Would this be a good top-level post for the forum? I imagine lots of EAs have read Enlightenment Now or are planning to read it. It seems relevant to highlight the flaws that an influential book might have relating to its treatment of existential risks.