I think this was the part of the report that made me distrust the methodology the most:
Our research partner GiveWell[69] was an expert in the subfield and/or was building further expertise, and we thought it unlikely that we would find donation opportunities better than or equivalent to their current or near-future top charities within our timeframe for this research project (in the case of maternal health, family planning, HIV and other STDs, and health (other)).
Even in the specific cause area, it seemed likely from the beginning that existing GiveWell top charities would outperform the ones this report might find (and a casual glance at the actual impact values confirms this: the impact of GiveWell top charities is at least 2x that of the top charities recommended here, such that even if you only care about women’s health, you will probably get more value per dollar from GiveWell’s recommendations).
It seems clear to me that in that case, the correct choice would have been to suggest GiveWell top charities as good interventions in this space, even if they are not explicitly targeting women’s empowerment. The fact that not a single existing GiveWell top charity was chosen suggests to me that a major filter applied in the prioritization was whether a charity explicitly branded itself as being dedicated to women’s empowerment, which I think should clearly be irrelevant, and which makes me highly suspicious of the broader process.
Habryka: Did you see this line in the introduction of this post?
We also recommend charities that are highly cost-effective in improving women’s lives but do not focus exclusively on women’s empowerment. We discuss these organisations, including those recommended by our research partner GiveWell, in other research reports on our website.
On the other hand, it does seem like a specific GiveWell charity or two should have shown up on this list, or that FP should have explicitly noted GiveWell’s higher overall impact (if the impact actually was higher; it seems like GiveDirectly isn’t clearly better than Village Enterprise or Bandhan at boosting consumption, at least based on my reading of p. 50 of the 2018 GD study, which showed a boost of roughly 0.3 standard deviations in monthly consumption vs. 0.2-0.4 SDs for Bandhan’s major RCT, though there are lots of other factors in play).
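To make that concrete, here’s a toy calculation; the per-household costs are placeholders I made up, not figures from either study, so it only illustrates why effect size alone can’t settle the comparison:

```python
# Toy comparison: a bigger effect size (in SDs of monthly consumption) does
# not by itself mean better cost-effectiveness; cost per household matters.
# All costs below are made-up placeholders, not numbers from the studies.

programs = {
    "GiveDirectly-style transfer": (0.3, 1000),  # (effect in SDs, $ per household)
    "Graduation program (low)":    (0.2, 500),
    "Graduation program (high)":   (0.4, 500),
}

for name, (effect_sd, cost_usd) in programs.items():
    # SDs of consumption "bought" per $1,000 donated, under placeholder costs
    print(f"{name}: {effect_sd / cost_usd * 1000:.2f} SD per $1,000")
```

Depending on the costs you plug in, either program could come out ahead, which is why I’d want to read both studies carefully before concluding anything.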
I think I’ve come halfway around to your view, and would need to read GiveWell and FP studies much more carefully to figure out how I feel about the other half (that is, whether GiveWell charities really do dominate FP’s selections).
I’d also have to think more about whether second-order effects of the FP recommendations might be important enough to offset differences in the benefits GiveWell measures (e.g. systemic change in norms around sexual assault in some areas—I don’t think I’d end up being convinced without more data, though).
Finally, I’ll point out that this post had some good features worth learning from, even if the language around recommending organizations wasn’t great:
The “why is our recommendation provisional” section around NMNW, which helped me better understand the purpose and audience of FP’s evaluation, and also seems like a useful idea in general (“if your values are X, this seems really good; if Y, maybe not good enough”).
The discussion of how organizations were chosen, and the ways in which they were whittled down (found in the full report).
On the other hand, I didn’t like the introduction, which used a set of unrelated facts to make a general point about “challenges” without making an argument for focusing on “women’s empowerment” over “human empowerment”. I can imagine such an argument being possible (e.g. women are an easy group to target within a population to find people who are especially badly-off, and for whom marginal resources are especially useful), but I can’t tell what FP thinks of it.
Note that GiveDirectly is a bit of a weird outlier among GiveWell’s top recommendations: it’s a lot less cost-effective than the other charities, but it’s very useful as a “standard candle” for evaluating whether an intervention is potentially a good target for donations. I think being better than GiveDirectly is not sufficient for something to be a top recommendation for a cause area.
Methodologically, I do think there are a variety of reasons why you should expect regression to the mean in these impact estimates, more so than for GiveDirectly, in large part because the number of studies in the space is a lot lower, and the mechanism of impact is a lot more complicated in a way that allows for selective reporting.
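For anyone who wants that adjustment spelled out, here is a minimal sketch of normal-normal shrinkage, where a thinner evidence base means a noisier estimate and therefore more regression toward the field average (all numbers are illustrative, not real cost-effectiveness figures):

```python
# Minimal sketch of a regression-to-the-mean adjustment: shrink a noisy
# cost-effectiveness estimate toward a prior mean, with more shrinkage
# when the evidence base is thinner. Numbers are illustrative only.

def shrunk_estimate(estimate, prior_mean, prior_var, est_var_per_study, n_studies):
    """Precision-weighted average of prior and estimate (normal-normal model)."""
    est_var = est_var_per_study / n_studies  # fewer studies -> noisier estimate
    w = prior_var / (prior_var + est_var)    # weight on the raw estimate
    return w * estimate + (1 - w) * prior_mean

# A charity that looks 5x as cost-effective as the field average:
prior_mean, prior_var = 1.0, 1.0
raw = 5.0

print(shrunk_estimate(raw, prior_mean, prior_var, est_var_per_study=4.0, n_studies=20))
# ~4.3 with a GiveDirectly-sized evidence base (many studies)
print(shrunk_estimate(raw, prior_mean, prior_var, est_var_per_study=4.0, n_studies=2))
# ~2.3 with only a couple of studies: much stronger shrinkage
```

The same raw 5x estimate shrinks much further when it rests on only a couple of studies, which is the asymmetry with GiveDirectly I’m pointing at.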
I did not see that line! I apologize for not reading thoroughly enough.
I do think that makes a pretty big difference, and I retract at least part of my critique, though I basically agree with the points you made.
No problem, and thanks for your comments anyway. Please let me know if any part of your critique remains that I haven’t engaged with. (Please see the edit in the main post, which should have cleared most of this up.)
I think most of my critique still stands, and I am still confused about why the report does not actually recommend any GiveWell top charities. The fact that the report limits itself to charities that exclusively focus on women’s empowerment seems like a major constraint that makes the investigation a lot less valuable from a broad cause-prioritization perspective (and also for donors who actually care about advancing women’s empowerment, since it seems very likely that the best charities for that aim do not pursue it exclusively).
Habryka: Did you see this line in the introduction of this post?
Thanks for pointing this out, Aaron! Happy that’s cleared up.
Please see my updates in the main post and let me know if you still have questions about this. (Do you now understand why we didn’t recommend any other specific GW- or FP-recommended charity in this report, but referred to them as a group?)
I hope the reason for this (the introduction’s focus on “women’s empowerment” rather than “human empowerment”) is now also clearer, given the purpose of the report.
As I mentioned in the other comment, I am still not sure why you do not recommend any GW top charities directly. It seems like your report should answer the question “what charities improve women’s health the most?” not the question “what charities that exclusively focus on women’s health are most effective?”. The second one is a much narrower question and its answer will probably not overlap much with the answer to the first question.
You mention them, but only in a single paragraph. It seems that even from the narrow value perspective of “I only care about women’s empowerment” the question of “are women helped more by GiveWell charities or the charities recommended here?” is a really key question that your report should try to answer.
The top of your report also says the following:
We researched charity programmes to find those that most cost-effectively improve the lives of women and girls.
This, however, does not actually seem to be the question you are answering, as I mentioned above. I expect the best interventions for women’s empowerment not to focus exclusively on it (because there are many, many more charities trying to improve overall health, because women’s empowerment seems like it would overlap a lot with general health goals, etc.). I even expect them not to overlap that much with GiveWell’s recommendations, though that’s a critique on a higher level that I think we can ignore for now.
To be transparent about my criticism here: the feeling I’ve gotten from this report is that its goal was not to answer the question “how can we best achieve the most good for the value of women’s empowerment?” but instead the question “what set of charity recommendations will most satisfy our potential donors, by being rigorous and seeming to cover most of the areas we are supposed to check?”
To be clear, I think the vast majority of organizations fall into this space, even in EA, and I have roughly similar (though weaker) criticisms of GiveWell itself, which focuses on global development charities in a pretty unprincipled way that I think has a lot to do with global development being transparent in a way that more speculative interventions are not (though most of the key staff have since moved from GiveWell to OpenPhil, I think in part because of the problems with that approach that I am criticizing here).
I think focusing on that transparency can sometimes be worth it for an individual organization in the long run by demonstrating good judgement and therefore attracting additional resources (as it did in the case of GiveWell), but generally results in the work not being particularly useful for answering the real question of “how can we do the most good?”.
And on the margin I think that kind of research is net-harmful to the overall quality of research and discussion on general cause-prioritization, because it spreads a methodology that is badly suited to answering the much more difficult questions of that domain (similarly to how p-value significance testing has had a negative effect on psychology research: it is badly suited to the actual complexity of that domain, while still being well-suited to answering questions in a much narrower one).
I think overall this report is pretty high-quality by the standards of global development research, but a large number of small things (the choice of focus area, limiting yourself to charities exclusively focused on women’s empowerment, the narrow methodological focus, and I guess my priors about orgs working in this space) give me the sense that this report was not primarily written to answer the question “what interventions will actually improve women’s lives?” It was instead trying to do a broader thing, a large part of which was to look rigorous and principled, conform to what your potential donors expect from a rigorous report, be broadly defensible, and fit the skills and methodologies of your current team (because those are the skills that are prevalent in the global development community).
And I think all of those aims are reasonable given FP’s goals; I just think that together they make me expect that EAs with a different set of aims will not benefit much from engaging with this research. And because you can’t be fully transparent about those aims (doing so would confuse your primary audience or be perceived as deceptive), it will inevitably confuse at least some of the people trying to do something more aligned with my aims, and detract from what I consider key cause-prioritization work.
This overall leaves me in a place where I am happy about this research and FP existing, and think it will cause valuable resources to be allocated towards important projects, but where I don’t really want a lot more of it to show up on the EA Forum. I respect your work and think what you are doing is broadly good (though I obviously always have recommendations for things I would do differently).
Hi Habryka,
This is to thank you (and others) once more for all your comments here, and to let you know they have been useful: we have incorporated some changes to account for them in a new version of the report, which will be published in March or April. They were also useful in our internal discussion on how to frame our research, and we plan to keep improving our communication around this throughout the rest of the year, e.g. by publishing a blog post / brief on cause prioritisation for our members.
I also largely agree with the views you express in your last post above, insofar as they pertain to the contents of this report specifically. However, very importantly, I should stress that your comments do not apply to FP research generally: we generally choose the areas we research through cause prioritisation / in a cause-neutral way, and we do try to answer the question ‘how can we achieve the most good’ in the areas we investigate, not (even) shying away from harder-to-measure impact. In fact, we are moving more and more in the latter direction, and are developing research methodology to do so (see e.g. our recently published methodology brief on policy interventions).
Some of our reports so far have been exceptions to these rules, for pragmatic (though impact-motivated) reasons, mainly:
We quickly needed to build a large enough ‘basic’ portfolio of relatively high-impact charities, so that we could make good recommendations to our members.
There are some causes our members ask lots of questions about / are extra interested in, and we want to be able to say something about those areas, even if in the end we recommend that they focus on other areas where we find better opportunities.
But there are definitely ways in which we can improve the framing of these exceptions, and the comments you provided have already been helpful in that way.
Good point, though what about the $60-per-sexual-assault-averted one? That impact even seems better than AMF’s, in terms of combined impact.
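To put rough numbers on that comparison (the $4,000-per-death-averted figure below is only a ballpark placeholder for an AMF-style intervention, not an official GiveWell estimate):

```python
# Back-of-the-envelope: at what moral weight does $60 per assault averted
# beat an AMF-style cost per death averted? Both inputs are assumptions.

cost_per_assault_averted = 60    # the figure quoted from the report
cost_per_death_averted = 4000    # placeholder ballpark, not a GiveWell number

# At equal spending, this many assaults are averted per death averted:
breakeven = cost_per_death_averted / cost_per_assault_averted
print(f"Break-even at ~{breakeven:.0f} assaults averted per death averted")
# The $60 charity comes out ahead iff you value averting one death at less
# than ~67 averted assaults; the comparison hinges entirely on moral weights.
```

So whether it “beats AMF” depends on how one trades off those two outcomes, which is exactly the moral-weight question a combined-impact estimate has to take a stance on.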