Thanks for writing this out, Habryka!

These are all important considerations, and while I disagree about the strength of the methodology (it seems stronger than that of many posts I've seen become popular on the Forum), I agree that having a more comparison-friendly impact measure would have been good, as well as a justification for why we should care about this subfield within global development.
----
I'm not sure how the Forum should generally regard "research into the best X charity" for values of "X" that don't return organizations with metrics comparable to the best charities we know of.
On the one hand, it can be genuinely useful for the community to be able to reach people who care about X by saying "with our tools, here's what we might tell you, but if you trust this work, maybe also look at Y".
On the other hand, it may drain time and energy from research into causes that are more promising, or dilute the overall message of EA.
I guess I'll keep taking posts like this on a case-by-case basis for now, and I thought this particular case was worth a (non-strong) upvote. But I have a better understanding of why one might come to the opposite conclusion.
I think this was the part of the report that made me distrust the methodology the most:
Our research partner GiveWell[69] was an expert in the subfield and/or was building further expertise, and we thought it unlikely that we would find donation opportunities better than or equivalent to their current or near-future top charities within our timeframe for this research project (in the case of maternal health, family planning, HIV and other STDs, and health (other)).
Even in the specific cause area, it seemed likely from the beginning that existing GiveWell top charities would outperform the ones that this report might find (and from a casual glance at the actual impact values, this has been confirmed, with the impact from GiveWell top charities being at least 2x the impact of the top recommended charities here, such that even if you only care about women's health you will probably get more value per dollar).
It seems clear to me that in that case, the correct choice would have been to suggest GiveWell top charities as good interventions in this space, even if they are not explicitly targeting women's empowerment. The fact that no single existing top GiveWell charity was chosen suggests to me that a major filter applied to the prioritization was whether the charity explicitly branded itself as a charity dedicated to women's empowerment, which I think should clearly be completely irrelevant, and made me highly suspicious of the broader process.
Habryka: Did you see this line in the introduction of this post?
We also recommend charities that are highly cost-effective in improving women's lives but do not focus exclusively on women's empowerment. We discuss these organisations, including those recommended by our research partner GiveWell, in other research reports on our website.
I did not see that line! I apologize for not reading thoroughly enough.
I do think that makes a pretty big difference, and I retract at least part of my critique, though I basically agree with the points you made.
On the other hand, it does seem like a specific GiveWell charity or two should have shown up on this list, or that FP should have explicitly noted GiveWell's higher overall impact (if the impact actually was higher; it seems like GiveDirectly isn't clearly better than Village Enterprise or Bandhan at boosting consumption, at least based on my reading of p. 50 of the 2018 GD study, which showed a boost of roughly 0.3 standard deviations in monthly consumption vs. 0.2-0.4 SDs for Bandhan's major RCT, though there are lots of other factors in play).
I think I've come halfway around to your view, and would need to read GiveWell and FP studies much more carefully to figure out how I feel about the other half (that is, whether GiveWell charities really do dominate FP's selections).
I'd also have to think more about whether second-order effects of the FP recommendations might be important enough to offset differences in the benefits GiveWell measures (e.g. systemic change in norms around sexual assault in some areas; I don't think I'd end up being convinced without more data, though).
Finally, I'll point out that this post had some good features worth learning from, even if the language around recommending organizations wasn't great:
The "why is our recommendation provisional" section around NMNW, which helped me better understand the purpose and audience of FP's evaluation, and also seems like a useful idea in general ("if your values are X, this seems really good; if Y, maybe not good enough").
The discussion of how organizations were chosen, and the ways in which they were whittled down (found in the full report).
On the other hand, I didn't like the introduction, which used a set of unrelated facts to make a general point about "challenges" without making an argument for focusing on "women's empowerment" over "human empowerment". I can imagine such an argument being possible (e.g. women are an easy group to target within a population to find people who are especially badly-off, and for whom marginal resources are especially useful), but I can't tell what FP thinks of it.
Note that GiveDirectly in general is a bit of a weird outlier in terms of GiveWell top recommendations, because it's a lot less cost-effective than the other charities, but is very useful as a "standard candle" for evaluating whether an intervention is potentially a good target for donations. I think being better than GiveDirectly is not sufficient to be a top recommendation for a cause area.
Methodologically, I do think there are a variety of reasons why you should expect regression to the mean in these impact estimates, more so than for GiveDirectly, in large part because the number of studies in the space is a lot lower, and the mechanism of impact is a lot more complicated in a way that allows for selective reporting.
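The regression-to-the-mean point can be illustrated with a simple Bayesian shrinkage sketch: the noisier the underlying evidence, the more a raw impact estimate should be discounted toward the average for the space. All numbers below are hypothetical and chosen only for illustration; none come from the report or from GiveWell.

```python
# Hypothetical illustration of regression to the mean via Bayesian shrinkage.
# Units and numbers are made up (e.g. "impact per dollar" on an arbitrary scale).

def shrunk_estimate(raw_estimate, estimate_var, prior_mean, prior_var):
    """Posterior mean for a normal prior combined with a normal measurement."""
    precision = 1 / prior_var + 1 / estimate_var
    return (prior_mean / prior_var + raw_estimate / estimate_var) / precision

prior_mean, prior_var = 1.0, 0.25   # prior: a typical charity in this space
raw = 3.0                           # a seemingly outstanding raw estimate

# A well-studied intervention (many studies, low measurement variance)
well_studied = shrunk_estimate(raw, estimate_var=0.1,
                               prior_mean=prior_mean, prior_var=prior_var)
# A thinly-studied intervention (few studies, high measurement variance)
thinly_studied = shrunk_estimate(raw, estimate_var=1.0,
                                 prior_mean=prior_mean, prior_var=prior_var)

print(well_studied)    # shrinks only a little toward the prior
print(thinly_studied)  # shrinks much more: fewer, noisier studies
```

With these made-up numbers the well-studied estimate stays near 2.4 while the thinly-studied one falls to 1.4, which is the sense in which a small evidence base warrants a larger discount.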
No problem, thanks for your comments anyway, and please let me know if any part of your critique remains that I haven't engaged with. (Please see the edit in the main post, which should clear most of this up.)
I think most of my critique still stands, and I am still confused why the report does not actually recommend any GiveWell top charities. The fact that the report is limiting itself to charities that exclusively focus on women's empowerment seems like a major constraint that makes the investigation a lot less valuable from a broad cause-prioritization perspective (and also for donors who actually care about advancing women's empowerment, since it seems very likely that the best charities that achieve that aim do not pursue it exclusively).
Habryka: Did you see this line in the introduction of this post?
Thanks for pointing this out, Aaron! Happy that's cleared up.
On the other hand, it does seem like a specific GiveWell charity or two should have shown up on this list, or that FP should have explicitly noted GiveWell's higher overall impact (if the impact actually was higher; it seems like GiveDirectly isn't clearly better than Village Enterprise or Bandhan at boosting consumption, at least based on my reading of p. 50 of the 2018 GD study, which showed a boost of roughly 0.3 standard deviations in monthly consumption vs. 0.2-0.4 SDs for Bandhan's major RCT, though there are lots of other factors in play).
I think I've come halfway around to your view, and would need to read GiveWell and FP studies much more carefully to figure out how I feel about the other half (that is, whether GiveWell charities really do dominate FP's selections).
Please see my updates in the main post and let me know if you still have questions about this. (Do you now understand why we didn't recommend any other specific GW- or FP-recommended charity in this report, but referred to them as a group?)
On the other hand, I didn't like the introduction, which used a set of unrelated facts to make a general point about "challenges" without making an argument for focusing on "women's empowerment" over "human empowerment". I can imagine such an argument being possible (e.g. women are an easy group to target within a population to find people who are especially badly-off, and for whom marginal resources are especially useful), but I can't tell what FP thinks of it.
I hope the reason for this is now also clearer, given the purpose of the report.
Please see my updates in the main post and let me know if you still have questions about this. (Do you now understand why we didn't recommend any other specific GW- or FP-recommended charity in this report, but referred to them as a group?)
As I mentioned in the other comment, I am still not sure why you do not recommend any GW top charities directly. It seems like your report should answer the question "what charities improve women's health the most?" not the question "what charities that exclusively focus on women's health are most effective?". The second one is a much narrower question and its answer will probably not overlap much with the answer to the first question.
You mention them, but only in a single paragraph. It seems that even from the narrow value perspective of "I only care about women's empowerment" the question of "are women helped more by GiveWell charities or the charities recommended here?" is a really key question that your report should try to answer.
The top of your report also says the following:
We researched charity programmes to find those that most cost-effectively improve the lives of women and girls.
This, however, does not actually seem to be the question you are answering, as I mentioned above. I expect the best interventions for women's empowerment not to focus exclusively on it (because there are many more charities trying to improve overall health, because women's empowerment seems like it would overlap a lot with general health goals, etc.). I even expect them not to overlap that much with GiveWell's recommendations, though that's a critique on a higher level that I think we can ignore for now.
To be transparent about my criticism here: the feeling I've gotten from this report is that its goal was not to answer the question "how can we best achieve the most good for the value of women's empowerment?" but was instead to answer the question "what set of charity recommendations will most satisfy our potential donors, by being rigorous and seeming to cover most of the areas we are supposed to check?".
To be clear, I think the vast majority of organizations fall into this space, even in EA, and I have roughly similar (though weaker) criticisms for GiveWell itself, which focuses on global development charities in a pretty unprincipled way that I think has a lot to do with global development being transparent in a way that more speculative interventions are not (though most of the key staff have moved from GiveWell to OpenPhil now, I think in part because of the problems of the approach that I am criticizing here).
I think focusing on that transparency can sometimes be worth it for an individual organization in the long run by demonstrating good judgement and therefore attracting additional resources (as it did in the case of GiveWell), but generally results in the work not being particularly useful for answering the real question of "how can we do the most good?".
And on the margin, I think that kind of research is net-harmful to the overall quality of research and discussion on general cause-prioritization, by spreading a methodology that is badly suited to answering the much more difficult questions of that domain (similar to how p-value testing has had a negative effect on psychology research: it is a methodology badly suited to the actual complexity of the domain, while still being well-suited to answering questions in a much narrower domain).
I think overall this report is pretty high-quality by the standards of global development research, but a large number of small things (the choice of focus area, limiting yourself to charities exclusively focused on women's empowerment, the narrow methodological focus, and I guess my priors for orgs working in this space) give me the sense that this report was not primarily written with the goal of answering the question "what interventions will actually improve women's lives?" but was instead more trying to do a broad thing, a large part of which was to look rigorous and principled, conform to what your potential donors expect from a rigorous report, be broadly defensible, and fit with the skills and methodologies that your current team has (because those are the skills that are prevalent in the global development community).
And I think all of those aims are reasonable aims for the goal of FP, I just think they together make me expect that EAs with a different set of aims will not benefit much from engaging with this research, and because you can't be fully transparent about those aims (because doing so would confuse your primary audience or be perceived as deceptive), it will inevitably confuse at least some of the people trying to do something that is more aligned with my aims and detract from what I consider key cause-prioritization work.
This overall leaves me in a place where I am happy about this research and FP existing, and think it will cause valuable resources to be allocated towards important projects, but where I don't really want a lot more of it to show up on the EA Forum. I respect your work and think what you are doing is broadly good (though I obviously always have recommendations for things I would do differently).
Hi Habryka,
This is to thank you (and others) once more for all your comments here, and to let you know they have been useful and we have incorporated some changes to account for them in a new version of the report, which will be published in March or April. They were also useful in our internal discussion on how to frame our research, and we plan to keep improving our communication around this throughout the rest of the year, e.g. by publishing a blog post / brief on cause prioritisation for our members.
I also largely agree with the views you express in your last post above, insofar as they pertain to the contents of this report specifically. However, very importantly, I should stress that your comments do not apply to FP research generally: we generally choose the areas we research through cause prioritisation / in a cause-neutral way, and we do try to answer the question "how can we achieve the most good" in the areas we investigate, not (even) shying away from harder-to-measure impact. In fact, we are moving more and more in the latter direction, and are developing research methodology to do so (see e.g. our recently published methodology brief on policy interventions).
Some of our reports so far have been an exception to these rules for pragmatic (though impact-motivated) reasons, mainly:
We quickly needed to build a large enough "basic" portfolio of relatively high-impact charities, so that we could make good recommendations to our members.
There are some causes our members ask lots of questions about / are extra interested in, and we want to be able to say something about those areas, even if, in the end, we recommend that they focus on other areas instead, when we find better opportunities there.
But there are definitely ways in which we can improve the framing of these exceptions, and the comments you provided have already been helpful in that way.
Good point, though what about the $60/sexual assault one? That impact even seems better than AMF for combined impact.