Although I agree with much of the criticism against them, the hammering they took felt rough at best, and perhaps even unfair.
One general problem with online discourse is that even if each individual makes a fair critique, the net effect of a lot of people doing this can be disproportionate, since there’s a coordination problem. That said, a few things make me think the level of criticism leveled at HLI was reasonable, namely:
HLI was asking for a lot of money ($200k-$1 million).
The critiques people were making seemed (generally) unique, specific, and fair.
The critiques came after some initial positive responses to the post, including responses to the effect of “I’m persuaded by this; how can I donate?”
Does there need to be a “scrutiny rebalancing” of sorts? I would rather other orgs got more scrutiny than see development orgs get less.
I agree with you that GHD organizations tend to be scrutinized more closely, in large part because there is more data to scrutinize. But there is also some logic to balancing scrutiny levels within cause areas. When HLI solicits donations via Forum post, it seems reasonable to assume that donations they receive more likely come out of GiveWell’s coffers than MIRI’s. This seems like an argument for holding HLI to the GiveWell standard of scrutiny, rather than the MIRI standard (at least in this case).
That said, I do think it would be good to apply stricter standards of scrutiny to other EA organizations, without those organizations explicitly opening themselves up to evaluation by posting on the Forum. I wonder if there might be some way to incentivize this kind of review.
When HLI solicits donations via Forum post, it seems reasonable to assume that donations they receive more likely come out of GiveWell’s coffers than MIRI’s. This seems like an argument for holding HLI to the GiveWell standard of scrutiny, rather than the MIRI standard (at least in this case).
I am concerned that rationale would unduly entrench established players and stifle innovation. Young orgs on a shoestring budget aren’t going to be able to withstand 2023 GiveWell-level scrutiny . . . and neither could GiveWell at the young-org stage of development.
Yeah, I should’ve probably been more precise: the criticism of HLI has mainly been leveled against their evaluation of a single organization’s single intervention, whereas GW has evaluated 100+ programs, so my gut instinct is that it’s fair to hold HLI’s StrongMinds evaluation to the same ballpark level of scrutiny we’d hold a single GW evaluation to (and deworming certainly has been held to that standard). It might be unfair to expect an HLI evaluation to be at the same level as a GW evaluation per dollar invested or hour spent (given that there’s a learning curve associated with doing such evaluations, and there’s value in having multiple organizations do them), but this seems like—if anything—an argument for scrutinizing HLI’s work more closely, since HLI is trying to climb a learning curve, and feedback facilitates this.
I think another factor is that HLI’s analysis is not just below GiveWell’s level, but below a more basic standard. If HLI had performed at this basic standard, though still below GiveWell’s, I think strong criticism would have been unreasonable, as they are still a young and small org with plenty of room to grow. But as it stands the deficiencies are substantial, and a major rethink doesn’t appear to be forthcoming, despite being warranted.
Probably a stupid question (and probably one that’s already been answered), but can someone point me to where GiveWell does a meta-analysis, or an analysis of similar depth to this HLI one? I can’t seem to find one, and I’d be keen to do a quick comparison myself.
I’m not aware of a GW analysis quite like this one, although I didn’t go back and look at all its prior work.
In a situation like this, where GiveWell was considering StrongMinds as a top charity recommendation, it’s almost certain that it would have first funded a bespoke RCT designed to address key questions for which the available literature was mixed or inconclusive. HLI doesn’t have that luxury, of course. Moreover, what HLI is trying to measure is significantly harder to tease out than “how well do bednets work at saving lives” and similar questions.
I think those are relevant considerations that make comparing HLI’s work to the “GiveWell standard” inappropriate. However, to acknowledge Ben’s point, HLI’s critics are alleging that the stuff that was missed was pretty obvious and that HLI hasn’t responded appropriately when the missed stuff was pointed out. I lack the technical background and expertise to fully evaluate those claims.
Which GiveWell evaluation(s) though? The ones on that spreadsheet range from the evaluations used to justify Top Charity status to decisions to deprioritize a potential program after a shallow review. Two deworming charities were until recently GiveWell Top Charities, and I believe Open Phil still makes significant grants to them (presumably in reliance on GiveWell’s work).
In this post, HLI explicitly compares its evaluation of StrongMinds to GiveWell’s evaluation of AMF, and says:
“At one end, AMF is 1.3x better than StrongMinds. At the other, StrongMinds is 12x better than AMF. Ultimately, AMF is less cost-effective than StrongMinds under almost all assumptions.
Our general recommendation to donors is StrongMinds.”
This seems like an argument for scrutinizing HLI’s evaluation of StrongMinds just as closely as we’d scrutinize GiveWell’s evaluation of AMF (i.e., closely). I apologize for the trite analogy, but: if every year Bob’s blueberry pie wins the prize for best pie at the state fair, and this year Jim, a newcomer, is claiming that his blueberry pie is better than Bob’s, this isn’t an argument for employing a more lax standard of judging for Jim’s pie. Nor do I see how concluding that Jim’s pie isn’t the best pie this year—but here’s a lot of feedback on how Jim can improve his pie for next year—undermines Jim’s ability to win pie competitions going forward.
This isn’t to say that we should expect the claims in HLI’s evaluation to be backed by the same level of evidence as GiveWell’s, but we should be able to take a hard look at HLI’s report and determine that the strong claims made on its basis are (somewhat) justified.
Yes, agreed that the language re: AMF justifies a higher level of scrutiny than would be warranted in its absence. Also, the AMF-related claim makes even moderate changes to the CEA’s bottom line more material than if the claims had been limited to something like: SM is more cost-effective than other predominantly life-enhancing charities like GiveDirectly.