When HLI solicits donations via Forum post, it seems reasonable to assume that donations they receive more likely come out of GiveWell’s coffers than MIRI’s. This seems like an argument for holding HLI to the GiveWell standard of scrutiny, rather than the MIRI standard (at least in this case).
I am concerned that rationale would unduly entrench established players and stifle innovation. Young orgs on a shoestring budget aren’t going to be able to withstand 2023 GiveWell-level scrutiny . . . and neither could GiveWell at the young-org stage of development.
Yeah, I should’ve probably been more precise: the criticism of HLI has mainly been leveled against their evaluation of a single organization’s single intervention, whereas GW has evaluated 100+ programs, so my gut instinct is that it’s fair to hold HLI’s StrongMinds evaluation to roughly the same level of scrutiny we’d hold a single GW evaluation to (and deworming certainly has been held to that standard). It might be unfair to expect an HLI evaluation to be at the same level as a GW evaluation per dollar invested/hour spent (given that there’s a learning curve associated with doing such evaluations and there’s value associated with having multiple organizations do them), but this seems like—if anything—an argument for scrutinizing HLI’s work more closely, since HLI is trying to climb a learning curve, and feedback facilitates this.
I think another factor is that HLI’s analysis is not just below the level of GiveWell, but below a more basic standard. If HLI had performed at this basic standard, but below GiveWell, I think strong criticism would have been unreasonable, as they are still a young and small org with plenty of room to grow. But as it stands, the deficiencies are substantial, and a major rethink doesn’t appear to be forthcoming, despite being warranted.
Probably a stupid question (I’ve likely just missed it), but can someone point me to where GiveWell has done a meta-analysis, or an analysis of similar depth to this HLI one? I can’t seem to find one, and I would be keen to do a quick comparison myself.
I’m not aware of a GW analysis quite like this one, although I didn’t go back and look at all its prior work.
In a situation like this, where GiveWell was considering StrongMinds as a top charity recommendation, it’s almost certain that it would have first funded a bespoke RCT designed to address key questions for which the available literature was mixed or inconclusive. HLI doesn’t have that luxury, of course. Moreover, what HLI is trying to measure is significantly harder to tease out than “how well do bednets work at saving lives” and similar questions.
I think those are relevant considerations that make comparing HLI’s work to the “GiveWell standard” inappropriate. However, to acknowledge Ben’s point, HLI’s critics are alleging that the stuff that was missed was pretty obvious and that HLI hasn’t responded appropriately when the missed stuff was pointed out. I lack the technical background and expertise to fully evaluate those claims.
Which GiveWell evaluation(s) though? The ones on that spreadsheet range from the evaluations used to justify Top Charity status to decisions to deprioritize a potential program after a shallow review. Two deworming charities were until recently GiveWell Top Charities, and I believe Open Phil still makes significant grants to them (presumably in reliance on GiveWell’s work).
In this post, HLI explicitly compares its evaluation of StrongMinds to GiveWell’s evaluation of AMF, and says:
“At one end, AMF is 1.3x better than StrongMinds. At the other, StrongMinds is 12x better than AMF. Ultimately, AMF is less cost-effective than StrongMinds under almost all assumptions.
Our general recommendation to donors is StrongMinds.”
This seems like an argument for scrutinizing HLI’s evaluation of StrongMinds just as closely as we’d scrutinize GiveWell’s evaluation of AMF (i.e., closely). I apologize for the trite analogy, but: if every year Bob’s blueberry pie wins the prize for best pie at the state fair, and this year Jim, a newcomer, is claiming that his blueberry pie is better than Bob’s, this isn’t an argument for employing a more lax standard of judging for Jim’s pie. Nor do I see how concluding that Jim’s pie isn’t the best pie this year—but here’s a lot of feedback on how Jim can improve his pie for next year—undermines Jim’s ability to win pie competitions going forward.
This isn’t to say that we should expect the claims in HLI’s evaluation to be backed by the same level of evidence as GiveWell’s, but we should be able to take a hard look at HLI’s report and determine that the strong claims made on its basis are (somewhat) justified.
Yes, I agree that the language re: AMF justifies a higher level of scrutiny than would be warranted in its absence. Also, the AMF-related claim makes even moderate changes in the CEA’s bottom line more material than they would be if the claims had been limited to something like: SM is more cost-effective than other predominantly life-enhancing charities like GiveDirectly.