The Happier Lives Institute have helped many people (including me) open their eyes to subjective wellbeing (SWB) and perhaps even update us towards its potential value. The recent heavy discussion (60+ comments) on their fundraising thread disheartened me. Although I agree with much of the criticism against them, the hammering they took felt at best rough and perhaps even unfair. I’m not sure exactly why I felt this way, but here are a few ideas.
(High certainty) HLI have openly published their research and ideas, posted almost everything on the forum and engaged deeply with criticism, which is amazing—more than perhaps any other org I have seen. This may (uncertain) have hurt them more than it has helped them.
(High certainty) When other orgs are criticised or asked questions, they often don’t reply at all, or get surprisingly little criticism for what I and many EAs might consider poor epistemics and defensiveness in their posts (for charity I’m not going to link to the handful I can think of). Why does HLI get such a hard time while others get a pass? Especially when HLI’s funding is less than that of many orgs that have not been scrutinised as much.
(Low certainty) The degree of scrutiny and analysis applied to development orgs like HLI seems to exceed that applied to AI orgs, funding orgs and community-building orgs. This scrutiny has been intense; more than one amazing statistician has picked apart their analysis. This expert-level scrutiny is fantastic, I just wish it could be applied to other orgs as well. Very few EA orgs (at least among those whose work has been posted on the forum) produce full papers with publishable-level, in-depth statistical analysis like HLI have at least attempted to do. Does there need to be a “scrutiny rebalancing” of sorts? I would rather other orgs got more scrutiny than development orgs got less.
Other orgs might see the hammering in threads like the HLI funding thread and compare it with other threads where orgs are criticised, don’t engage, and the thread falls off the frontpage. Orgs might reasonably decide that high degrees of transparency and engagement might do them net harm rather than good. This might not be good for anyone.
Do you agree/disagree? And what could we do to make the situation better?
I think it’s fairest to compare HLI’s charity analysis with that of other charity evaluators like GiveWell, ACE, and Giving Green.
Giving Green has been criticised regularly and robustly (just look up any of their posts). GiveWell publish their analysis and engage with criticism; HLI themselves have actually criticised them pretty robustly! I don’t know about ACE because I don’t stay up to date on animals, but I bet it’s similar there.
The dynamics are quite different for, say, charitable foundations, which don’t need to convince anyone to donate differently, or charities that deliver a service, which only need to convince their funders to continue donating.
Thanks, Kristen, for this clear and concise reply. This comparison with the experience of other charity evaluators has shifted my opinion on this somewhat. Nice one.
It seems a bit of a pity that they should receive significantly more scrutiny than charities or foundations, though. In an ideal world everyone would be transparent and heavily scrutinised, but it does make sense that the incentives might not be there for other orgs...
I agree that more orgs should get this kind of scrutiny. I agree that we are likely to blindly trust orgs that don’t transparently discuss their inner workings, which is super sad.
Interesting reflection on mental health providers too, but that’s not a world I know!
This argument I struggle with...
“I don’t think the problem is that HLI got too much hate for fucking up, it’s that everyone else gets too little hate for being opaque”
I realize you are probably being a bit tongue-in-cheek, but I think we could criticise and discuss while being more encouraging and positive. We are all human, too, and I’m not sure piling on the “hate” will necessarily lead to improvement in epistemics and rigorous analysis.
I don’t think this was a reply to me?
Sorry, I accidentally got confused and replied to the wrong comment, my bad!
I am a bit more familiar with ACE, and my impression is that you are right.
HLI fucked up their analysis, but because it was public we found out about it. Most EAs are too fearful to expose their work to scrutiny. Compare them to others who work on mental health within EA...
Most coaches and therapists in EA don’t do any rigorous testing of whether what they are doing actually works. They don’t even allow you to leave public reviews for them. I think we’re the only organisation to even have a TrustPilot!!!
I don’t think the problem is that HLI got too much hate for fucking up, it’s that everyone else gets too little hate for being opaque.
Now HLI have been dragged through the mud, you can bet your ass they won’t be making the same mistakes again. So long as they keep being transparent, they’ll keep learning and growing as an org. Others will keep making the same mistakes indefinitely, only we’ll never know about it and will continue blindly trusting them.
“Although I agree with much of the criticism against them, the hammering they took felt at best rough and perhaps even unfair.”
One general problem with online discourse is that even if each individual makes a fair critique, the net effect of a lot of people doing this can be disproportionate, since there’s a coordination problem. That said, a few things make me think the level of criticism leveled at HLI was reasonable, namely:
HLI was asking for a lot of money ($200k-$1 million).
The critiques people were making seemed (generally) unique, specific, and fair.
The critiques came after some initial positive responses to the post, including responses to the effect of “I’m persuaded by this; how can I donate?”
“Does there need to be a ‘scrutiny rebalancing’ of sorts? I would rather other orgs got more scrutiny than development orgs got less.”
I agree with you that GHD organizations tend to be scrutinized more closely, in large part because there is more data to scrutinize. But there is also some logic to balancing scrutiny levels within cause areas. When HLI solicits donations via Forum post, it seems reasonable to assume that donations they receive more likely come out of GiveWell’s coffers than MIRI’s. This seems like an argument for holding HLI to the GiveWell standard of scrutiny, rather than the MIRI standard (at least in this case).
That said, I do think it would be good to apply stricter standards of scrutiny to other EA organizations, even when those organizations haven’t explicitly opened themselves up to evaluation by posting on the Forum. I wonder if there might be some way to incentivize this kind of review.
“When HLI solicits donations via Forum post, it seems reasonable to assume that donations they receive more likely come out of GiveWell’s coffers than MIRI’s. This seems like an argument for holding HLI to the GiveWell standard of scrutiny, rather than the MIRI standard (at least in this case).”
I am concerned that rationale would unduly entrench established players and stifle innovation. Young orgs on a shoestring budget aren’t going to be able to withstand 2023 GiveWell-level scrutiny . . . and neither could GiveWell at the young-org stage of development.
Yeah, I should’ve probably been more precise: the criticism of HLI has mainly been leveled against their evaluation of a single organization’s single intervention, whereas GW has evaluated 100+ programs, so my gut instinct is that it’s fair to hold HLI’s StrongMinds evaluation to the same ballpark level of scrutiny we’d hold a single GW evaluation to (and deworming certainly has been held to that standard). It might be unfair to expect an HLI evaluation to be at the same level as a GW evaluation per dollar invested/hour spent (given that there’s a learning curve associated with doing such evaluations and there’s value associated with having multiple organizations do them), but this seems like—if anything—an argument for scrutinizing HLI’s work more closely, since HLI is trying to climb a learning curve, and feedback facilitates this.
I think another factor is that HLI’s analysis is not just below the level of GiveWell, but below a more basic standard. If HLI had performed at this basic standard, but below GiveWell, I think strong criticism would have been unreasonable, as they are still a young and small org with plenty of room to grow. But as it stands the deficiencies are substantial, and a major rethink doesn’t appear to be forthcoming, despite being warranted.
Probably a stupid question (I’ve probably just missed it), but can someone point me to where GiveWell do a meta-analysis or a similar depth of analysis to this HLI one? I can’t seem to find it and I would be keen to do a quick comparison myself.
I’m not aware of a GW analysis quite like this one, although I didn’t go back and look at all its prior work.
In a situation like this, where GiveWell was considering StrongMinds as a top charity recommendation, it’s almost certain that it would have first funded a bespoke RCT designed to address key questions for which the available literature was mixed or inconclusive. HLI doesn’t have that luxury, of course. Moreover, what HLI is trying to measure is significantly harder to tease out than “how well do bednets work at saving lives” and similar questions.
I think those are relevant considerations that make comparing HLI’s work to the “GiveWell standard” inappropriate. However, to acknowledge Ben’s point, HLI’s critics are alleging that the stuff that was missed was pretty obvious and that HLI hasn’t responded appropriately when the missed stuff was pointed out. I lack the technical background and expertise to fully evaluate those claims.
Which GiveWell evaluation(s) though? The ones on that spreadsheet range from the evaluations used to justify Top Charity status to decisions to deprioritize a potential program after a shallow review. Two deworming charities were until recently GiveWell Top Charities, and I believe Open Phil still makes significant grants to them (presumably in reliance on GiveWell’s work).
In this post, HLI explicitly compares its evaluation of StrongMinds to GiveWell’s evaluation of AMF, and says:
“At one end, AMF is 1.3x better than StrongMinds. At the other, StrongMinds is 12x better than AMF. Ultimately, AMF is less cost-effective than StrongMinds under almost all assumptions.
Our general recommendation to donors is StrongMinds.”
This seems like an argument for scrutinizing HLI’s evaluation of StrongMinds just as closely as we’d scrutinize GiveWell’s evaluation of AMF (i.e., closely). I apologize for the trite analogy, but: if every year Bob’s blueberry pie wins the prize for best pie at the state fair, and this year Jim, a newcomer, is claiming that his blueberry pie is better than Bob’s, this isn’t an argument for employing a more lax standard of judging for Jim’s pie. Nor do I see how concluding that Jim’s pie isn’t the best pie this year—but here’s a lot of feedback on how Jim can improve his pie for next year—undermines Jim’s ability to win pie competitions going forward.
This isn’t to say that we should expect the claims in HLI’s evaluation to be backed by the same level of evidence as GiveWell’s, but we should be able to take a hard look at HLI’s report and determine that the strong claims made on its basis are (somewhat) justified.
Yes, agree that the language re: AMF justifies a higher level of scrutiny than would be warranted in its absence. Also, the AMF-related claim makes moderate changes in the CEA bottom line more material than if the claims had been limited to stuff like: SM is more cost-effective than other predominantly life-enhancing charities like GiveDirectly.
My read is that it wasn’t the statistics they got hammered on, but misrepresenting other people’s views of them as endorsements, e.g. James Snowden’s views. I will also say the AI side does get this kind of criticism, though not on cost-effectiveness but on things like the culture war (AI Ethics vs. AI Safety) and dooming about techniques (e.g. working in a big company vs. a more EA-aligned research group, and the RLHF discourse).
Thanks for the AI perspective!
Yes, in that post the misrepresentation was part of the criticism they received (which they engaged with and which was at least partially corrected, which is impressive), but I think the statistical analysis bore the heaviest overall criticism in that post, and in other, earlier posts.
“Fair” and “unfair” are tricky words to nail down.
I think there are a wide range of factors that explain why HLI has been treated differently than other orgs -- some “fair” under most definitions of the word, some less so. Some of those reasons are adjacent to questions of funding and influence, but I’m not sure they provide much room to criticize HLI’s critics.
HLI is running in a lane—global health/development/wellbeing—where the evidentiary standards are much higher than in longtermist areas. Part of this is the nature of the work; asking a biosecurity program how many pandemics it has prevented is not workable. Part of it is that there is a very well-funded organization that has been doing CEAs that the consensus views as high-quality. Yet another aspect is that GHDW work has been much more limited by funding constraints, which has incentivized GHDW funders to adopt higher standards.
I think people generally need to be kinder to smaller-scale, early-stage efforts . . . but see point 3 below.
HLI is a charity recommender, a significant portion of whose focus currently involves making recommendations to ordinary people (not megadonors, foundations, etc.). I do think the level of scrutiny should ordinarily be higher for charity recommenders, especially those making recommendations to the general public. The purpose of a charity recommender is to evaluate the relative merits of various charities, and for ordinary donors their recommendations may be seen as near-authoritative. A sense that the community needs to carefully scrutinize the recommender’s work destroys much of a recommender’s value proposition in the first place. And while it’s not very utilitarian of me, I do feel more protective of small donors who don’t have an in-house staff to pick up on a recommender’s mistakes.
I think an overconfident marketing campaign in 2022 did play a major role in how much grace people are willing to extend on the CEA. I haven’t been around that long, but this does seem to significantly distinguish HLI from other orgs. I believe that HLI has expressed regret for certain statements, but a framework that compares statements made at that time (that have not been clearly and explicitly retracted) to what the data actually support strikes me as on the “fair” side of the ledger.
This was HLI’s first major recommendation; people would be less prone to draw negative inferences about (e.g.) an org whose first four analyses/recommendations were fine but whose fifth had some significant issues.
StrongMinds spends (and could potentially fundraise) enough money to make a significant dive into its cost-effectiveness worthwhile for critics, but probably not so much as to justify an airtight multi-million dollar workup (including by commissioning our own studies to fill any major holes in the data that would have a big effect on the CEA). So it’s an awkward-size program to evaluate.
Pretty much all skeptical analysis is done by volunteers on their own time, and so the volume/quality of that work will heavily depend on who is interested in and available to do it. It’s plausible to me that having a controversial and/or novel framework could motivate more critics to volunteer for duty.
There could also be a snowball effect; the detection of one significant weakness in a CEA may motivate others to start looking.
HLI asked Forum users to contribute money. Although I take a wide stance on “standing” to criticize organizations, one could reasonably characterize asking users for action as opening the door to some extent. Having an active fundraising ask may also provide a more concrete payoff/impact for criticism, by preventing users from taking an action the critic found undesirable.
HLI has been unusually transparent with data and responsive to criticism, which has made such criticism easier and kept it up longer. I think you’re right to be concerned about the ferocity of criticism disincentivizing transparency and openness on the margin.
The barriers to criticizing HLI are much lower. Because HLI has little power, no one is concerned about blowback. Compare that to the recent Omega criticisms of AI labs, which were posted pseudonymously and which had to rely on undisclosed data. Criticism from established community members who sign their work and can show their work carries more weight, and there’s a disincentive to writing anonymous criticism (you’ll never get any credit for it).
Several of these points are at least adjacent to questions of funding and power, and they cumulatively make me feel at least somewhat uncomfortable, e.g.:
It’s unlikely an organization with more secure funding would have made a fundraising appeal at this time. Rather, it likely would have laid low until it had produced a new CEA for SM and until more time had passed since the prior harsh posts.
HLI may have felt more pressure to be transparent and responsive than a more established org would. It’s unlikely HLI would have been taken seriously if it didn’t show its receipts, and it doesn’t have the power/prestige needed for a “no real comment” approach to criticism to have a good shot at working.
That being said, I find it challenging to assign much fault for those factors to the Forum user community. For example, in point 10, the unfairness is not that HLI is being criticized by named users who have built up a reputation, but that the criticism of other orgs is disincentivized and pseudonymous.
I think you’re right that the response to HLI may discourage transparency and responsiveness on the margin, and that this is a problem. As a practical matter, I think there are two factors that mitigate this to some extent. One is that I think the criticism of HLI reflects a convergence of a number of factors as listed above, and I’m not sure how much marginal effect comes from their good transparency and responsiveness. Second, I think any startup org trying to pursue HLI-like goals has to be transparent and responsive to get a hearing from the community, so I think it less likely that knowledge of current events will change another org’s stance to a materially less open and responsive one.
I’m undecided on the net effect of all of this. My hope is that it will ultimately result in adoption of better epistemic safeguards and communications management—both at HLI and elsewhere in the ecosystem. (Cf. my recent post on the HLI thread). That would be a good result, although I’d still wish we had gotten there with a lot less rancor.
Quite right. Far too much scrutiny was applied to HLI. Five-thousand-word autistic debunkings, though highly entertaining to read and no doubt equally entertaining to their authors, should not have been necessary. Any reasonable model of how the world works would perhaps not quite rule the idea of group therapy in poor countries out of court, but would require an incredibly high standard of evidence to even begin discussing it somewhat politely.
On the subject of scrutinizing other orgs, I note that some hardworking but anonymous EAs have done their best to scrutinize EA’s various AI research orgs, but of course this is a much more specialized endeavour requiring deeper expertise, and is also entirely pointless because OpenPhil will probably fund them anyway.
We’ve banned Sol3:2 for 3 weeks. This comment is uncivil and was reported multiple times. Other comments have been reported in the past for similar reasons.
I want to note that criticism can be extremely valuable, and we have a slightly higher bar for taking mod action against criticism. But referring to analyses of HLI’s work as “autistic” clearly violates core Forum norms and is above that bar. I think it’s possible to outline strong disagreements while still following our norms, and we’d want to see this from Sol3:2 in the future.
If Sol3:2 thinks that this is not right, they can appeal.