I’ve been going through the evaluation reports and it seems like GWWC might not be as confident in Longview’s Emerging Challenges Fund or the EA Long-Term Future Fund as they are in their choices for GHD and Animal Welfare. The reports for these funds often include some uncertainties, like:
There were several limitations to the evaluation, including various conflicts of interest, limited access to relevant information (in part because the LTFF was not set up to be evaluated in this way), our lack of direct expertise and only limited access to expert external input, and the general difficulty of evaluating funding opportunities in the global catastrophic risk cause area.
We don’t know of any clearly better alternative donation option in reducing GCRs
On the other hand, the Founders Pledge GHD fund wasn’t fully recommended due to more specific methodological issues:
Our belief that FP GHDF evaluations do not robustly demonstrate that the opportunities they fund exceed their stated bar of 10x GiveDirectly in expectation, due to the presence of errors and insufficiently justified subjective inputs we found in some of their BOTECs that could cause the estimated cost-effectiveness to fall below this threshold.
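(To illustrate the kind of fragility being described: below is a minimal sketch of a BOTEC sensitivity check, with entirely hypothetical inputs, showing how a headline cost-effectiveness estimate above a 10x-GiveDirectly bar can fall below it once a couple of subjective inputs are varied. None of these numbers come from FP GHDF’s actual BOTECs.)

```python
# Hypothetical BOTEC sensitivity check; all numbers are invented for illustration.
# A grant's estimated cost-effectiveness (as a multiple of GiveDirectly) is often
# the product of several inputs, some of which are subjective judgment calls.

def cost_effectiveness(base_multiple, replaceability_discount, prob_of_success):
    """Point estimate of cost-effectiveness as a multiple of GiveDirectly."""
    return base_multiple * replaceability_discount * prob_of_success

BAR = 10.0  # the stated funding bar: 10x GiveDirectly

# Headline estimate with optimistic subjective inputs clears the bar...
print(cost_effectiveness(40, 0.75, 0.5))   # 15.0 -> above the bar

# ...but a modest, defensible change to two subjective inputs drops it below it.
print(cost_effectiveness(40, 0.6, 0.35))   # 8.4 -> below the bar
```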
Until I read various posts around the forum and personally looked into what LTFF in particular was funding, I was under the impression—partly from GWWC’s messaging—that the LTFF was at least comparable to a GiveWell or even an ACE. This is partly because GWWC usually recommend their GCR funds at the same time as these other funds.
It might be on me for having the wrong assumptions, so I wrote out my chain of thinking, and I’m keen to find out where we disagree:
1. There might be different evaluation standards between cause areas
2. These varying standards could mean a higher risk of wastage or even fraud in more risky funding opportunities; see this analysis
3. GWWC’s messaging and presentation might not fully convey these differences (the paragraph for these two funds briefly mentions lower tractability and an increased need for specialised grantmaking)
4. It might be challenging for potential donors to grasp these differences without clear communication
(I originally posted this to the 2024 recommendations but thought it might be more constructive / less likely to cause any issues over in this thread)
(I no longer work at GWWC, but wrote the reports on the LTFF/ECF, and was involved in the first round of evaluations more generally.)
In general, I think GWWC’s goal here is “to support donors in having the highest expected impact given their worldview”, which can come apart from supporting donors to give to the most well-researched/vetted funding opportunities. For instance, if you have a longtermist worldview, or perhaps take AI x-risk very seriously, then I’d guess you’d still want to give to the LTFF/ECF even if you thought the quality of their evaluations was lower than GiveWell’s.
Some of this is discussed in “Why and how GWWC evaluates evaluators” in the limitations section:
Finally, the quality of our recommendations is highly dependent on the quality of the charity evaluation field in a cause area, and hence inconsistent across cause areas. For example, the state of charity evaluation in animal welfare is less advanced than that in global health and wellbeing, so our evaluations and the resulting recommendations in animal welfare are necessarily lower-confidence than those in global health and wellbeing.
And also in each of the individual reports, e.g. from the ACE MG report:
As such, our bar for relying on an evaluator depends on the existence and quality of other donation options we have evaluated in the same cause area.
In cause areas where we currently rely on one or more evaluators that have passed our bar in a previous evaluation, any new evaluations we do will attempt to compare the quality of the evaluator’s marginal grantmaking and/or charity recommendations to those of the evaluator(s) we already rely on in that cause area.
For worldviews and associated cause areas where we don’t have existing evaluators we rely on, we expect evaluators to meet the bar of plausibly recommending giving opportunities that are among the best options for their stated worldview, compared to any other opportunity easily accessible to donors.
Mmm, so maybe the crux is at (3) or (4)? I think that GWWC may be assuming too much about how viewers are interpreting the messaging and presentation around the evaluations. I think there is probably a way to signal the differences in evaluation strength while still maintaining the BYO worldview approach?
Just speaking for myself, I’d guess those would be the cruxes, though I don’t personally see easy fixes. I also worry that you could err too far on the side of caution, by adding warning labels that give people an overly negative impression compared to the underlying reality. I’m curious if there are examples where you think GWWC could strike a better balance.
I think this might be symptomatic of a broader challenge for effective giving for GCR, which is that most of the canonical arguments for focusing on cost-effectiveness involve GHW-specific examples that don’t clearly generalize to the GCR space. But I don’t think that indicates you shouldn’t give to GCR, or care about cost-effectiveness in the GCR space. From a very plausible worldview (or at least, the worldview I have!), the GCR-focused funding opportunities are the most impactful funding opportunities available. It’s just that the kind of reasoning underlying those recommendations/evaluations is quite different.
canonical arguments for focusing on cost-effectiveness involve GHW-specific examples that don’t clearly generalize to the GCR space.
I am not sure I understand the claim being made here. Do you believe this to be the case because of a tension between hits-based and cost-effective giving?
If so, I may disagree with the point. Fundamentally, if you’re a “hits-based” grant-maker, you still care about (1) the amount of impact resulting from a hit, (2) the odds of getting a hit, (3) indicators which may lead up to a hit, and (4) the marginal impact of your grant.
1 & 2) Require a solid theory of change and BOTEC EV calculations
3) Requires good M&E (monitoring and evaluation)
Fundamentally, I don’t see much of a tension between hits-based and cost-effective giving, other than a much higher tolerance for risk.
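(To make that concrete: a minimal sketch with entirely made-up numbers, showing how a hits-based grant and a near-certain grant can have the same expected value per dollar while differing hugely in variance. The outcome and probability figures below are hypothetical, not drawn from any real evaluation.)

```python
# Hypothetical comparison: a "sure-thing" grant and a "hits-based" grant can
# have identical expected value while differing enormously in variance.

# Sure-thing grant: near-certain, modest impact (impact units per $1k).
sure_outcomes = [10.0, 9.0, 11.0]
sure_probs    = [0.8, 0.1, 0.1]

# Hits-based grant: usually nothing, occasionally a large hit.
hit_outcomes = [0.0, 200.0]
hit_probs    = [0.95, 0.05]

def expected_value(outcomes, probs):
    return sum(o * p for o, p in zip(outcomes, probs))

def variance(outcomes, probs):
    ev = expected_value(outcomes, probs)
    return sum(p * (o - ev) ** 2 for o, p in zip(outcomes, probs))

print(expected_value(sure_outcomes, sure_probs))  # 10.0
print(expected_value(hit_outcomes, hit_probs))    # 10.0 -> same EV
print(variance(sure_outcomes, sure_probs))        # 0.2  -> low variance
print(variance(hit_outcomes, hit_probs))          # 1900.0 -> high variance
```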
I suppose to tack onto Elliot’s answer, I’m curious about what you see the differences in reasoning to be. If it is merely that GCR giving opportunities are more hits-based / high variance, I could see, for example, a small label being applied on the GWWC website next to higher-risk opportunities with a link to something like the explanations you’ve written above (and the evaluation reports).
That kind of labelling feels like only a quantitative difference from the current binary evaluations (as in, currently GWWC signals inclusion/exclusion, but could extend that to signal for strength of evaluation or risk of opportunity).
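(As a rough illustration of what such a graded signal could look like, here is a hypothetical sketch. The EvaluationStrength categories, fund entries, and report links are all invented for illustration; this is not GWWC’s actual data model. The point is just that a strength-of-evaluation label is a small structural extension of a binary include/exclude signal.)

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical extension of a binary recommendation signal: each recommended
# fund carries a strength-of-evaluation label plus a link to the full report.

class EvaluationStrength(Enum):
    IN_DEPTH = "in-depth evaluation against alternatives"
    LIMITED = "limited evaluation (e.g. constrained information access)"
    BEST_AVAILABLE = "plausibly best available for its stated worldview"

@dataclass
class Recommendation:
    fund: str
    cause_area: str
    strength: EvaluationStrength
    report_url: str  # placeholder, not a real URL

recs = [
    Recommendation("GiveWell Top Charities Fund", "GHD",
                   EvaluationStrength.IN_DEPTH, "<report link>"),
    Recommendation("EA Long-Term Future Fund", "GCR",
                   EvaluationStrength.LIMITED, "<report link>"),
]

for r in recs:
    print(f"{r.fund} ({r.cause_area}): {r.strength.value}")
```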
Good job on highlighting this. While I very much understand GWWC’s angle of approach here, I can see that there’s essentially a dynamic that could be playing out whereby some areas (Animal Welfare and Global Development) get increasingly rigorous, while other areas (Longtermist problem-areas and Meta-EA) don’t receive the same benefit.
Thanks for the comment! While we think it could be correct that the quality of evaluations differs between our recommendations in different cause areas, my view is that the evaluating evaluators project applies pressure to increase the strength of evaluations across all cause areas. In our evaluations we communicate areas where we think evaluators can improve. Because we are evaluating multiple options in each cause area, if in future evaluations we find one of our evaluators has improved and another has not, then the latter evaluator is less likely to be recommended in future, which provides an incentive for both evaluators to improve their processes over time.
Thanks for your comment, Huw! I think Michael has done a great job explaining GWWC’s position on this, but please let us know if we can offer any clarifications.