I am an attorney in a public-sector position not associated with EA, although I cannot provide legal advice to anyone. My involvement with EA so far has been mostly limited to writing checks to GiveWell and other effective charities in the Global Health space, as well as some independent reading. I have occasionally read the forum and was looking for ideas for year-end giving when the whole FTX business exploded . . .
Jason
Thanks for sharing, Tom! Could you say a little more about how you see the “classic” EA global health programs fitting into your paradigm? These programs tend to do one thing—like hand out anti-malarial bednets—and aim at doing that very well. EA funders try to be very careful not to fund things that the government (or a non-EA funder) would have otherwise funded. So that would suggest classic EA interventions are “marginal” rather than “core” in your framework. On the other hand, they have a very high return for each dollar invested, which suggests you might classify them as “core.”
Not an expert either, but safest to say the corporate-law question is nuanced and not free from doubt. It’s pretty clear there’s no duty to maximize short-term profits, though.
But we can surmise that most boards that allow the corporation to seriously curtail its profits—at least its medium-term profits—will get replaced by shareholders soon enough. So the end result is largely the same.
I’d go a bit further. The proposed norm has several intended benefits: promoting fairness to the criticized organization by not blindsiding the organization, generating higher-quality responses, minimizing fire drills for organizations and their employees, etc. I think it is a good norm in most cases.
However, there are some circumstances in which the norm would not significantly achieve its intended goals. For instance, the rationale behind the norm will often have less force where the poster is commenting on the topic of a fresh news story. The organization already feels pressure to respond to the news story on a news-cycle timetable; the marginal burden of additionally having a discussion of the issue on the Forum is likely modest. If the media outlet gave the org a chance to comment on the story, the org should also not be blindsided by the issue.
Likewise, criticism in response to a recent statement or action by the organization may or may not trigger some of the same concerns as more out-of-the-blue criticism. Where the nature of the statement/action is such that the criticism was easily foreseeable, the organization should already be in a place to address it (and was not caught unawares by its own statement/action). This assumes, of course, that the criticism is not dependent on speculation about factual matters or the like.
Also, I think the point about a delayed statement being less effective at conveying a message goes both ways: if an organization says or does something today, people will care less about a poster’s critical reaction posted eight days later than a reaction posted shortly after the organization’s action/statement.
Finally, there may also be countervailing reasons that outweigh the norm’s benefits in specific cases.
So is the number of comments here (5 at time of this comment) vs. there (69).
Thanks for these points! The idea that people care about more than their wellbeing may be critical here. I’m thinking of a simplified model with the following assumptions: a mean lifetime wellbeing of 5, SD 2, normal distribution, wellbeing is constant through the lifespan, with a neutral point of 4 (which is shared by everyone).
Under these assumptions, AMF gets no “credit” (except for grief avoidance) for saving the life of a hypothetical person with wellbeing of 4. I’m really hesitant to say that saving that person’s life doesn’t morally “count” as a good because they are at the neutral point. On the one hand, the model tells me that saving this person’s life doesn’t improve total wellbeing. On the other hand, suppose I (figuratively) asked the person whose life was saved, and he said that he preferred his existence to non-existence and appreciated AMF saving his life.
At that point, I think the WELLBY-based model might not be incorporating some important data—the person telling us that he prefers his existence to non-existence would strongly suggest that saving his life had moral value that should indeed “count” as a moral good in the AMF column. His answers may not be fully consistent, but it’s not obvious to me why I should fully credit his self-reported wellbeing but give zero credence to his view on the desirability of his continued existence. I guess he could be wrong to prefer his continued existence, but he is uniquely qualified to answer that question and so I think I should be really hesitant to completely discount what he says. And a full 30% of the population would have wellbeing of 4 or less under the assumptions.
Even more concerning, AMF gets significantly “penalized” for saving the life of a hypothetical person with wellbeing of 3 who also prefers existence to non-existence. And almost 16% of the population would score at least that low.
Of course, the real world is messier than a quick model. But if you have a population where the neutral point is close enough to the population average, but almost everyone prefers continued existence, it seems that you are going to have a meaningful number of cases where AMF gets very little / no / negative moral “credit” for saving the lives of people who want (or would want) their lives saved. That seems like a weakness, not a feature, of the WELLBY-based model to me.
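To make the toy model concrete, here is a quick check of the population shares mentioned above. This is just a sketch of the simplified model's own assumptions (wellbeing distributed Normal(mean 5, SD 2), neutral point of 4); it is not based on any real survey data.

```python
# Quick check of the toy model above: wellbeing ~ Normal(mean=5, sd=2),
# neutral point of 4 (both are assumptions of the simplified model, not data).
from statistics import NormalDist

wellbeing = NormalDist(mu=5.0, sigma=2.0)

share_at_or_below_neutral = wellbeing.cdf(4.0)  # wellbeing of 4 or less
share_at_or_below_three = wellbeing.cdf(3.0)    # wellbeing of 3 or less

print(f"{share_at_or_below_neutral:.1%}")  # ~30.9%
print(f"{share_at_or_below_three:.1%}")    # ~15.9%
```

These match the "full 30%" and "almost 16%" figures cited above.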
From HLI’s perspective, it makes sense to describe how the moral/philosophical views one assumes affect the relative effectiveness of charities. They are, after all, a charity recommender, and donors are their “clients” in a sense. GiveWell doesn’t really do this, which makes sense—GiveWell’s moral weights are so weighted toward saving lives that it doesn’t really make sense for them to investigate charities with other modes of action. I think it’s fine to provide a bottom-line recommendation on whatever moral/philosophical view a recommender feels is best-supported, but it’s hardly obligatory.
We recognize donor preferences in that we don’t create a grand theory of effectiveness and push everyone to donate to longtermist organizations, or animal-welfare organizations, or global health organizations depending on the grand theory’s output. Donors choose among these for their own idiosyncratic reasons, but moral/philosophical views are certainly among the critical criteria for many donors. I don’t see why that shouldn’t be the case for interventions within a cause area that produce different kinds of outputs as well.
Here, I doubt most global-health donors—either those who take advice from GiveWell or from HLI—have finely-tuned views on deprivationism, neutral points, and so on. However, I think many donors do have preferences that indirectly track some of those issues. For instance, you describe a class of donors who “want to give to mental health.” While there could be various reasons for that, it’s plausible to me that these donors place more of an emphasis on improving experience for those who are alive (e.g., they give partial credence to epicureanism) and/or on alleviating suffering. If they did assess and chart their views on neutral point and philosophical view, I would expect them to end up more often at points where SM is ranked relatively higher than the average global-health donor would. But that is just conjecture on my part.
One interesting aspect of thinking from the donor perspective is the possibility that survey results could be significantly affected by religious beliefs. If many respondents chose a 0 neutral point because their religious tradition led them to that conclusion, and you are quite convinced that the religious tradition is just wrong in general, do you adjust for that? Does not adjusting allow the religious tradition to indirectly influence where you spend your charitable dollar?
To me, the most important thing a charity evaluator/recommender does is clearly communicate what the donation accomplishes (on average) if given to various organizations they identify—X lives saved (and smaller benefits), or Y number of people’s well-being improved by Z amount. That’s the part the donor can’t do themselves (without investing a ton of time and resources).
I don’t think the neutral point is as high as 3. But I think it’s fine for HLI to offer recommendations for people who do.
Given the methodological challenges in measuring the neutral point, I would have some hesitation to credit any conclusions that diverged too much from what revealed preferences imply. A high neutral point implies that many people in developing countries believe their lives are not worth living. So I’d look for evidence of behavior (either in respondents or in the population more generally) that corroborated whether people acted in a way that was consistent with the candidate neutral point.
For instance, although money, family, and other considerations doubtless affect it, studying individuals who are faced with serious and permanent (or terminal) medical conditions might be helpful. At what expected life satisfaction score do they decline treatment? If the neutral point is relatively close to the median point in a country, one would expect to see a lot of people decide to not obtain curative treatment if the results would leave them 1-2 points less satisfied than their baseline.
You might be able to approximate that by asking hypothetical questions about specific situations that you believe respondents would assess as reducing life satisfaction by a specified amount (disability, imprisonment, social stigma, etc.), and then ask whether the respondent believes they would find life still worth living if that happened. I don’t think that approach works to establish a neutral point, but I think having something more concrete would be an important cross-check on what may otherwise come across as an academic, conjectural exercise to many respondents.
I’m starting to think posts should get a pinned mod comment if the poster doesn’t assert that the person/organization had a specified amount of advance notice. That could be a tricky norm to define, as there can be valid reasons not to provide advance notice (e.g., breaking news or a situation where delay could risk clearly identifiable harm), and it’s not trivial to define with precision what type of posts warrant an advance-notice norm. I’m not envisioning a hostile pinned comment, but I am wondering if there should be an “official” statement that says something along the lines of: “we don’t delete criticisms that were not shared with the person/organization in advance, but—at least absent special circumstances—no one should expect a prompt response where the poster chose not to share the post in advance.”
Edit: typo
Do you think the neutral point and basic philosophical perspective (e.g., deprivationism vs. epicureanism) are empirical questions, or are they matters on which the donor has to exercise their own moral and philosophical judgment (after considering what the somewhat limited survey data have to say on the topic)?
I would graph the neutral point from 0 to 3. I think very few donors would set the neutral point above 3, and I’d start with the presumption that the most balanced way to present the chart is probably to center it fairly near the best guess from the survey data. On the other hand, if you have most of the surveys reporting “about 2,” then it’s hard to characterize 3 as an outlier view—presumably, a good fraction of the respondents picked a value near, at, or even over 3.
Although I don’t think HLI puts it this way, it doesn’t strike me as implausible to view human suffering as a more severe problem than lost human happiness. As I noted in a different comment, I think of that chart as a starting point from which a donor can apply various discounts and bonuses on a number of potentially relevant factors. But another way to account for this would be to give partial weight to strong epicureanism as a means of discounting the value of lost human happiness vis-a-vis suffering.
Given that your critique was published after HLI’s 2022 charity recommendation, I think it’s fair to ask HLI whether it would reaffirm those characterizations today. I would agree that the appropriate conclusion, on HLI’s current state of analysis, is that the recommendation is either SM or GiveWell’s top charities depending on the donor’s philosophical assumptions. I don’t think it’s inappropriate to make a recommendation based on the charity evaluator’s own philosophical judgment, but unless HLI has changed its stance it has taken no position. I don’t think it is appropriate to merely assume equal credence for each of the philosophical views and neutral points under consideration.
One could also defensibly make a summary recommendation on stated assumptions about donor values or on recipient values. But the best information I’ve seen on those points—the donor and beneficiary surveys as reflected in GiveWell’s moral weights—seemingly points to a predominately deprivationist approach with a pretty low neutral point (otherwise the extremely high value on saving the lives of young children wouldn’t compute).
Thank you for this detailed and transparent response!
I applaud HLI for creating a chart (and now an R Shiny App) to show how philosophical views can affect the tradeoff between predominately life-saving and predominately life-enhancing interventions. However, one challenge with that approach is that almost any changes to your CEA model will be outcome-changing for donors in some areas of that chart. [1]
For example, the 53% -> 38% correction alone switched the recommendation for donors with a deprivationist framework who think the neutral point is over ~0.65 but under ~1.58. Given that GiveWell’s moral weights were significantly derived from donor preferences, and (0.5, deprivationism) is fairly implied by those donor weights, I think that correction shifted the recommendation from SM to AMF for a significant number of donors even though it was only material to one of three philosophical approaches and about one point of neutral-point assumptions.
GiveWell reduced the WELLBY estimate from about 62 (based on the 38% figure) to about 17, a difference of about 45. If I’m simplifying your position correctly, for about half of those WELLBYs you disagree with GiveWell that an adjustment is appropriate. For about half of them, you believe a discount is likely appropriate, but think it is likely less than GiveWell modelled.
If we used GiveWell’s numbers for that half but HLI’s numbers otherwise, that split suggests that we’d end up with about 39.5 WELLBYs. So one way to turn your response into a donor-actionable statement would be to say that there is a zone of uncertainty between 39.5 and 62 WELLBYs. One might also guess that the heartland of that zone is between about 45 and 56.5 WELLBYs, reasoning that it is less likely that your discounts will be less than 25% or more than 75% of GiveWell’s.
The bottom end of that zone of uncertainty (39.5) would pull the neutral point at which a deprivationist approach would conclude AMF = SM up to about 2.9. I suspect few people employing a deprivationist approach have the neutral point that high. AMF is also superior to SM on a decent number of TRIA-based approaches at 39.5 WELLBYs.
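For transparency, the arithmetic behind that zone of uncertainty can be sketched as follows. The ~62 and ~17 WELLBY figures, and the roughly even split of the disputed gap, are the assumptions stated above; the rest is bookkeeping.

```python
# Sketch of the zone-of-uncertainty arithmetic. Assumed inputs (from the
# discussion above): HLI's original estimate ~62 WELLBYs, GiveWell's revised
# estimate ~17, with the ~45-WELLBY gap split roughly evenly between
# adjustments HLI rejects outright and adjustments HLI accepts at a smaller
# (but unquantified) size.
hli_estimate = 62.0
givewell_estimate = 17.0
gap = hli_estimate - givewell_estimate      # ~45 WELLBYs in dispute
rejected_half = gap / 2                     # HLI: no discount warranted
accepted_half = gap / 2                     # HLI: some smaller discount warranted

# Lower bound: concede GiveWell's full discount on the accepted half only.
lower_bound = hli_estimate - accepted_half  # 39.5 WELLBYs
upper_bound = hli_estimate                  # 62 WELLBYs

# "Heartland": HLI's discount on the accepted half falls between
# 25% and 75% of GiveWell's discount for that half.
heartland_low = hli_estimate - 0.75 * accepted_half   # ~45.1
heartland_high = hli_estimate - 0.25 * accepted_half  # ~56.4

print(lower_bound, upper_bound, heartland_low, heartland_high)
# 39.5 62.0 45.125 56.375
```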
So it seems there are two reasonable approaches to donor advice under these kinds of circumstances:
One approach would encourage donors within a specified zone of uncertainty to hold their donations until HLI sufficiently updates its CEA for SM to identify a more appropriate WELLBY figure; or
The other approach would encourage donors to make their decision based on HLI’s best estimate of what the WELLBY figure will be on the next update of the CEA. Even if the first approach is preferable in principle, some donors will need to use this one for various reasons (e.g., tax reasons).
I don’t think reaffirming advice on the current model in the interim without any adjustments is warranted, unless you believe the adjustments will be minor enough such that a reasonable donor would likely not find them of substantive importance no matter where they are on the philosophical chart.[2]
[1]
In the GiveWell model, the top recommendation is to give to a regranting fund, and there isn’t any explicit ranking of the four top charities. So the recommendation is actually to defer the choice of specific charity to someone who has the most up-to-date information when the monies are actually donated to the effective charity. Moreover, all four top charities are effective in very similar ways. Thus, GiveWell’s bottom-line messaging to donors is much less sensitive to changes in the CEA for any given charity.
[2]
I am not sure how to define “minor.” I think whether the change flips the recommendation to the donor is certainly relevant, but wouldn’t go so far as to say that any change that flips the recommendation for a given donor’s philosophical assumptions would be automatically non-minor. On the other hand, I think a large enough change can be non-minor even if it doesn’t flip the recommendation on paper. Some donors apply discounts and bonuses not reflected in HLI’s model. For instance, one could reasonably apply a discount to SM when compared to better-studied interventions, on the basis that CEAs usually decrease as they become more complete. Or one could reasonably apply a bonus to SM because funding a smaller organization is more likely to have a positive effect on its future cost-effectiveness. Thus, just because the change is not outcome-determinative on HLI’s base model doesn’t mean it isn’t so on the donor’s application of the model. The time-to-update and amount of funds involved are also relevant. All that being said, my gut thinks that the starting point for determining minor vs. non-minor is somewhere in the neighborhood of 10%.
This is really helpful! One suggestion for future improvement would be to allow the user to specify a mix among the philosophical views (or at least to be able to select predefined mixes of those views).
I think this is one of those cases where reaching out to the organization prior to posting on the Forum would be helpful. That may have led to a conclusion that deferring this discussion until the grant justification was posted would be more fruitful (and probably would have led to timing it when @jackva wasn’t on leave).
See footnote 3.
I think it has potential!
Finally, I think the two approaches require very different sets of skills. My guess is that there are many more people in the EA community today (which skews young and quantitatively-inclined) with skills that are a good fit for evaluation-and-support than have skills that are an equally good fit for design-and-execution. I worry that this skills gap might increase the risk that people in the EA community might accidentally cause harm while attempting the design-and-execution approach.
This paragraph is a critical component of the argument as presently stated. However, I don’t see much more than a mere assertion that (1) certain skills are generally missing that are needed for design-and-execution (D&E) and (2) the absence of those skills increases the risk of accidental harm. In a full post, I would want to see this explained more fully.
My own intuition is that a larger driver for increased harm in D&E models (vs. evaluation-and-support, E&S) may be inherent to working in a novel and neglected subject area like AI safety. In an E&S model, the startup efforts incubated independently of EA are more likely to be pretty small-scale. Even if a number of them end up being net-harmful, the risk is limited by how small they are. But in a D&E model, EA resources may be poured into an organization earlier in its life cycle, increasing the risk of significant harm if it turns out the organization was ultimately not well-conceived.
As far as mitigations, I think a presumption toward “start small, go slow” in an underdeveloped cause area for which a heavily D&E approach is necessary might be appropriate in many cases for the reason described in the paragraph above. E.g., in some cases, the objective should be to develop the ecosystem in that cause area so that heavy work can begin in 7-10 years, vs. pouring in a ton of resources early and trying to get results ASAP. I think I’d like to see more ideas like that in a full post, as the suggestion to develop better “risk-management or error-correction capabilities” (while correct in my view) is also rather abstract.
And as a general rule most people try to avoid making enemies out of people who they perceive to have lots of power/influence when possible. So their threat model doesn’t necessarily have to be terribly well-defined to be effective at accomplishing the powerful/influential person’s objective.
Where’s the evidence that, e.g., everyone “act[s] as if a couple of beehives or shrimp farms are as important as a human city”? So someone wrote a speculative report about bee welfare ranges . . . if “everyone” accepted that “1 human is worth 14 bees”—or even anything close to that—the funding and staffing pictures in EA would look very, very different. How many EAs are working in bee welfare, and how much is being spent in that area?
As I understand the data, EA resources in GH&D are pretty overwhelmingly in life-saving interventions like AMF, suggesting that the bulk of EA does not agree with HLI at present. I’m not as well versed in farmed animal welfare, but I’m pretty sure no one in that field is fundraising for interventions costing anywhere remotely near hundreds of dollars to save a bee and claiming they are effective.
In the end, reasoning transparency by charity evaluators helps the donor better make an informed moral choice. Carefully reading analyses from various sources helps me (and other donors) make choices that are consistent with our own values. EA is well ahead of most charitable movements by explicitly acknowledging that trade-offs exist and at least attempting to reason about them. One can (and should) decline to donate where the charity’s treatment of tradeoffs isn’t convincing. As I’ve stated elsewhere on this post, I’m sticking with GiveWell-style interventions at least for now.
We both think the ratio of parental grief WELLBYs to therapy WELLBYs is likely off, although that doesn’t tell us which number is wrong. Given that your argument is that an implausible ratio should tip HLI off that there’s a problem, the analysis below takes the view more favorable to HLI—that the parental grief number (for which much less work has been done) is at least the major cause of the ratio being off.
As I see it, the number of WELLBYs preserved by averting an episode of parental grief is very unlikely to be material to any decision under HLI’s cost-effectiveness model. Under philosophical assumptions where it is a major contributor to the cost-effectiveness estimate, that estimate is almost always going to be low enough that life-saving interventions won’t be considered cost-effective on the whole. Under philosophical assumptions where life-saving programs may be cost-effective, the bulk of the effectiveness will come directly from the effect on the saved life itself. Thus, it would not be unreasonable for HLI—which faces significant resource constraints—to have deprioritized attempts to improve the accuracy of its estimate for WELLBYs preserved by averting an episode of parental grief.
Given that, I can see three ways of dealing with parental grief in the cost-effectiveness model for AMF. Ignoring it seems rather problematic. And I would argue that reporting the value one’s relatively shallow research provided (with a disclaimer that one has low certainty in the value) is often more epistemically virtuous than adjusting to some value one thinks is more likely to be correct for intuitive reasons, bereft of actual evidence to support that number. I guess the other way is to just not publish anything until one can produce more precise models . . . but that norm would make it much more difficult to bring new and innovative ideas to the table.

I don’t think the thermometer analogy really holds here. Assuming HLI got a significantly wrong value for WELLBYs preserved by averting an episode of parental grief, there are a number of plausible explanations, the bulk of which would not justify not “listen[ing] to [them] anymore.” The relevant literature on grief could be poor quality or underdeveloped; HLI could have missed important data or modeled inadequately due to the resources it could afford to spend on the question; it could have made a technical error; its methodology could be ill-suited for studying parental grief; its methodology could be globally unsound; and doubtless other reasons. In other words, I wouldn’t pay attention to the specific thermometer that said it was much hotter than it was . . . but in most cases I would only update weakly against using other thermometers by the same manufacturer (the charity evaluator), or against thermometer technology in general (the WELLBY analysis).
Moreover, I suspect there have been, and will continue to be, malfunctioning thermometers at most of the major charity evaluators and major grantmakers. The grief figure is a non-critical value relating to an intervention that HLI isn’t recommending. For the most part, if an evaluator or grantmaker isn’t recommending or funding an organization, it isn’t going to release its cost-effectiveness model for that organization at all. Even where funding is recommended, there often isn’t the level of reasoning transparency that HLI provides. If we are going to derecognize people who have used malfunctioning thermometer values in any cost-effectiveness analysis, there may not be many people left to perform them.
I’ve criticized HLI on several occasions before, and I’m likely to find reasons to criticize it again at some point. But I think we want to encourage its willingness to release less-refined models for public scrutiny (as long as the limitations are appropriately acknowledged) and its commitment to reasoning transparency more generally. I am skeptical of any argument that would significantly incentivize organizations to keep their analyses close to the chest.
It’s a completely different conversation in my book. The post, per the title, is an assessment of HLI’s model of SM’s effectiveness. I don’t really see Vasco’s comment as about GW’s assessment of HLI’s model, HLI’s model itself, or SM’s effectiveness with any particularity. It’s more about the broad idea that GH&D effects for almost any GH&D program may be swamped by animal-welfare and longtermist effects.
I do actually think there is a related point to be made that is appropriate to the post: (1) it is good that we have a new published analysis that SM is very likely an effective charity; because (2) even under GW’s version of the analysis, some donors may feel SM is an attractive choice in the global health & development space because they are concerned about the meat-eater problem [link to Vasco’s analysis here] and/or environmental concerns that potentially affect life-saving and economic-development modes of action.
The reasons I’d find that kind of comment helpful—but didn’t find the comment by @Vasco, as written, well-suited for this post—include:
(1) the perspective above is an attempt at a practical application of GW’s findings that is much more hooked into the main subject of the post (which is about SM and HLI’s CEA thereof), and
(2) By noting the meat-eater problem but linking to a discussion in one’s own post, rather than attempting to explain/discuss it in a post trying to nail down the GH&D effects of SM, the risk of derailing the discussion on someone else’s post is significantly reduced.
Given that Jeff posted this shortly after raising the possibility that he should write a book (of the sort that could easily make it onto many lending/giving tables), I admire the post as running against his potential self-interest.
Also, one challenge on adjusting based on discussions with a country’s government or health service is that you’re going to lose some efficiencies/economies of scale. Each country has different priorities and resources, so different programs will be at the margin in each.