Mo Putera comments on Microdooms averted by working on AI Safety

Mo Putera Sep 21, 2023, 3:29 AM
I think GiveWell shouldn’t be modeled as wanting to recommend organizations that save as many current lives as possible. I think a more accurate way to model them is “GiveWell recommends organizations that are [within the Overton Window]/[have very sound data to back impact estimates] that save as many current lives as possible.” If GiveWell wanted to recommend organizations that save as many human lives as possible, their portfolio would probably be entirely made up of AI safety orgs.
This paragraph, especially the first sentence, seems to be based on a misunderstanding I used to share, which Holden Karnofsky tried to correct back in 2011 (when he was still at GiveWell) with the blog post “Why we can’t take expected value estimates literally (even when they’re unbiased)”, in which he argued (emphasis his):
While some people feel that GiveWell puts too much emphasis on the measurable and quantifiable, there are others who go further than we do in quantification, and justify their giving (or other) decisions based on fully explicit expected-value formulas. The latter group tends to critique us – or at least disagree with us – based on our preference for strong evidence over high apparent “expected value,” and based on the heavy role of non-formalized intuition in our decisionmaking. …
We believe that people in this group are often making a fundamental mistake… [of] estimating the “expected value” of a donation (or other action) based solely on a fully explicit, quantified formula, many of whose inputs are guesses or very rough estimates.
We believe that any estimate along these lines needs to be adjusted using a “Bayesian prior”; that this adjustment can rarely be made (reasonably) using an explicit, formal calculation; and that most attempts to do the latter, even when they seem to be making very conservative downward adjustments to the expected value of an opportunity, are not making nearly large enough downward adjustments to be consistent with the proper Bayesian approach.
This view of ours illustrates why – while we seek to ground our recommendations in relevant facts, calculations and quantifications to the extent possible – every recommendation we make incorporates many different forms of evidence and involves a strong dose of intuition. And we generally prefer to give where we have strong evidence that donations can do a lot of good rather than where we have weak evidence that donations can do far more good – a preference that I believe is inconsistent with the approach of giving based on explicit expected-value formulas (at least those that (a) have significant room for error (b) do not incorporate Bayesian adjustments, which are very rare in these analyses and very difficult to do both formally and reasonably).
(He has since developed this view further in the 2014 post “Sequence thinking vs cluster thinking”.) Further down, Holden wrote:
My prior for charity is generally skeptical, as outlined at this post. Giving well seems conceptually quite difficult to me, and it’s been my experience over time that the more we dig on a cost-effectiveness estimate, the more unwarranted optimism we uncover.
This guiding philosophy hasn’t changed; in GiveWell’s How we work—criteria—cost-effectiveness they write:
Cost-effectiveness is the single most important input in our evaluation of a program’s impact. However, there are many limitations to cost-effectiveness estimates, and we do not assess programs solely based on their estimated cost-effectiveness. We build cost-effectiveness models primarily because:
- They help us compare programs or individual grant opportunities to others that we’ve funded or considered funding; and
- Working on them helps us ensure that we are thinking through as many of the relevant issues as possible.
which jibes with what Holden wrote about the relative advantages & disadvantages of sequence thinking vs cluster thinking in the 2014 post mentioned above.
Note that this is for global health & development (GHD) charities, where the feedback loops to sense-check and correct the cost-effectiveness analyses (CEAs) that guide resource allocation & decision-making are much clearer and tighter than for AI safety (AIS) orgs (and other longtermist work more generally). If it’s already this hard for GHD work, I get much more skeptical of CEAs in AIS with super-high expected values (EVs), just in model uncertainty terms.
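To make the model-uncertainty point a bit more concrete, here is a minimal sketch of the kind of Bayesian adjustment Holden describes, assuming a normal prior and a normally distributed estimation error (the standard conjugate update). The function name and every number below are made up for illustration; they are not GiveWell's or Holden's figures, and Holden's own argument is that this adjustment can rarely be done reasonably with an explicit formula, so treat this as showing the direction of the effect only.

```python
def shrink_estimate(prior_mean, prior_sd, estimate, estimate_sd):
    """Combine a normal prior with a noisy normal estimate.

    Returns the posterior mean and standard deviation. Each source is
    weighted by its precision (1 / variance), so a very noisy estimate
    barely moves the posterior away from the prior.
    """
    prior_precision = 1.0 / prior_sd ** 2
    estimate_precision = 1.0 / estimate_sd ** 2
    posterior_precision = prior_precision + estimate_precision
    posterior_mean = (
        prior_mean * prior_precision + estimate * estimate_precision
    ) / posterior_precision
    posterior_sd = posterior_precision ** -0.5
    return posterior_mean, posterior_sd


# Hypothetical skeptical prior: a typical charity does about 1 unit of
# good per dollar, give or take 1.
PRIOR_MEAN, PRIOR_SD = 1.0, 1.0

# Hypothetical well-evidenced estimate: 5 units, with error sd of 1.
strong_mean, _ = shrink_estimate(PRIOR_MEAN, PRIOR_SD, 5.0, 1.0)

# Hypothetical speculative estimate: 1000 units, but with error sd of 500.
weak_mean, _ = shrink_estimate(PRIOR_MEAN, PRIOR_SD, 1000.0, 500.0)

print(f"well-evidenced estimate of 5 shrinks to {strong_mean:.3f}")
print(f"speculative estimate of 1000 shrinks to {weak_mean:.3f}")
```

Under these made-up inputs, the speculative estimate ends up below the well-evidenced one after adjustment, even though its headline number is 200 times larger. That is the sense in which strong evidence of a lot of good can beat weak evidence of far more good.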
This isn’t meant to devalue AIS work! I think it’s critical and important, and I think some of the “p(doom) modeling” work is persuasive (MTAIR, Froolow, and Carlsmith come to mind). Just thought that “If GiveWell wanted to recommend organizations that save as many human lives as possible, their portfolio would probably be entirely made up of AI safety orgs” felt off given what they’re trying to do, and how they’re going about it.