Presumably there’s some probability X of averting doom that you would consider more important than 25 statistical lives. I’d also guess that you’d agree this is true for some rather low but non-Pascalian probabilities. E.g., I predict that if you thought about the problem even briefly, you’d agree the above claim is true for X = 0.001%, not just, say, 30%.
(To be clear, I’m definitely not saying that the grant’s effect size is >0.001% in expectation.)
So then the real disagreement is either a) what X ought to be (where I presume you have a higher number than LTFF), or b) whether the game’s probability of averting doom is above X.[1]
Stated more clearly, I think your disagreement with the grant is “merely” a practical disagreement about effect sizes, whereas your language here, if taken literally, is not actually sensitive to the effect size.
(My own guess is that the grant was not above the 2022 LTFF bar, but that’s an entirely different line of reasoning.) And of course, implicitly, I believe the 2022 LTFF bar was above the 2022 GiveWell bar by my lights.
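To make the threshold arithmetic in the comment above concrete, here is a minimal sketch in Python. The valuation of the long-term future and the probabilities below are purely illustrative assumptions for the sake of the example, not figures from either commenter:

```python
# Illustrative only: compare a small probability X of averting doom against a
# fixed benefit of 25 statistical lives. The future's value in statistical
# lives is an assumed placeholder, not a figure from the discussion.

FUTURE_VALUE_IN_LIVES = 1e9   # assumed valuation of the long-term future, in statistical lives
BENEFIT_OF_ALTERNATIVE = 25   # the 25 statistical lives used as the benchmark above

def expected_lives_saved(x: float) -> float:
    """Expected statistical lives saved by a probability x of averting doom."""
    return x * FUTURE_VALUE_IN_LIVES

for x in (0.30, 1e-5):  # 30% and 0.001%
    ev = expected_lives_saved(x)
    verdict = "exceeds" if ev > BENEFIT_OF_ALTERNATIVE else "is below"
    print(f"X = {x:.5%}: expected lives = {ev:,.0f} ({verdict} 25 statistical lives)")

# With these assumed numbers, even X = 0.001% corresponds to ~10,000 expected
# lives, which is why the disagreement reduces to what X ought to be, or
# whether the grant clears it.
```

The only point of the sketch is that, for any sizeable valuation of the future, the X at which the comparison flips is very small, so the substantive question becomes the effect size rather than the form of the argument.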
A butterfly flaps its wings and causes a devastating hurricane to form in the tropics. Therefore, we must exterminate butterflies, because there is some small probability X that doing so will avert hurricane disaster.
But it could just as easily be the case that the butterfly’s flaps prevent devastating hurricanes from forming. Therefore, we must massively grow their population.
The point being, it can be practically impossible to understand the causal tree and get even the sign right around low-probability events.
That’s what I take issue with: it’s not just the numbers, it’s the structural uncertainty of cause-and-effect chains when you consider really low-probability events. Expected value is a pretty bad tool for action-relevant decision making when you are dealing with that kind of numerical and structural uncertainty. It’s perhaps better to pick a framework like “is it robust under multiple decision theories” or “pick the option with the least downside risk”.
In our instance, two competing plausible structural theories among many are something like:
“game teaches someone an AI safety concept → makes them more knowledgeable or inspires them to take action → they work on AI safety → solve alignment problem → future saved”
vs.
“people get interested in doing the most good → they see a community of people that claim to do that, but that fund rich people to make video games → causes widespread distrust of the movement → a strong social stigma develops against people who care about AI risk → greatly narrowed range of people / worldviews because people don’t want to associate → makes it near impossible to solve alignment problem → future destroyed”
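A minimal sketch of why this matters for the arithmetic, with made-up step probabilities and payoffs (every number below is an assumption invented for the example, not anyone’s actual estimate): multiplying through the rosy chain alone gives a large positive expected value, putting any comparable weight on the opposite-sign chain can erase or flip it, and a least-downside rule looks at the two stories very differently.

```python
# Illustrative only: two structural stories about the same grant, with made-up
# step probabilities and a made-up payoff scale (arbitrary "value" units).

def chain_probability(steps):
    """Multiply step probabilities through a hypothesised causal chain."""
    p = 1.0
    for step in steps:
        p *= step
    return p

# Rosy chain: game -> knowledge/inspiration -> works on safety -> alignment solved
p_good = chain_probability([0.10, 0.05, 0.01, 0.001])   # assumed numbers
# Pessimistic chain: grant -> distrust -> stigma -> alignment effort crippled
p_bad = chain_probability([0.10, 0.05, 0.01, 0.001])    # equally assumed numbers

VALUE_IF_GOOD = 1e9    # arbitrary units: "future saved"
VALUE_IF_BAD = -1e9    # arbitrary units: "future destroyed"

ev_good_story = p_good * VALUE_IF_GOOD
ev_bad_story = p_bad * VALUE_IF_BAD

# Naive EV using only the rosy chain looks clearly positive...
print("EV, rosy chain only:              ", ev_good_story)
# ...but with both structural stories weighted, the sign depends entirely on
# small, hard-to-know differences between the two chains' numbers.
print("EV, both stories weighted equally:", 0.5 * ev_good_story + 0.5 * ev_bad_story)
# A "least downside risk" rule instead compares worst cases across the stories:
print("Worst-case EV if funded:          ", min(ev_good_story, ev_bad_story))
print("Worst-case EV if not funded:      ", 0.0)
```

With symmetric made-up numbers the expected value nets to zero and the decision hinges entirely on which structural story you trust, which is the fragility being pointed at here.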
The justifications for these grants tend to use a simple expected value calculation over a single rosy hypothetical causal chain. The problem is that it’s possible to construct a hypothetical value chain to justify any sort of grant, so you have to do more than just make a rosy causal chain and multiply numbers through. I’ve commented before on some pretty bad ones that don’t pass the laugh test among domain experts in the climate and air quality space.
The key lesson from early EA (evidence-based giving in global health) was that it is really hard to tell whether the thing you are doing is having an impact, and what the valence of that impact is, even for short, measurable causal chains. EA’s popular causes now (longtermism) seem to jettison that lesson, even though it is even more unclear what the impact and its sign are through complicated, low-probability causal chains.
So it’s about a lot more than effect sizes.

Worth noting that even GiveWell doesn’t rely on a single EV calculation either (however complex). Quoting Holden’s 10-year-old writeup Sequence thinking vs. cluster thinking:
Our approach to making such comparisons strikes some as highly counterintuitive, and noticeably different from that of other “prioritization” projects such as Copenhagen Consensus. Rather than focusing on a single metric that all “good accomplished” can be converted into (an approach that has obvious advantages when one’s goal is to maximize), we tend to rate options based on a variety of criteria using something somewhat closer to (while distinct from) a “1=poor, 5=excellent” scale, and prioritize options that score well on multiple criteria.
We often take approaches that effectively limit the weight carried by any one criterion, even though, in theory, strong enough performance on an important enough dimension ought to be able to offset any amount of weakness on other dimensions.
… I think the cost-effectiveness analysis we’ve done of top charities has probably added more value in terms of “causing us to reflect on our views, clarify our views and debate our views, thereby highlighting new key questions” than in terms of “marking some top charities as more cost-effective than others.”
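As a purely illustrative sketch of the kind of multi-criteria, weight-limited scoring the quote describes (the criteria, the 1–5 scores, and the cap below are all assumptions invented for the example, not GiveWell’s actual process):

```python
# Illustrative only: rate options on several criteria (1 = poor, 5 = excellent)
# and limit how much any single criterion can contribute, rather than
# converting everything into one "good accomplished" number and maximizing it.

from typing import Dict

# Assumed criteria and scores, invented for the example.
options: Dict[str, Dict[str, int]] = {
    "Charity A": {"evidence of impact": 5, "cost-effectiveness": 4,
                  "room for funding": 3, "transparency": 4},
    "Charity B": {"evidence of impact": 2, "cost-effectiveness": 5,
                  "room for funding": 5, "transparency": 2},
}

MAX_CONTRIBUTION = 4  # cap: no single criterion contributes more than this

def cluster_score(scores: Dict[str, int]) -> int:
    """Sum criterion scores, capping each one's contribution.

    The cap means an extreme score on one dimension cannot offset
    weakness everywhere else, echoing the quoted approach."""
    return sum(min(s, MAX_CONTRIBUTION) for s in scores.values())

for name, scores in options.items():
    print(name, "->", cluster_score(scores))
# Charity B's standout cost-effectiveness and room for funding are capped, so
# Charity A, which does reasonably well on every criterion, comes out ahead.
```

The cap is what makes this “cluster thinking” rather than a single converted metric; in practice the criteria, scale, and cap would all need their own justification.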