I sometimes see it claimed that AI safety doesn’t require longtermism to be cost-effective (roughly: the work is cost-effective considering only lives affected this century). However, I can’t see how this is true. Where is the analysis that supports this, preferably relative to GiveWell?
Suppose you have $500 million to donate. You can either (a) spend this on top GiveWell charities, or (b) approximately double all of Open Phil’s investments to date in AI safety groups.
To see the break-even point, just set (a) = (b).
At roughly $5,000 per death averted, (a) prevents about 100,000 premature deaths of children.
There are 8 billion people alive today. (This might change in the future, but not by a lot.) To a first approximation, for (b) to be more cost-effective than (a) without longtermism, you need to claim that doubling all of Open Phil’s investments in AI safety can reduce x-risk by more than 100,000 / 8 billion = 0.0000125, i.e., 0.00125%, or 0.125 basis points.
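As a minimal sketch, here is that break-even arithmetic in Python, using only the rough placeholder numbers above (not precise estimates):

```python
# Back-of-envelope break-even, using the rough numbers above (illustrative only).
donation = 500e6                # ~$500M, roughly doubling Open Phil's AI safety spend to date
cost_per_death_averted = 5_000  # rough GiveWell top-charity figure
world_population = 8e9          # people alive today

deaths_averted_givewell = donation / cost_per_death_averted            # ~100,000
breakeven_risk_reduction = deaths_averted_givewell / world_population  # ~1.25e-5

print(f"GiveWell deaths averted: {deaths_averted_givewell:,.0f}")
print(f"Break-even x-risk reduction: {breakeven_risk_reduction:.5%} "
      f"(~{breakeven_risk_reduction * 1e4:.3f} basis points)")
```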
There are a bunch of nuances here, but these are roughly the relevant numbers. Say you take the following beliefs:
P(AGI in our lifetime) = 80%
P(existential catastrophe | AGI) = 5%
P(human extinction | existential catastrophe) = 10%
Proportion of alignment that Open Phil has solved = 1%
Then you get that AI safety is roughly 3x GiveWell on lives saved.
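Here is the same product written out as a minimal sketch in code, using exactly the four beliefs above plus the earlier GiveWell placeholders:

```python
# Expected present-lives comparison under the four beliefs listed above (illustrative only).
p_agi = 0.80                      # P(AGI in our lifetime)
p_xcat_given_agi = 0.05           # P(existential catastrophe | AGI)
p_extinction_given_xcat = 0.10    # P(human extinction | existential catastrophe)
share_solved_by_open_phil = 0.01  # proportion of alignment attributed to Open Phil's spend to date

world_population = 8e9
givewell_lives = 500e6 / 5_000    # ~100,000, from the break-even calculation above

extinction_risk_averted = (p_agi * p_xcat_given_agi
                           * p_extinction_given_xcat * share_solved_by_open_phil)  # 4e-5
ai_safety_lives = extinction_risk_averted * world_population                       # ~320,000

print(f"Extinction risk averted: {extinction_risk_averted:.1e}")
print(f"AI safety {ai_safety_lives:,.0f} vs GiveWell {givewell_lives:,.0f} expected lives "
      f"(~{ai_safety_lives / givewell_lives:.1f}x)")
```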
So we’re in the same order of magnitude, and all these numbers are very rough, so I can see it going either way. Basically, the case is not clear-cut either way. Thanks!
Yep, this is roughly the right process! 80% * 5% * 10% * 1% ≈ 4 × 10^-5, so yeah, that sounds right. About a 3x difference.
I agree that at those numbers the case is not clear either way (a slight change in the numbers can flip the conclusion; also, not all uncertainties are created alike: a 3x edge in a highly speculative calculation might not be enough to swing you to prefer it over the much more validated and careful estimates from GiveWell).
Some numbers I disagree with:
P(existential catastrophe | AGI) = 5%. This number feels somewhat low to me, though I think it’s close to the median numbers that AI experts (not AI safety experts) put out.
P(human extinction | existential catastrophe) = 10%. This also feels low to me. Incidentally, if your probability of (extinction | existential catastrophe) is relatively low, you should also have a rough estimate of the expected number of lives saved in non-extinction existential catastrophe scenarios, because those might be significant (see the sketch below for one way to fold that in).
Your other 2 numbers seem reasonable at first glance. One caveat is that you might expect the next $X of spending by Open Phil on alignment to be less effective than the first $X.
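One hypothetical way to fold non-extinction existential catastrophes into the earlier product is to add a term for the share of present lives lost in those scenarios. The 25% figure below (`deaths_share_non_extinction`) is an invented placeholder for illustration, not a number from this thread:

```python
# Extending the earlier product with non-extinction existential catastrophes (illustrative only).
p_agi = 0.80
p_xcat_given_agi = 0.05
p_extinction_given_xcat = 0.10
share_solved_by_open_phil = 0.01
world_population = 8e9

deaths_share_non_extinction = 0.25  # hypothetical: a non-extinction catastrophe still kills 25% of present people

p_xcat_averted = p_agi * p_xcat_given_agi * share_solved_by_open_phil  # chance of averting a catastrophe at all
expected_lives = p_xcat_averted * world_population * (
    p_extinction_given_xcat * 1.0                                      # extinction: everyone alive today
    + (1 - p_extinction_given_xcat) * deaths_share_non_extinction      # other existential catastrophes
)
print(f"Expected present lives saved: {expected_lives:,.0f}")  # ~1,040,000 under these placeholders
```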
Agree the case is not very clear-cut. I remember doing some other quick modeling before and coming to a similar conclusion: under some pretty fuzzy empirical assumptions, x-safety interventions are very slightly better than global health charities for present people, assuming zero risk/ambiguity aversion, but the case is pretty unclear overall.
Just to be clear, under most ethical systems this is a lower bound.
Humanity going extinct is a lot worse than 8 billion people dying, unless you don’t care at all about future lives (and you don’t care about the long-term goals of present humans, most of whom have at least some goals that extend beyond their death).
Hmm, agreed, with some caveats. E.g., for many people’s ethics, saving infants/newborns is unusually important, whereas preventing extinction saves an unweighted average of the population. So that will marginally tip the balance in favor of the global health charities.
On the other hand, you might expect increasing donations by 1% (say) to have higher marginal EV than 1% of the EV of doubling donations.
I think people make this point because they think something like AGI is likely to arrive within this century, possibly within a decade.
There are several analyses of AI timelines (time until something like AGI); this literature review from Epoch is a good place to start.