“EA has a strong cultural bias in favor of believing arbitrary problems are solvable”.
I think you’re pointing to a real phenomenon here (though I might not call it an “optimism bias”—EAs also tend to be unusually pessimistic about some things).
I have pretty strong disagreements with a lot of the more concrete points in the post though, I’ve tried to focus on the most important ones below.
Conclusion One: Pursuing the basic plan entailed in premises 1-4 saves, in expectation, at least 4.8 million lives (8,000,000,000 * 0.06 * 0.1 * 0.1).
(I think you may have missed the factor of 0.01, the relative risk reduction you postulated? I get 8 billion * 0.06 * 0.01 * 0.1 * 0.1 = 48,000. So AI safety would look worse by a factor of 100 compared to your numbers.)
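For concreteness, here is the arithmetic both ways, as a minimal sketch using the 8 billion figure and the post's stated factors (the variable names are just mine):

```python
population = 8_000_000_000       # roughly everyone alive today
p_catastrophe = 0.06             # post's probability of an AI existential catastrophe
relative_reduction = 0.01        # post's 1% relative risk reduction from a solution
p_findable = 0.1                 # post's chance such a solution is findable
p_executed = 0.1                 # post's chance a found solution actually gets executed

# Without the 0.01 factor (which seems to be where the 4.8 million comes from):
print(population * p_catastrophe * p_findable * p_executed)  # ≈ 4,800,000

# With the 0.01 factor included:
print(population * p_catastrophe * relative_reduction * p_findable * p_executed)  # ≈ 48,000
```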
But anyway, I strongly disagree with those numbers, and I’m pretty confused as to what kind of model generates them. Specifically, you seem to be extremely confident that we can’t solve AI X-risk (<1/10,000 chance if we multiply the 1% relative reduction by your two 10% chances). On the other hand, you think we’ll most likely be fine by default (94%). So you seem to be saying that there probably isn’t any problem in the first place, but if there is, then we should be extremely certain that it’s basically intractable. This seems weird to me. Why are you so sure that there isn’t a problem which would lead to catastrophe by default, but which could be solved by e.g. 1,000 AI safety researchers working for 10 years? To get to your level of certainty (<1/10,000 is a lot!), you’d need a very detailed model of AI X-risk IMO, more detailed than I think anyone has written about. A lot of the uncertainty people tend to have about AI X-risk comes precisely from not knowing what the main sources of risk even are, so it’s unclear how you’d exclude the possibility that there are significant sources of risk that are reasonably easy to address.
As to why I’m not convinced by the argument that leads you to the <1/10,000 chance: the methodology of “split my claim into a conjunction of subclaims, then assign reasonable-sounding probabilities to each, then multiply” often just doesn’t work well (there are exceptions, but this certainly isn’t one of them IMO). You can get basically arbitrary results by splitting up the claim in different ways, since humans aren’t very consistent about which probabilities sound “reasonable”.
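As a toy illustration of how sensitive this kind of estimate is to the chosen decomposition (the numbers below are made up purely for the example, not taken from the post):

```python
# Toy numbers: the same claim sliced into more and more conjuncts, with each
# conjunct assigned a "reasonable-sounding" 30% probability. The product can be
# driven more or less arbitrarily low just by slicing finer.
for n_conjuncts in (2, 4, 6, 8):
    print(n_conjuncts, f"{0.3 ** n_conjuncts:.6f}")
# 2 0.090000, 4 0.008100, 6 0.000729, 8 0.000066
```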
Okay, a longtermist might say. Maybe the odds are really slim that we thread this needle, and then also the subsequent needles required to create an interstellar civilization spanning billions of years. But the value of that scenario is so high that if you shut up and multiply, it’s worth putting a lot of resources in that direction.
I can’t speak for all longtermists of course, but that is decidedly not an argument I want to make (and FWIW, my impression is that this is not the key objection most longtermists would raise). If you convinced me that our chances of preventing an AI existential catastrophe were <1/10,000, and that additionally we’d very likely die in a few centuries anyway (not sure just how likely you think that is?), then I would probably throw the expected value calculations out the window and start from scratch trying to figure out what’s important. Basically for exactly the reasons you mention: at some point this starts feeling like a Pascal’s mugging, and that seems fishy and confusing.
But I think the actual chances we prevent an AI existential catastrophe are way higher than 1⁄10,000 (more like 1⁄10 in terms of the order of magnitude). And I think conditioned on that, our chances of surviving for billions of years are pretty decent (very spontaneous take: >=50%). Those feel like cruxes to me way more than whether we should blindly do expected value calculations with tiny probabilities, because my probabilities aren’t tiny.
Scenario Two: Same as scenario one, but there’s a black hole/alien invasion/unstoppable asteroid/solar flare/some other astronomical event we don’t know about yet that unavoidably destroys the planet in the next millennium or two. (I don’t think this scenario is likely, but it is possible.)
I agree it’s possible in a very weak sense, but I think we can say something stronger about just how unlikely this is (over the next millennium or two): Nothing like this has happened over the past 65 million years (where I’m counting the asteroid back then as “unstoppable”, even though I think we could stop something like it soon after AGI). So unless you think that alien invasions are reasonably likely to happen soon (but weren’t likely before we sent out radio waves, for example), this scenario seems to be firmly in the “not really worth thinking about” category.
This may seem really nitpicky, but I think it’s important when we talk about how likely it is that we’ll continue living for billions of years. You give several scenarios for how things could go badly, but it would be just as easy to list scenarios for how things could go well. Listing very unlikely scenarios, especially just on one side, skews our impression of the overall probabilities.
Ah yeah, you’re right—I think I basically plugged in the percentage rather than the probability. So it would indeed be very expensive to be competitive with AMF. Though so is everything else, so that’s not hugely surprising.
As for the numbers, yeah, it does just strike me as really, really unlikely that we can solve AI x-risk right now. 1/10,000 does feel about right to me. I certainly wouldn’t expect everyone else to agree though! I think some people would put the odds much higher, and others (like Tyler Cowen maybe?) would put them a bit lower. The 1% step is probably the one I’m least confident in—it wouldn’t surprise me if the (hard to find, hard to execute) solutions that are findable reduce risk significantly more.
EDIT: tried to fix the math and switched the “relative risk reduction term” to 10%. I feel like among findable, executable interventions there’s probably a lot of variance, and it’s plausible some of the best ones do reduce risk by 10% or so. And 1/1000 feels about as plausible as 1/10000 to me. So, somewhere in there.
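For reference, plugging the 10% term into the same calculation (just my arithmetic on the stated factors):

```python
population = 8_000_000_000
p_catastrophe = 0.06
relative_reduction = 0.10   # edited up from 0.01
p_findable = 0.1
p_executed = 0.1

print(population * p_catastrophe * relative_reduction * p_findable * p_executed)  # ≈ 480,000 lives in expectation
# The implied "chance we solve it" term becomes 0.10 * 0.1 * 0.1 = 1/1,000
# rather than 0.01 * 0.1 * 0.1 = 1/10,000.
```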
it does just strike me as really, really unlikely that we can solve AI x-risk right now
I think Erik wasn’t commenting so much on this number as on its combination with the assumption that there is a 94% chance things are fine by default.
I.e. you are assuming that there is a 94% chance it’s trivially easy, and 6% chance it’s insanely hard.
Very few problems have such a bimodal difficulty profile, and I’d also be interested to understand what’s generating it for you.