I think this is a plausible consequence, but not a clear one.
Many people put significant value on conservation. It is plausible that some version of this would survive in an AI which was somewhat misaligned (especially since conservation might be a reasonably simple goal to point towards), such that it would spend some fraction of its resources towards preserving nature—and one planet is a tiny fraction of the resources it could expect to end up with.
The most straightforward argument against this is that such an AI maybe wouldn’t wipe out all humans. I tend to agree, and a good amount of my probability mass on “existential catastrophe from misaligned AI” does not involve human extinction. But I think there’s some possible middle ground where an AI was not capable of reliably seizing power without driving humans extinct, but was capable if it allowed itself to do so; in that case it could wipe humans out without eliminating nature (which would presumably pose much less threat to its ascendancy).
Whether AI would wipe out humans entirely is a separate question (and one which has been debated extensively, to the point where I don’t think I have much to add to that conversation, even if I have opinions).
What I’m arguing for here is narrower: would an AI which wipes out humans leave nature intact? I think the answer to that is pretty clearly no by default.
Yeah, I understood this. This is why I’ve focused on a particular case for it valuing nature which I think could be compatible with wiping out humans (not going into the other cases that Ryan discusses, which I think would be more likely to involve keeping humans around). I needed to bring in the point about humans surviving to address the counterargument “oh but in that case probably humans would survive too” (which I think is probable but not certain). Anyway maybe I was slightly overstating the point? Like I agree that in this scenario the most likely outcome is that nature doesn’t meaningfully survive. But it sounded like you were arguing that it was obvious that nature wouldn’t survive, which doesn’t sound right to me.
I don’t claim it’s impossible that nature survives an AI apocalypse which kills off humanity, but I do think it’s an extremely thin sliver of the outcome space (<0.1%). What odds would you assign to this?
Ok, I guess around 1%? But this is partially driven by model uncertainty; I don’t actually feel confident your number is too small.
I’m much higher (tens of percentage points) on “chance nature survives conditional on most humans being wiped out”; it’s just that most of these scenarios involve some small number of humans being kept around, so it’s not literal extinction. (And I think these scenarios are a good part of what people intuitively imagine and worry about when you talk about human extinction from AI, even though the label isn’t literally applicable.)
Thanks for asking explicitly about the odds, I might not have noticed this distinction otherwise.
I thought about where the logic in the post seemed to be going wrong, and it led me to write this quick take on why most possible goals of AI systems are partially concerned with process and not just outcomes.