I think reducing the risk of misaligned AI takeover looks like a pretty good use of people on the margin. My guess is that misaligned AI takeover typically doesn’t result in extinction in the normal definition of the term (killing basically all humans within 100 years). (Maybe I think the chance of extinction-defined-normally given AI takeover is 1⁄3.)
Thus, for me, the bottom line of the debate statement comes down to whether misaligned AI takeover which doesn’t result in extinction-defined-normally actually counts as extinction in the definition used in the post.
I don’t feel like I understand how the definition you give of “a future with 0 value” handles cases like:
“Misaligned AIs take over and have preferences that on their own have ~0 value from our perspective. However, these AIs keep most humans alive out of a small amount of kindness and due to acausal trade. Additionally, lots of stuff happens in our lightcone which is good due to acausal trade (but this was paid for by some entity that shared our preferences). Despite this, misaligned AI takeover is actually somewhat worse (from a pure longtermist perspective) than life on earth being wiped out prior to this point, because aliens were about 50% likely to be able to colonize most of our lightcone (or misaligned AIs they create would do this colonization) and they share our preferences substantially more than the AIs do.”
More generally, my current overall guess at a preference ordering is something like (best to worst):

1. Control by a relatively enlightened human society that shares my moral perspectives (and has relatively distributed power)
2. Human control where power is roughly as democratic as now
3. Human dictator
4. Humans are driven extinct but primates aren’t (so probably other primates develop an intelligent civilization in like 10-100 million years)
5. Earth is wiped out totally (no AIs and no chance for intelligent civilization to re-evolve)
6. Misaligned AI takeover
7. Earth is wiped out and there aren’t aliens, so nothing ever happens with resources in our lightcone
8. Various s-risk scenarios
What line here counts as “extinction”? Does moving from misaligned AI takeover to “human control where power is roughly as democratic as now” count as an anti-extinction scenario?
I’m not sure whether to count AI takeover as extinction or just as a worse future—maybe I should define extinction as actually just literal extinction, and leave scenarios with very small populations out of the definition. Any thoughts on the best way to define it here? I agree it needs some refining.
How about ‘On the margin, work on reducing the chance of our extinction is the work that most increases the value of the future’?
As I see it, the main issue with the framing in this post is that the work to reduce the chances of extinction might be the exact same work as the work to increase EV conditional on survival. In particular, preventing AI takeover might be the most valuable work for both, in which case the question would be asking us to compare the overall marginal value of those takeover-prevention actions with the overall marginal value of those same actions.
(At first glance it’s an interesting coincidence for the same actions to help the most with both, but on reflection it’s not that unusual for these to align. Being in a serious car crash is really bad, both because you might die and because it could make your life much worse if you survive. Similarly with serious illness. Or, for nations/cities/tribes throughout history, losing a war where you’re conquered could lead to the conquerors killing you or doing other bad things to you. Avoiding something bad that might be fatal can be very valuable both for avoiding death and for the value conditional on survival.)
That’s a really interesting solution—I’m a bit swamped today but I’ll seriously consider this tomorrow—it might be a nice way to clarify things without changing the meaning of the statement for people who have already written posts. Cheers!
I think I’ll stick with this current statement—partly because it’s now been announced for a while so people may be relying on its specific implications for their essays, but also because this new formulation (to me) doesn’t seem to avoid the problem you raise, that it isn’t clear what your vote would be if you think the same type of work is recommended for both. Perhaps the solution to that issue is in footnote 3 on the current banner—if you think that the value from working on preventing AI takeover is mostly from avoiding extinction, then you should vote agree. If you think it is from increasing the value of the future by another means (such as more democratic control of the future by humans), then you should vote disagree.