[New candidate framing for existential risk reduction]
The default [edit: implicit] framing for reducing existential risk is something like this. “Currently, humans have control over what we want, but there’s a risk that we would lose this control. For instance, transformative AI that’s misaligned with what we’d want could prevent us from actualizing good futures.”
I don’t find this framing particularly compelling. It doesn’t feel to me like people are especially “in control of things.” There are areas/domains where our control is growing, but there are also areas/domains where it is waning (e.g., cost disease; dysfunctional institutions). (Or, instead of “control waning,” we can also think of misaligned forces taking away some of our control – for instance, filter bubbles and other polarizing forces reducing the sense that all people have a shared reality.)
The framing I find most compelling is the following:
“Humans aren’t particularly in control of things, but there are areas where technological progress has given us surprisingly advanced capabilities, and every now and then, some groups of people manage to use those capabilities really well. If we want to reduce existential risks, we’d need almost god-like degrees of control over the future and the wisdom/foresight to use that control to our advantage. AI risk, in particular, seems especially important from this perspective – for two reasons. (1) AI will likely be radically transformative. Since it’s generally much easier to design good systems from scratch than to tweak existing ones, transformative AI (precisely because of its potential to be transformative) is our best chance to get in control of things. (2) If we fail to align AI, we won’t be left in a position where we could attain control over things later.”
Fwiw, (1) is more naturally phrased as an opportunity associated with AI than as a risk (“AI opportunity” vs. “AI risk”). If so, you may want to use a term other than “existential risk reduction” for the concept you’ve identified.
A bit related to an opportunity+risk framing of AI: Artificial Intelligence as a Positive and Negative Factor in Global Risk.
“The default framing for reducing existential risk is something like this. “Currently, humans have control over what we want, but there’s a risk that we would lose this control.””
Can you perhaps point to some examples?
To me it seems that the default framing is often focused on extinction risks, with non-extinction existential risks mentioned as a sort of secondary case. Under that framing, the issue of control isn’t really raised; the focus is mostly on the distinction between survival and extinction.
Maybe you had specific writings (focusing on AI risk?) in mind though?
Good points. I should have written that the point about control is implicit. The default framing focuses on risks, as you say, not on making something happen that gives us more control than we currently have. I think there’s a natural reading of the existential risk framings that implicitly says something like “current levels of control might be adequate if it weren’t for destructive risks” or perhaps “there’s a trend where control increases by default and things might go well unless some risk comes about.” To be clear, that’s by no means a necessary implication of any text on existential risks. It’s just something that is under-discussed, and the lack of discussion suggests that some people might think that way.
The second part of my comment here is relevant to this thread’s theme – it explains my position a bit better.
In discussions on the difficulty of aligning transformative AI, I’ve seen reference class arguments like “When engineers build and deploy things, the result rarely turns out to be destructive.”
I’ve always felt like this is pointing at the wrong reference class.
My comment above on framings explains why. I think the reference class for AI alignment difficulty should be more like: “When have the people who deployed transformative technologies correctly foreseen long-term bad societal consequences and taken the right costly steps to mitigate them?”
(Examples could be: keeping a new technology secret; or Facebook, in an alternate history, setting up a governance structure in which “our algorithm affects society poorly” would reliably receive a lot of sincere attention, even at management levels, throughout the company’s existence.)
Admittedly, I’m kind of lumping together the alignment and coordination problems. Someone could have the view that “AI alignment,” with a narrow definition of what counts as “aligned,” is comparatively easy, but coordination could still be hard.