I like the term AGI x-safety, to get across the fact that you are talking about safety from existential (x) catastrophe, and about sophisticated AI. “AI Safety” can be conflated with more mundane risks from AI (e.g., isolated incidents with robots, self-driving car crashes, etc.). And “AI Alignment” is only part of the problem. Governance is also required to implement aligned AI and prevent unaligned AI.
A lot of people misunderstand “existential risk” as meaning something like “extinction risk”, rather than as meaning ‘anything that would make the future go way worse than it optimally could have’. Tacking on “safety” might contribute to that impression; we’re still making it sound like the goal is just to prevent bad things (be it at the level of an individual AGI project, or at the level of the world), leaving out the “cause good things” part. That said, “existential safety” seems better than “safety” to me.
(Nate’s thoughts here.)
And “AI Alignment” is only part of the problem. Governance is also required to implement aligned AI and prevent unaligned AI.
I don’t know what you mean by “governance”. The EA Forum wiki currently defines it as:
AI governance (or the governance of artificial intelligence) is the study of norms, policies, and institutions that can help humanity navigate the transition to a world with advanced artificial intelligence. This includes a broad range of subjects, from global coordination around regulating AI development to providing incentives for corporations to be more cautious in their AI research.
… which makes it sound like governance ignores plans like “just build a really good company and save the world”. If I had to guess, I’d guess that the world is likeliest to be saved because an adequate organization existed, with excellent internal norms, policies, talent, and insight. Shaping external incentives, regulations, etc. can help on the margin, but it’s a sufficiently blunt instrument that it can’t carve orgs into the exact right shape required for the problem structure.
It’s possible that the adequate organization is a government, but this seems less likely to me given the small absolute number of exceptionally competent governments in history, and given that govs seem to play little role in ML progress today.
Open Phil’s definition is a bit different:
By AI governance we mean local and global norms, policies, laws, processes, politics, and institutions (not just governments) that will affect social outcomes from the development and deployment of AI systems.
Open Phil goes out of its way to say “not just governments”, but its list (“norms, policies, laws, processes, politics, and institutions”) still makes it sound like the problem is shaped more like ‘design a nuclear nonproliferation treaty’ and less like ‘figure out how to build an adequate organization’, ‘cause there to exist such an organization’, or the various activities involved in actually running such an organization and steering it to an existential success.
Both sorts of activities seem useful to me, but dividing the problem into “alignment” and “governance” seems weird to me on the above framings—like we’re going out of our way to cut skew to reality.
On my model, the proliferation of AGI tech destroys the world, as a very strong default. We need some way to prevent this proliferation, even though AGI is easily-copied software. The strategies seem to be:
1. Using current tech, limit the proliferation of AGI indefinitely. (E.g., by creating a stable, long-term global ban on GPUs outside of some centralized AGI collaboration, paired with pervasive global monitoring and enforcement.)
2. Use early AGI tech to limit AGI proliferation.
3. Develop other, non-AGI tech (e.g., whole-brain emulation and/or nanotech) and use it to limit AGI proliferation.
1 sounds the most like “AGI governance” to my ear, and seems impossible to me, though there might be more modest ways to improve coordination and slow progress (thus, e.g., buying a little more time for researchers to figure out how to do 2 or 3). 2 and 3 both seem promising to me, and seem more like tech that could enable a long (or short) reflection, since e.g. they could also help ensure that humanity never blows itself up with other technologies, such as bio-weapons.
Within 2, it seems to me that there are three direct inputs to things going well:
Target selection: You’ve chosen a specific set of tasks for the AGI that will somehow (paired with a specific set of human actions) result in AGI nonproliferation.
Capabilities: The AGI is powerful enough to succeed in the target task. (E.g., if the best way to save the world is by building fast-running WBE, you have AGI capable enough to do that.)
Alignment: You are able to reliably direct the AGI’s cognition at that specific target, without any catastrophic side-effects.
There’s then a larger pool of enabling work that helps with one or more of those inputs: figuring out what sorts of organizations to build; building and running those organizations; recruiting, networking, propagating information; prioritizing and allocating resources; understanding key features of the world at large, like tech forecasting, social dynamics, and the current makeup of the field; etc.
“In addition to alignment, you also need to figure out target selection, capabilities, and (list of enabling activities)” seems clear to me. And also, you might be able to side-step alignment if 3 (or 1) is viable. “Moreover, you need a way to hand back the steering wheel and hand things off to a reasonable decision-making process” seems clear to me as well. “In addition to alignment, you also need governance” is a more opaque-to-me statement, so I’d want to hear more concrete details about what that means before saying “yeah, of course you need governance too”.
This (option 1) is along the lines of what I’m thinking when I say AGI Governance. The scenario outlined by the winner of FLI’s World Building Contest is an optimistic vision of this.
2. Use early AGI tech to limit AGI proliferation.
This sounds like something to be done unilaterally, as per the ‘pivotal act’ that MIRI folk talk about. To me it seems like such a thing is pretty much as impossible as safely fully aligning an AGI, so working towards doing it unilaterally seems pretty dangerous, not least for its potential role in exacerbating race dynamics. Maybe the world will be ended by a hubristic team who are convinced that their AGI is safe enough to perform such a pivotal act, and that they need to run it because another team is very close to unleashing their potentially world-ending unaligned AGI. Or by another team seeing all the GPUs starting to melt and pressing go on their (still-not-fully-aligned) AGI… It’s like MAD, but for well-intentioned would-be world-savers.
I think your view of AGI governance is idiosyncratic because you are thinking in such unilateralist terms. Maybe a unilateral move could lead to (the world) winning, but even though effective, broad, global-scale governance of AGI might seem insurmountable, I think it’s a better shot. See also aogara’s comment and its links.
Perhaps ASI x-safety would be even better though (the SI being SuperIntelligent), if people are thinking “we win if we can build a useless-but-safe AGI”.
I’d guess not. From my perspective, humanity’s bottleneck is almost entirely that we’re clueless about alignment. If a meme adds muddle and misunderstanding, then it will be harder to get a critical mass of researchers who are extremely reasonable about alignment, and therefore harder to solve the problem.
It’s hard for muddle and misinformation to spread in exactly the right way to offset those costs; and attempting to strategically sow misinformation will tend to erode our ability to think well and to trust each other.
I’m not sure I get your point here. Surely the terms “AI Safety” and “AI Alignment” are already causing muddle and misunderstanding? I’m saying we should be more specific in our naming of the problem.
“ASI x-safety” might be a better term for other reasons (though Nate objects to it here), but by default, I don’t think we should be influenced in our terminology decisions by ‘term T will cause some alignment researchers to have falser beliefs and pursue dumb-but-harmless strategies, and maybe this will be good’. (Or rather, by default that consideration should count against adopting a term, not in favor of it.)
Whether current terms cause muddle and misunderstanding doesn’t change my view on this. In that case, IMO we should consider changing to a new term in order to reduce muddle and misunderstanding. We shouldn’t strategically confuse and mislead people in a new direction, just because we accidentally confused or misled people in the past.
What are some better options? Or, what are your current favourites?
“AGI existential safety” seems like the most popular relatively-unambiguous term for “making the AGI transition go well”, so I’m fine with using it until we find a better term.
I think “AI alignment” is a good term for the technical side of differentially producing good outcomes from AI, though it’s an imperfect term insofar as it collides with Stuart Russell’s “value alignment” and Paul Christiano’s “intent alignment”. (The latter, at least, better subsumes a lot of the core challenges in making AI go well.)
Perhaps using “doom” more could work (doom encompasses extinction, permanent curtailment of future potential, and fates worse than extinction).