[sorry I’m late to this thread]
@William_MacAskill, I’m curious which (if any) of the following is your position?
1. “I agree with Wei that an approach of ‘point AI towards these problems’ and ‘listen to the AI-results that are being produced’ has a real (>10%? >50%?) chance of ending in moral catastrophe (because ‘aligned’ AIs will end up (unintentionally) corrupting human values or otherwise leading us into incorrect conclusions).
And if we were living in a sane world, then we’d pause AI development for decades, alongside probably engaging in human intelligence enhancement, in order to solve the deep metaethical and metaphilosophical problems at play here. However, our world isn’t sane, and an AI pause isn’t in the cards: the best we can do is to differentially advance AIs’ philosophical competence,[1] and hope that that’s enough to avoid said catastrophe.”
2. “I don’t buy the argument that aligned AIs can unintentionally corrupt human values. Furthermore, I’m decently confident that my preferred metaethical theory (e.g., idealising subjectivism) is correct. If intent alignment goes well, then I expect a fairly simple procedure like ‘give everyone a slice of the light cone, within which they can do anything they want (modulo some obvious caveats), and facilitate moral trade’ will result in a near-best future.”
3. “Maybe aligned AIs can unintentionally corrupt human values, but I don’t particularly think this matters since it won’t be average humans making the important decisions. My proposal is that we fully hand off questions re. what to do with the light cone to AIs (rather than have these AIs boost/amplify humans). And I don’t buy that there is a metaphilosophical problem here: If we can train AIs to be at least as good as the best human philosophers at the currently in-distribution ethical+philosophical problems, then I see no reason to think that these AIs will misgeneralise out of distribution any more than humans would. (There’s nothing special about the conclusions human philosophers would reach, and so even if the AIs reach different conclusions, I don’t see that as a problem. Indeed, if anything, the humans are more likely to make random mistakes, thus I’d trust the AIs’ conclusions more.)
(And then, practically, the AIs are much faster than humans, so they will make much more progress over the crucial crunch time months. Moreover, above we were comparing AIs to the best human philosophers / to a well-organised long reflection, but the actual humans calling the shots are far below that bar. For instance, I’d say that today’s Claude has better philosophical reasoning and better starting values than the US president, or Elon Musk, or the general public. All in all, best to hand off philosophical thinking to AIs.)”
@Wei Dai, I understand that your plan A is an AI pause (+ human intelligence enhancement). And I agree with you that this is the best course of action. Nonetheless, I’m interested in what you see as plan B: If we don’t get an AI pause, is there any version of ‘hand off these problems to AIs’/‘let ’er rip’ that you feel optimistic about, or which you at least think will result in lower p(catastrophe) than other versions? If you had $1B to spend on AI labour during crunch time, what would you get the AIs to work on?
(I’m particularly interested in your plan B re. solving (meta)philosophy, since I’m exploring starting a grantmaking programme in this area. That said, I’m also interested if your answer goes in another direction.)
Possible cruxes:
Human-AI safety problems are >25% likely to be real and important
Giving the average person god-like powers (via an intent-aligned ASI) to reshape the universe and themselves is >25% likely to result in the universe becoming optimised for more-or-less random values—which isn’t too dissimilar to misaligned AI takeover
If we attempt to idealise a human-led reflection and hand it off to AIs, then the outcome will be at least as good as a human-led reflection (under various plausible (meta)ethical frameworks, including ones in which getting what humans-in-particular would reflectively endorse is important)
Sufficient (but not necessary) condition: advanced AIs can just perfectly simulate ideal human deliberation
‘Default’ likelihood of getting an AI pause
Tractability of pushing for an AI pause, including/specifically through trying to make currently-illegible problems legible
Items that I don’t think should be cruxes for the present discussion, but which might be causing us to talk past each other:
In practice, human-led reflection might be kinda rushed and very far away from an ideal long reflection
For the most important decisions happening in crunch time, either it’s ASIs making the decisions, or non-reflective and not-very-smart humans
Political leaders often make bad decisions, and this is likely to get worse when the issues become more complicated (if they’re not leveraging advanced AI)
An advanced AI could be much better than any current political leader along all the character traits we’d want in a political leader (e.g., honesty, non-self-interest, policymaking capability)
[1] “For instance, via my currently-unpublished ‘AI for philosophical progress’ and ‘Guarding against mind viruses’ proposals.”
The answer would depend a lot on what the alignment/capabilities profile of the AI is. But one recent update I’ve made is that humans are really terrible at strategy (in addition to philosophy), so if there were no way to pause AI, it would help a lot to get good strategic advice from AI during crunch time, which implies that maybe AI strategic competence > AI philosophical competence in importance (subject to all the usual disclaimers like dual use and how to trust or verify its answers). My latest LW post has a bit more about this.
(By “strategy” here I especially mean “grand strategy” or strategy at the highest levels, which seems more likely to be neglected versus “operational strategy” or strategy involved in accomplishing concrete tasks, which AI companies are likely to prioritize by default.)
So for example, if we had an AI that’s highly competent at answering strategic questions, we could ask it “What questions should I be asking you, or what else should I be doing with my $1B?” (but this may have to be modified based on things like how much we can trust its various kinds of answers, how good it is at understanding my values/constraints/philosophies, etc.).
If we do manage to get good and trustworthy AI advice this way, another problem would be how to get key decision makers (including the public) to see and trust such answers, as they wouldn’t necessarily think to ask such questions themselves, nor trust the AI answers by default. But that’s another thing that a strategically competent AI could help with.
BTW, your comment made me realize that AI could plausibly accelerate strategic thinking and philosophical progress much more than science and technology, because the latter could become bottlenecked on feedback from reality (e.g., waiting for experimental results) whereas the former seemingly wouldn’t be. I’m not sure what implications this has, but I want to write it down somewhere.
One thought I have here (re. your point that today’s Claude has better philosophical reasoning and better starting values than the US president, or Elon Musk, or the general public) is that AIs could give very different answers to different people. Do we have any idea what kind of answers Grok is (or will be) giving to Elon Musk when it comes to philosophy?
See also this post, which occurred to me after writing my previous reply to you.