Will MacAskill stated in a recent 80,000 Hours podcast that he believes marginal work on trajectory change (toward a best possible future rather than a mediocre one) seems likely to be significantly more valuable than marginal work on extinction risk.
Could you explain what the key crucial considerations are for this claim to be true, and give a basic argument for why you think each of them resolves in favor of the claim?
Would also love to hear if others have any other crucial considerations they think weigh in one direction or the other.
Will is thinking about this much more actively and will give the best answer, but here are some key crucial considerations:
How tractable is extinction risk reduction and trajectory change work?
As a part of that, are there ways that we can have a predictable and persistent effect on the value of the long-term future other than by reducing extinction risk?
How good is the future by default?
How good are the best attainable futures?
These are basically Tractability and Importance from the INT framework.
Some of the biggest disagreements in the field are over how likely we are to achieve eutopia by default (or what % of eutopia we will achieve) and what, if anything, can be done to predictably shape the far future. Populating and refining a list of answers to this last question has been a lot of the key work of the field over the past few years.
I think Will MacAskill and Fin Moorhouse’s paper rests on the crucial consideration that aligning ASI is possible (by anyone at all). They haven’t established this (EDIT: by this I mean they don’t cite any supporting arguments for it, rather than that they failed to come up with the arguments themselves. But as far as I know, there aren’t any supporting arguments for the assumption, and in fact there are good arguments on the other side for why aligning ASI is fundamentally impossible).
This seems like a really critical issue, and I’d be very interested in hearing whether this is disputed by @tylermjohn / @William_MacAskill.
I think there is a large minority chance that we will successfully align ASI this century, so I definitely think it is possible.
To clarify, do you think there’s a large minority chance that it is possible to align an arbitrarily powerful system, or do you think there is a large minority chance that it is going to happen with the first such arbitrarily powerful system, such that we’re not locked into a different future / killed by a misaligned singleton?
Why do you think this? What makes you think that it’s possible at all?[1] And what do you mean by “large minority”? Can you give an approximate percentage?
Or to paraphrase Yampolskiy: what makes it possible for a less intelligent species to indefinitely control a more intelligent species (when this has never happened before)?
To respond to Yampolskiy without disagreeing with the fundamental point, I think it’s definitely possible for a less intelligent species to align or even indefinitely control a boundedly and only slightly more intelligent species, especially given greater resources, speed, and/or numbers, and sufficient effort.
The problem is that humans aren’t currently trying to limit such systems, or trying much to monitor them, much less robustly align or control them.
Fair point. But AI is indeed unlikely to top out at merely “slightly more” intelligent. And it has the potential for a massive speed/numbers advantage too.
Yes, by default self-improving AI goes very poorly, but this is a plausible case where we could have aligned AGI, if not ASI.