Could it be more important to improve human values than to make sure AI is aligned?
Consider the following (which is almost definitely oversimplified):
| | ALIGNED AI | MISALIGNED AI |
|---|---|---|
| HUMANITY GOOD VALUES | UTOPIA | EXTINCTION |
| HUMANITY NEUTRAL VALUES | NEUTRAL WORLD | EXTINCTION |
| HUMANITY BAD VALUES | DYSTOPIA | EXTINCTION |
For this model, let’s assume dystopia is worse than extinction. This could be a scenario where factory farming expands to an enormous scale with the aid of AI, or where a bad AI-powered regime takes over the world. Let’s also assume a neutral world is equivalent in value to extinction.
The table shows that aligning AI can be good, bad, or neutral: the value of alignment depends entirely on humanity’s values. Improving humanity’s values, however, is always good.
The only clear case where aligning AI beats improving humanity’s values is if there is no scope to improve our values further. An ambiguous case is when humanity already has positive values: both improving values and aligning AI are then good options, and it isn’t immediately clear to me which wins.
The key takeaway is that improving values is robustly good, whereas aligning AI isn’t: alignment is bad if we have negative values. I would guess that we currently have pretty bad values, given how we treat non-human animals, so alignment is arguably undesirable. In this simple model, improving values would become the overwhelmingly important mission. Or perhaps ensuring that powerful AI doesn’t end up in the hands of bad actors becomes overwhelmingly important (again, rather than alignment).
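To make that dominance claim concrete, here is a minimal sketch in Python of the comparison the table implies. The specific numbers assigned to utopia, dystopia, and the other outcomes are illustrative assumptions only, chosen to match the assumptions above (extinction and the neutral world at zero, dystopia below zero).

```python
# Toy payoff matrix from the table above. The numbers are illustrative
# assumptions, not claims about the actual value of each outcome.
# Convention: extinction = 0, utopia > 0, dystopia < 0, neutral world = 0.
PAYOFFS = {
    ("good", "aligned"): 10,       # utopia
    ("good", "misaligned"): 0,     # extinction
    ("neutral", "aligned"): 0,     # neutral world (assumed equal to extinction)
    ("neutral", "misaligned"): 0,  # extinction
    ("bad", "aligned"): -10,       # dystopia (assumed worse than extinction)
    ("bad", "misaligned"): 0,      # extinction
}

VALUES = ["bad", "neutral", "good"]  # ordered from worst to best

def value_of_aligning(values: str) -> float:
    """Change in outcome from making AI aligned, holding humanity's values fixed."""
    return PAYOFFS[(values, "aligned")] - PAYOFFS[(values, "misaligned")]

def value_of_improving_values(values: str, alignment: str) -> float:
    """Change in outcome from moving values up one step, holding alignment fixed."""
    if values == "good":
        return 0.0  # no further scope to improve
    better = VALUES[VALUES.index(values) + 1]
    return PAYOFFS[(better, alignment)] - PAYOFFS[(values, alignment)]

for v in VALUES:
    print(f"values={v:7s}  aligning AI: {value_of_aligning(v):+}  "
          f"improving values (if aligned): {value_of_improving_values(v, 'aligned'):+}  "
          f"(if misaligned): {value_of_improving_values(v, 'misaligned'):+}")
# Under these assumed numbers, aligning AI is negative when values are bad,
# while improving values is never negative: the "robustly good" claim above.
```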
This analysis doesn’t consider the moral value of AI itself. It also assumes that misaligned AI necessarily leads to extinction, which may not be accurate (perhaps misalignment can also lead to dystopian outcomes?).
I doubt this is a novel argument, but what do y’all think?
If you don’t think misalignment automatically equals extinction, then the argument doesn’t work. The neutral world is now competing with “neutral world where the software fucks up and kills people sometimes”, which seems to be worse.
That is fair. I still think the idea that aligned superintelligent AI in the wrong hands can be very bad may be under-appreciated. The implication is that something like moral circle expansion seems very important at the moment to help mitigate these risks. And of course work to ensure that countries with better values win the race to powerful AI.
I think a neutral world is much better than extinction, and most dystopias are also preferable to human extinction. The latter is debatable but the former seems clear? What do you imagine by a neutral world?
Well, I’m assigning extinction a value of zero, and a neutral world is any world that has some individuals but also has a total value of zero. For example, it could be a world where half of the people live bad (negative) lives and the other half live equivalently good (positive) lives, so the sum total of wellbeing adds up to zero.
A dystopia is a world that is significantly negative overall, for example a world in which there are trillions of factory-farmed animals living very bad lives. A world with no individuals is a world without all that suffering.
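As a toy illustration of that accounting (just a sketch; the wellbeing numbers are made up), the convention can be written out explicitly:

```python
# Toy total-wellbeing accounting under the convention above:
# extinction = 0, a neutral world sums to 0, a dystopia sums to a large negative.
# All wellbeing numbers below are illustrative assumptions.

def total_wellbeing(lives: list[float]) -> float:
    """Sum of individual wellbeing; an empty list models extinction."""
    return sum(lives)

extinction = []                              # no individuals at all
neutral_world = [+5.0] * 50 + [-5.0] * 50    # good and bad lives exactly cancel
dystopia = [-3.0] * 1_000_000                # very many very bad lives

for name, world in [("extinction", extinction),
                    ("neutral world", neutral_world),
                    ("dystopia", dystopia)]:
    print(f"{name:14s} total wellbeing = {total_wellbeing(world):+,.1f}")
# extinction and the neutral world both come out to zero, while the dystopia
# is far below both, which is why (on this totalist view) a world with no
# individuals can beat a sufficiently bad populated one.
```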