That said, fwiw, since I’m recommending Holden’s doc, I should also flag that I think the breakdown of possible outcomes that Holden sketches there isn’t a good one, because:
He defines utopia, dystopia, and “middling worlds” solely by how good they are, whereas “paperclipping” is awkwardly squeezed in with a definition based on how it comes about (namely, that it’s a world run by misaligned AI). This leads to two issues in my view:
I think the classic paperclipping scenario would itself be a “middling” world, yet Holden frames “paperclipping” as a distinct concept from “middling” worlds.
Misaligned AI actually need not lead to something approximately as good/bad as paperclipping; it could instead lead to dystopia, or maybe even to utopia, depending on how we define “alignment” and depending on metaethics.
There’s no explicit mention of extinction.
Holden seems to be treating “paperclipping” as synonymous with extinction. But:
Misaligned AI need not lead to extinction.
Extinction is “middling” relative to utopia and dystopia.
Extinction is also very different from some other “middling” worlds according to many ethical theories (though probably not total utilitarianism).
The author or readers might also find the following interesting:
Flourishing futures (a list of resources on that topic)
Holden Karnofsky’s call for people to think about “How should we value various possible long-run outcomes relative to each other?” and his notes on why and how to do so[1]