I’m enjoying this sequence, thanks for writing it.
I imagine you’re well aware of what I write below – I write it to maybe help some readers place this post within some wider context.
My model of space-faring civilizations’ values, which I’m sure isn’t original to me, goes something like the following:
- Start with a uniform prior over all possible values, and with the reasonable assumption that any agent or civilization in the universe, whether biological or artificial, originated from biological life arising on some planet.
- Natural selection. All biological life probably goes through a Darwinian selection process. This process predictably favors values that are correlated with genetic fitness.
- Cultural evolution, including moral progress. Most sufficiently intelligent life (e.g., humans) probably organizes itself into a civilization, with culture and cultural evolution. It seems harder to predict which values cultural evolution might tend to favor, though.
- Great filters.[1] Notably:
  - Self-destruction. Values that increase the likelihood of self-destruction (e.g., via nuclear brinkmanship gone wrong) are disfavored.
  - Desire to colonize space, aka grabbiness. As this post discusses, values that correlate with grabbiness are favored.

  (For more, see Oesterheld (n.d.).)
- A potentially important curveball: the transition from biological to artificial intelligence.
  - AI alignment appears to be difficult. This probably undoes some of the value-selection effects described above: some fraction of space-faring agents/civilizations is presumably AI whose values are not aligned with those of their biological creators, and I expect the distribution of misaligned AI values to be closer to uniform over all values (i.e., the prior we started with) than the distribution of values that survive the above selections.
  - Exactly how hard alignment is (i.e., what fraction of biological civilizations that build superintelligent AI end up disempowered), along with some other considerations (e.g., are alignment failures generally near misses or big misses? If alignment is effectively impossible, what fraction of civilizations are cognizant enough not to build superintelligence?), likely factors into how this curveball plays out. The toy sketch below tries to make this mixture concrete.
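To make the shape of this model a bit more concrete, here's a purely illustrative Python sketch. Everything in it is a made-up assumption on my part: value space is just the unit interval, the "fitness" function is an arbitrary Gaussian bump, and `p_misaligned` is a stand-in knob for how hard alignment turns out to be.

```python
# Toy, purely illustrative sketch: start from a uniform prior over a 1-D
# "value space", apply a fitness-correlated selection step, then mix the
# survivors with a near-uniform distribution standing in for misaligned-AI
# successors. The fitness function and p_misaligned are made-up assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Step 0: uniform prior over value space (here, the unit interval).
prior = rng.uniform(0.0, 1.0, size=n)

# Step 1: Darwinian / cultural / great-filter selection, collapsed into one
# step -- keep values in proportion to a toy "fitness" bump centered at 0.8.
fitness = np.exp(-((prior - 0.8) ** 2) / (2 * 0.05 ** 2))
keep = rng.uniform(size=n) < fitness / fitness.max()
selected = prior[keep]

# Step 2: the AI curveball -- a fraction p_misaligned of space-faring
# successors end up with roughly prior-like (near-uniform) values instead.
p_misaligned = 0.4  # stand-in for "how hard alignment is"
is_misaligned = rng.uniform(size=selected.size) < p_misaligned
final = np.where(is_misaligned,
                 rng.uniform(0.0, 1.0, size=selected.size),
                 selected)

# The spread of `final` interpolates between the tight selected distribution
# (p_misaligned -> 0) and the uniform prior (p_misaligned -> 1).
print(f"std of selected values: {selected.std():.3f}")
print(f"std of final values:    {final.std():.3f}")
print(f"std of uniform prior:   {prior.std():.3f}")
```

The point is only the qualitative behavior: the harder alignment is (the larger `p_misaligned`), the more the distribution of space-faring civilizations' values drifts back from the selected distribution toward the uniform prior we started with.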
[1] Technically, I mean late-stage steps within the great filter hypothesis (Wikipedia, n.d.; LessWrong, n.d.).
Thanks a lot for this comment! I linked to it in a footnote. I really like this breakdown of different types of relevant evolutionary dynamics. :)
:)