I’m skeptical of your analysis of scenario 3, as I generally buy the orthogonality thesis, leading me to believe that it’s possible to be both wise and evil.
At the same time, emergent misalignment suggests it may be reasonable to expect that an AI nudged to become wise will also be nudged somewhat towards being moral.
Interesting! I think I didn’t fully distinguish between two possibilities:
1. An AW that merely has an understanding of wisdom
2. An AW whose values are aligned to wisdom, or at least aligned to pursuing and acting on wisdom
I think both types of AW are worth pursuing, but the second may be even more valuable, and it's the type I had in mind in scenario 3 at least.