OK, let’s say a foreign superpower develops what you’re calling ‘weakly aligned AI’, and they do ‘muster the necessary power to force the world into a configuration where [X risk] is lowered’… by, for example, developing a decisive military and economic advantage over other countries, imposing their ideology on everybody, and thereby reducing the risk of great-power conflict.
I still don’t understand how we could call such an AI ‘aligned with humanity’ in any broad sense; it would simply be aligned with its host government and their interests, and somewhat anti-aligned with everybody else.
Maybe I’ve studied and taught game theory for too many decades, and I’m just too attuned to conflicts of interest and mixed-motive games. But I get the very uneasy feeling that the AI alignment community is sweeping some very big problems under the rug here.