Sustaining symbiosis between a misaligned AGI and humans seems extremely hard. If the AGI is superintelligent and capable of manufacturing or manipulation, it will eventually come up with better and better ways to accomplish its goals without humans. Temporarily avoiding catastrophic misalignment doesn't seem sufficient to entrench symbiosis with a misaligned system in that case. I am generally pro-symbiosis as a political goal, but I am not optimistic about it long-term for AGI without a lot more detail on strategy.
Also, I don't think intentionally deploying an AGI that is not fully aligned really mitigates the incentives to conceal behavior. You can know a system is not aligned without knowing how badly, and you can under- or overestimate its capabilities. Instrumental power-seeking incentives will still be present. These are basically the same issues that arise when trusting humans, only with far more power involved. Even if the incentives are somewhat less intense, I don't think that matters much when capabilities can be disproportionate and there are no counterbalancing incentives.
Overall, given the focus on the potential for narrow alignment to be catastrophic, we need something like broad intent alignment, which may be the same thing you are aiming for with symbiosis. I like some of the analogies to academia and academic freedom here; however, academics are still human (and thus partially aligned), and I'm not sure people have a good grasp of which norms are and aren't working well.
Thanks for these remarks! I am actually very sympathetic to them.