Thanks for publishing this and your research! A few discussion points:
1. > We suggest trying to achieve safety through evolution, rather than only trying to arrive at safety through intelligent design.
But how do we evolve an unaligned AGI into one that is deployed in the real world and aligned? It seems unlikely that we can align such a system without the real-world environment. And once it is in the real world, it is likely to show goal and/or capability misgeneralization. As an example, how can we be sure that once a CEO system is deployed, it won’t disempower stakeholders so that it is not shut down and can continue to optimize for its goals? I mean, we can’t emulate everything that will happen in the future of such a company run by an AI CEO.
2. > Counteracting forces
So we then have another system of several AIs that watch each other, with offence and defence balanced. But that is yet another AI system to align, right? Hence, all the same arguments hold for this system. How do we align it with human values? How do we make sure that, under distribution shift, it indeed pursues the goals it was given? Not to forget, this is only one bet we make, only one try. A real-world example is government and business: both were created by human societies, yet we see cases where they become misaligned.
3. > Avoid capabilities externalities
(Repeated) It is unclear how we can apply this in our current competitive environment (Google vs Facebook, China vs USA). What concretely should the incentives or policies be to get actors to adopt a safety culture? And who enforces them? If one actor adopts it, another will gain a competitive advantage, since they will spend more on capabilities and then ‘kill you’ (Yudkowsky, AGI Ruin).
4. > Pursue tail impacts
How does this fit with avoiding capabilities externalities? If we make less capable systems, they will have less impact, right? Won’t some other reckless AI team, driving the research to the edge, reap all the fruits?
5. > For example, it is well-known that moral virtues are distinct from intellectual virtues. An agent that is knowledgeable, inquisitive, quick-witted, and rigorous is not necessarily honest, just, power-averse, or kind.
Why is that true? I mean, I agree it is well known that, for humans, moral virtues don’t come with intellectual ones. But is it necessarily always true for all agents?