I am a bit confused by the key question / claim. It seems to be some variant of “Powerful AI may allow the development of technology which could be used to destroy the world. While the AI Alignment problem is about getting advanced AIs to do what their human operators want, this could still lead to an existential catastrophe if we live in a world vulnerable enough that unilateral actors can deploy destructive technology. Thus actual safety looks like not just having Aligned AGI, but also ensuring that the world doesn’t get destroyed by bad, careless, or unilateral actors.”
If this is the claim, it seems about right, and it has been discussed a lot both online and offline. Powerful AI itself might be that destructive technology, hence the discussion of Deployment and Coordination problems. See here. Some other relevant resources: here, here.
Since you asked for feedback:
If I am not mistaken, “AI Alignment” seems to mean getting AI to do what we want without harmful side effects, but “AI Safety” seems to imply keeping AI from harming or destroying humanity.
I would say the distinction isn’t so clear, and the semantics don’t seem too important; what matters is that those in the field of AI Alignment, and AI Safety more broadly, are aiming at good outcomes for humanity.
I guess your claim might actually be “Powerful AI may be a precipitating factor for other risks as it allows the development of many other, potentially unsafe, technologies.” This seems technically true but is unlikely to be how the world goes. Mainly I expect one of two outcomes:
1. humanity is disempowered or dead from misaligned AI;
2. we successfully align AGIs and solve the deployment problem, which results in a world where no single actor can cause an existential catastrophe.
The reason I think no single actor can cause an existential catastrophe in outcome 2 is that reaching that kind of coordination seems like a likely precursor to avoiding dying to misaligned AGI in the first place. I would recommend the above links for building this intuition. I may be wrong here: it could be that the way we avoid misaligned AGI is by democratizing aligned-AGI-creation tech (e.g., all the open-source libraries include Alignment properties, including misuse prevention), and maybe those misuse filters are not sufficient to stop people from developing civilization-destroying tech in the future (though, given such a filter, we would probably already be dead from a misaligned AGI that somebody made by stress-testing the filters).
Sorry for the scattered thoughts.
Thank you so much for this reply! I’m glad to know there is already some work on this; it makes my job a lot easier. I will definitely look into the articles you mentioned, and perhaps just study AI risk / AI safety a lot more in general to get a better understanding of how people think about this. It sounds like what people call “deployment” may be very relevant, so we’ll especially look into that.