I hadn’t considered the narrative you bring up here when I wrote the post; that’s interesting. As you write, it relies on the assumption that
> once someone ‘wins’ the AGI/ASI race, they will be able to use that AI to control or prevent the development of other potentially dangerous AIs
Here we are entering the realm of forecasting world politics, which I am definitely not an expert on. For all I know, the probability of that scenario could be extremely low. I can also think of alternative scenarios that don’t seem obviously absurd, so I doubt that the probability is extremely high, but it’s hard for me to say much more than that. Anyway, as you said, AI moral reasoning might be valuable in that scenario as well.
> but I’m not convinced that it’s valuable in order to prevent malevolent actors using AI.
That’s a bit too much; I don’t think I claimed that moral reasoning in AI can directly prevent that. To prevent malevolent actors from using AI for bad purposes, we would have to either stop AI research completely (since it is not only alignment research that works on the control problem, but also standard AI research) or ensure that bad actors never get access to powerful and controllable AI, which also seems hard to do and is not something AI moral reasoning can help with.
The weaker claim I made in the post is that, compared with research that aims to make AI controllable, research on moral reasoning in AI is less likely to help malevolent actors use AI for bad purposes (and/or would help them to a lesser degree).
Regarding your last point: I see. I thought this was an argument for “alignment via moral reasoning as an addition to alignment via control”, not “alignment via moral reasoning instead of alignment via control”. So you are hoping that alignment via moral reasoning will displace alignment via control.
In that case, your argument is plausible but… quite hopeful? I’m sure many people will pursue control methods regardless. I suppose you might argue that, if enough people buy your argument, research on AI that is merely controlled will advance more slowly, while research on AI that does its own moral reasoning, and is therefore harder to misuse, will advance faster or at least in parallel. Then I would accept that this might reduce the chance of malevolent misuse, but that’s quite a hopeful scenario! In less hopeful scenarios, I’m unsure whether people concerned with malevolent misuse ought to pursue this kind of work, or whether they wouldn’t be better off simply advocating for a pause or slowdown.
In short, I am not hoping for a specific outcome, and I can’t take into account every single scenario. If someone gives more credence to research on moral reasoning in AI after reading this, that’s already enough, considering that the topic doesn’t seem to be popular within AI alignment, and it was even more niche at the time I wrote this post.
Sure! And like I said, I do think this is valuable: it just seems more obviously valuable as a way to ensure the best outcomes (aligned AI), rather than as a means to avoid the worst outcomes.