Ongoing project on moral AI

For lack of a better name =)

The idea is to use current AI technologies, like language models, to build an impartial AI that understands ethics as humans do, and possibly better than we do.

You heard me right: just as an AI can be smarter than a human, we should accept that we are not morally perfect creatures, and that it is possible to create an AI which is better than us at, for example, spotting injustice. See Free agents for more details.

If you are familiar with philosophical language, my objective is a philosopher AI that figures out epistemology and ethics on its own, and then communicates its beliefs.

If you look at reality through the lens of AI alignment, I am saying that going for ‘safe’ or ‘aligned’ is kind of lame, and that aiming for ‘moral’ is better. Instead of trying to limit the side effects of, or fix, agents that are morally clueless, I’d like to see more people working on agents that perceive and interpret the world from a human-like point of view.

If you are looking for a place to start, I suggest that you have a look at Free agents and decide where to go from there. Although it touches on some technical subjects, I tried to write that post for a relatively broad audience.

This sequence is simply a collection of posts on the same topic, in chronological order. The next post will probably be somewhat mathematical in nature; later on, I expect the posts to become more algorithmic and, eventually, to focus on practical experiments run on hardware.

You can also find this sequence on the AI Alignment Forum.

Naturalism and AI alignment

From language to ethics by automated reasoning

Criticism of the main framework in AI alignment

On value in humans, other animals, and AI

Free agents

Agents that act for reasons: a thought experiment