Nice post! I generally agree and I believe this is important.
I have one question about this. I’ll distinguish between two different empirical claims. My sense is that you argue for one of them and I’d be curious whether you’d also agree with the other. Intuitively, it seems like there are lots of different but related alignment problems: “how can we make AI that does what Alice wants it to do?”, “how can we make AI that does what the US wants it to do?”, “how can we make AI follow some set of moral norms?”, “how can we make AI build stuff in factories for us, without it wanting to escape and take over the world?”, “how can we make AI that helps us morally reflect (without manipulating us in ways we don’t want)?”, “how can we make a consequentialist AI that doesn’t do any of the crazy things that consequentialism implies in theory?”. You (and I and everyone else in this corner of the Internet) would like the future to solve the more EA-relevant alignment questions and implement the solutions, e.g., help society morally reflect, reduce suffering, etc. Now here are two claims about how the future might fail to do this:
1. Even if all alignment-style problems were solved, humans would not implement the solutions to the more EA-relevant alignment questions. E.g., if there were a big alignment library that contained the answers to all of these alignment problems, individuals would grab “from pauper to quadrillionaire and beyond with ChatGPT-n”, not “how to do the most good you can with ChatGPT-n”, and so on. (Additionally, one has to hold that people’s preferences for the not-so-ethical books/AIs will not just go away in the distant future. And I suppose for any of this to be relevant, you’d also need to believe that you have some sort of long-term influence on which books people get from the library.)
2. Modern-day research under the “alignment” (or “safety”) umbrella is mostly aimed at solving the not-so-EA-y alignment questions, and does not put much effort into the more specifically EA-relevant questions. In terms of the alignment library analogy, there’ll be lots of books in the aisle on how to get your AI to build widgets without taking over the world, and not so many books in the aisle on how to use AI to do moral reflection and the like. (Again, one has to hold that this has some kind of long-term effect, despite the fact that all of these problems can probably be solved _eventually_. E.g., you might think that for the future to go in a good direction, we need AI to help with moral reflection as soon as we reach human-level AI, because of some kind of lock-in.)
My sense is that you argue mostly for 1. Do you also worry about 2? (I worry about both, but I mostly think about 2, because 1 seems much less tractable, especially for me as a technical person.)