I’m practically new to AI safety, so reading this post was a pretty intense crash course!
What I’m wondering, though: even if we suppose we can solve all the technical problems and create a completely beneficial, Gaia-mother-like AGI that is both super-intelligent and genuinely wants the best for humanity and the rest of earthlings (or even the whole universe), how can humans themselves even align on:
1. What the goals and priorities should be, given limited resources, and
2. What the reasonable contours of the solution space are that won’t cause harm, or, since “no harm” is impossible, what harms would be acceptable for which gains?
In other words, to my naïve understanding, it seems like the philosophical questions of what is “good” and what an AGI should even align to are the hardest bit?
I mean, obviously not obliterating life on Earth is a reasonable baseline, but it feels a bit low on ambition? Or maybe this is just a completely different discussion?
Welcome to the field! Wow, I can imagine this post would be an intense crash course! :-o
There are some people who spend time on these questions. It’s not something I’ve spent a ton of time on, but I think you’ll find interesting posts related to this on LessWrong and AI Alignment Forum, e.g. using the value learning tag. Posts discussing ‘ambitious value learning’ and ‘Coherent Extrapolated Volition’ should be pretty directly related to your two questions.
Thanks a lot, really appreciate these pointers!