I’m skeptical that a randomly drawn AI would meet the same requirements for that to happen to it.
I think 100% alignment with human values would be better than random values, but a superintelligent AI would presumably be trained on human data, so it would be somewhat aligned with human values. I also wonder to what extent the superintelligent AI’s values could change, hopefully for the better (as human values have).
I also have specific just-so stories for why human values have undergone “moral circle expansion” over time, and I’m not optimistic that process will continue indefinitely unless it is intervened on.
Thanks for that story!
Anyway, these are important questions!