Glad to read your thoughts, Ben.
You’re right about this:
Even if long-term AGI safety were possible, you would still have to deal with limits on modelling, and consistently acting on, preferences expressed by humans from their (perceived) context. https://twitter.com/RemmeltE/status/1620762170819764229

And with preventing the system from consistently representing the preferences of malevolent, parasitic, or short-termist human actors who seek to misuse or co-opt it through any attack vector they can find.

And with the fact that the preferences of many possible future humans, and of non-human living beings, will not be automatically represented in a system that AI corporations have by default built to represent only currently living humans (preferably, those who pay).
A humble response to layer upon layer of fundamental limits on the possibility of aligning AGI, even in principle, is to ask how we got so stuck on this project in the first place.