I think in the long-run I’d be more confident that corrigible AI would lead to good futures than AI that is aligned to specific values (besides perhaps some side-constraints). This is mainly because I’m pretty clueless and think our current values are likely to be wrong, and I’d rather we had more time to improve them.
I haven’t thought enough about the relationship between power concentration and corrigibility though—I expect that could change my mind.
Oh yes but I made the above comment more to represent the view that I’ve seen in some AI x Animals work that we should be working on aligning AGI to pro-animal values, through things like AnimalHarmBench etc..
This makes sense. I would worry about the purely corrigible AGI being used by actors in such a way that we never get to instil the correct/good/post-long-reflection values in AGI/ASI down the line.
Yep fair, that’s what I mean by “power concentration and corrigibility”. AGI being constrained by some values makes it at least minimally democratic (values are shaped by everyone who makes up a language, especially for LLMs).
I think in the long-run I’d be more confident that corrigible AI would lead to good futures than AI that is aligned to specific values (besides perhaps some side-constraints). This is mainly because I’m pretty clueless and think our current values are likely to be wrong, and I’d rather we had more time to improve them.
I haven’t thought enough about the relationship between power concentration and corrigibility though—I expect that could change my mind.
Oh yes but I made the above comment more to represent the view that I’ve seen in some AI x Animals work that we should be working on aligning AGI to pro-animal values, through things like AnimalHarmBench etc..
This makes sense. I would worry about the purely corrigible AGI being used by actors in such a way that we never get to instil the correct/good/post-long-reflection values in AGI/ASI down the line.
Yep fair, that’s what I mean by “power concentration and corrigibility”. AGI being constrained by some values makes it at least minimally democratic (values are shaped by everyone who makes up a language, especially for LLMs).