I think the idea is that it will only change its values in a particular direction if doing so helps it realise its current values. So it won't change its values if that would mean doing horrible things according to its current values. A philosophical assumption lurking in the background is that you can't work out the correct values just by good thinking: basic starting values are thinking-independent, so as long as you're consistent, no amount of intelligence and reasoning will make you arrive at the correct ones. (They call this the "orthogonality thesis", but a similar idea is known in academic philosophy as Humeanism about moral motivation. It's quite mainstream but not without its critics.)