I by contrast would have done the exact opposite: Instead of having a clause about how you must care about preventing changes to one’s objective, I’d have a clause about how contra convergent instrumental goals, it’s OK for you to not care about certain kinds of changes to one’s objective.
That makes sense to me. The book Human Compatible has a good phrasing: “Which preference-change processes do you endorse?”
There were two main ways I had in mind in which changes to life-goal objectives can come about without constituting a “failure of goal preservation” in an irrationality-implying sense:
An indirectly specified life goal around valuing reflection (I think of this as a widespread type!). Note that people with indirectly specified life goals would express non-total confidence in their best-guess formulation of what they want in life. So this probably isn’t quite what you were talking about.
A life goal with a stable core and “optional/flexible” parts. For instance, imagine a person who’s fanatically utilitarian in their life goals. They could have a stance that says “if I ever fall in love, it’s okay to start caring partly about something other than utilitarianism.”
On the second bullet point, I guess that setup also increases the risk of a failure of goal preservation for the utilitarian part of their goal. So you’re right that it would go against convergent drives.
In a reply to Michael below, you point out that this (i.e., something like what I describe in my second bullet point) seems like a “clunky workaround.” I can see what you mean. Still, I think a life goal that includes a clause like “I’m okay with particular changes to my life goal, but only brought about in common-sense reasonable ways” would still constitute a life goal. You could think of such a clause as a barrier against (too much) fanaticism, perhaps.
Side note: It’s interesting how, in discussions on “value drift” on the EA forum, you can see people at both extremes of the spectrum. Some consider value drift to be typically bad, while others caution that people may have truer values as they get more experienced.