Thanks! This does clarify things for me, and I think the definition of a “goal” is very helpful here. I still have some uncertainty about the claim of process orthogonality, which I can now state more precisely:
Let’s define an “instrumental goal” as a goal X for which there is a goal Y such that whenever it is useful to think of the agent as “trying to do X” it is in fact also useful to think of it as “trying to do Y”; in this case we say that X is instrumental to Y. Instrumental goals can be generated at the development phase or by the agent itself (implicitly or explicitly).
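To state that a bit more explicitly (this is just my own notation, nothing standard): write $U(G, s)$ for “in situation $s$ it is useful to model the agent as trying to do $G$.” Then

$$X \text{ is instrumental to } Y \;\iff\; \forall s:\; U(X, s) \Rightarrow U(Y, s),$$

and X counts as an instrumental goal if it is instrumental to some other goal Y.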
I think that the (non-process) orthogonality thesis does not hold with respect to instrumental goals: a better selection of instrumental goals enables better capabilities, and greater capabilities bring greater planning capacity, so instrumental goals and capability levels are correlated rather than independent.
Therefore, the process orthogonality thesis also fails for instrumental goals. This means that instrumental goals are usually not the goals of interest when trying to distinguish between the process and non-process orthogonality theses, and we should focus on terminal goals (those which aren’t instrumental).
In the case of an RL agent or Deep Blue, I can only see one terminal goal: maximize the defined score, or win at chess. These won’t really change together with capabilities.
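To make this concrete, here is a toy sketch (my own illustrative code, not any actual system): a tabular Q-learner whose terminal goal is fixed by the reward function, while its instrumental behavior changes as training makes it more capable.

```python
import random

# Toy example: a 1-D corridor with cells 0..5. The *terminal* goal is fixed by
# the reward function (reach the rightmost cell); only the learned, instrumental
# behavior changes as capability improves with more training.

N = 6  # number of cells; reward is given only upon entering cell N-1

def step(state, action):
    """action is -1 (left) or +1 (right); the reward function never changes."""
    next_state = max(0, min(N - 1, state + action))
    reward = 1.0 if next_state == N - 1 else 0.0
    return next_state, reward, next_state == N - 1

def train(episodes, alpha=0.5, gamma=0.9, epsilon=0.2):
    """Tabular Q-learning; more episodes = a more capable agent, same terminal goal."""
    q = {(s, a): 0.0 for s in range(N) for a in (-1, +1)}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if random.random() < epsilon:
                a = random.choice((-1, +1))
            else:
                a = max((-1, +1), key=lambda act: q[(s, act)])
            s2, r, done = step(s, a)
            q[(s, a)] += alpha * (r + gamma * max(q[(s2, -1)], q[(s2, +1)]) - q[(s, a)])
            s = s2
    return q

# Both agents share the same terminal goal (the reward function above); they
# differ only in instrumental behavior: the weak one wanders, the strong one
# has learned to head right toward the rewarded cell.
weak_q = train(episodes=5)
strong_q = train(episodes=500)
```

The point is just that the thing held fixed across capability levels here is the reward function, which is what I mean by the terminal goal.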
I thought a bit about humans, but I feel that this is much more complicated and needs more nuanced definitions of goals. (Is avoiding suffering a terminal goal? It seems that way, but who is doing the thinking in which it is useful to think of one thing or another as a goal? Perhaps the goal is to reduce specific neuronal activity, for which avoiding suffering is merely instrumental?)
I’m actually not very optimistic about a more complex or formal definition of goals. To my mind, the concept of a “goal” is often useful, but it’s an intrinsically fuzzy, fundamentally pragmatic concept. I also think that, in practice, the distinction between an “intrinsic” and an “instrumental” goal is pretty fuzzy in the same way (although I think your definition is a good one).
Ultimately, agents exhibit behaviors. It’s often useful to try to summarize these behaviors in terms of what sorts of things the agent is fundamentally “trying” to do and in terms of the “capabilities” that the agent brings to bear. But I think this is just sort of a loose way of speaking. I don’t really think, for example, that there are principled/definitive answers to the questions “What are all of my cat’s goals?”, “Which of my cat’s goals are intrinsic?”, or “What’s my cat’s utility function?” Even if we want to move beyond behavioral definitions of goals, to ones that focus on cognitive processes, I think these sorts of questions will probably still remain pretty fuzzy.
(I think that this way of thinking—in which evolutionary or engineering selection processes ultimately act on “behaviors,” which can only somewhat informally or imprecisely be described in terms of “capabilities” and “goals”—also probably has an influence on my relative optimism about AI alignment.)