It seems obvious to me that we’re talking past each other, i.e. that in many cases we’re trying to accomplish different things with our models/explanations. The fact that this doesn’t seem obvious to you suggests to me that I’m either bad at explaining, or that you might be interpreting my comments uncharitably. I agree with your tl;dr, btw!
This matches the data quite nicely, methinks. Better than “irrationality”, anyway.
You’re presupposing that the agent would not modify any of its emotional links if it had the means to do so. This assumption might hold in some cases, but it seems obviously wrong as a generalization, so your model is incomplete. Reread what I wrote in the section “On goals”: I’m distinguishing between “utility-function_1”, which reflects all the decisions/actions an agent will make in all possible situations, and “utility-function_2”, which reflects all the decisions/actions an agent would want to make in all possible situations. You’re focusing on “utility-function_1”, and what you’re saying about it is entirely accurate – I like your model in regard to what it is trying to do. However, I find “utility-function_2s” much more interesting and relevant, which is why I’m focusing on them. Why don’t you find them interesting?
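To pin the distinction down, here is a toy sketch in Python (every name in it is hypothetical, purely for illustration): the same agent has a utility-function_1 readable from what it actually does, and a utility-function_2 readable from what it would want to do, and the interesting cases are exactly where the two disagree.

```python
# Toy illustration of the two senses of "utility function".
# All situations and actions here are hypothetical examples.

actual_choices = {            # utility-function_1: revealed by behaviour
    "deadline_tonight": "procrastinate",
    "donation_appeal": "ignore",
    "offered_dessert": "decline",
}

endorsed_choices = {          # utility-function_2: endorsed on reflection
    "deadline_tonight": "work",
    "donation_appeal": "donate",
    "offered_dessert": "decline",
}

# The cases worth talking about are exactly where the two come apart:
conflicts = [s for s in actual_choices
             if actual_choices[s] != endorsed_choices[s]]
print(conflicts)  # ['deadline_tonight', 'donation_appeal']
```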
Agenty/rational behaviour isn’t exclusive to system 2. How does system 1 decide when to trigger this coping mechanism?
Again, we have different understandings of rationality. The way I defined “goals” in the section “On goals”, it is only system 2 that defines what is rational; system 1 heuristics can still be “rational” if they are calibrated to produce outcomes that are good with regard to the system 2 goals, given the most probable environment the agent will encounter. This part seems to be standard usage, in fact.
Side note: your theory of rationality is quite Panglossian. It is certainly possible to interpret all of human behavior as “rational” (as e.g. Gigerenzer does), but that strikes me as a weird/pointless thing to do.
Was that the evidence you have for the claim that humans aren’t designed to efficiently pursue a single goal? Or do you have more evidence?
This claim strikes me as really obvious, so I’m wondering whether you might be misunderstanding what I mean. Have you never noticed just how bad people are at consequentialism? When people first hear of utilitarianism/EA, they think EA implies giving away all their wealth until they are penniless, instead of considering future earning prospects. Or they think having children and indoctrinating them with utilitarian beliefs is an effective way to do good, ignoring that you could just use the money to teach other people’s children more effectively. The point EY makes in the Godshatter article is all about how evolution programmed many goals into humans even though evolution itself (metaphorically speaking) has only one goal.
What I’m saying is the following: humans are bad at being consequentialists; they are bad at orienting their entire lives (jobs, relationships, free time, self-improvement…) towards one well-defined aim. Why? Well, for one thing, if you ask most people “what is your goal?”, they’ll have a hard time even answering! In addition, evolution programmed us to value many things, so someone whose goal is solely “reduce as much suffering as possible” cannot count on evolutionarily adaptive heuristics, and thus biases against efficient behavioral consequentialism are to be expected.
It is trivially true that a utility-function-based agent which perfectly models someone’s behaviour exists (in a mathematical sense). It may not be the simplest model of that behaviour, but it must exist.
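To spell out the trivial construction (the function names below are mine): take any behaviour, i.e. any mapping from situations to actions, and define a utility function that assigns 1 to the action actually taken and 0 to everything else; an agent maximizing that function reproduces the behaviour exactly.

```python
# Sketch of the trivial existence claim (names are mine, purely
# illustrative): any behaviour can be rationalized by *some* utility
# function, namely the one that rewards exactly the actions taken.

def make_utility(behaviour):
    """Turn a situation -> action map into a utility function u(s, a)."""
    def utility(situation, action):
        return 1.0 if action == behaviour[situation] else 0.0
    return utility

behaviour = {"offered_cake": "eat_it", "sees_spider": "flee"}
u = make_utility(behaviour)

# Maximizing u recovers the original behaviour in every situation:
options = ["eat_it", "flee", "do_nothing"]
for situation in behaviour:
    assert max(options, key=lambda a: u(situation, a)) == behaviour[situation]
```

Nothing about this construction makes the resulting utility function natural or simple; it is purely an existence claim.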
I know; what I was trying to say by “A neuroscientist of the future, when the remaining mysteries of the human brain will be solved, will not be able to look at people’s brains and read out a clear utility-function” – emphasis on CLEAR – was that there is no representation in the brain where you look at it and see “ah, that’s where the utility function is”. Instead (and I tried to make this clear with my very next sentence, which you quoted), you have to look at the whole agent, where eventually all the situational heuristics, the emotional weights you talked about, and the agent’s beliefs together imply a utility-function in the utility-function_1 sense. Contrast this with a possible AI: it should be possible to construct an AI with a more clearly represented utility function, or a “neutral” prototype-AI where scientists need to fill in the utility-function part with whatever they care to fill in. When we speak of “agents”, we have in mind an entity that knows what its goals are. When we imagine an FAI or a paperclip maximizer, we assume this entity knows what its goal is. However, most humans do not know what their goals are. This is interesting, isn’t it? Paperclippers would not engage in discussions on moral philosophy, because they know what their goals are. Humans are often confused about their goals (and about morality, which some take to imply more than just goals). So your model of human utility functions should incorporate this curious fact of confusion. I was very confused by it myself, but I now think I understand what is going on.
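The contrast I have in mind, sketched loosely in code (both class names are hypothetical): a prototype-AI can carry its utility function as an explicit, swappable component that a scientist could read straight off the agent, whereas for a human-like agent any utility function (in the utility-function_1 sense) is only implied by the interplay of heuristics and emotional weights and has to be inferred from the whole agent.

```python
# Loose sketch of the contrast (both class names are hypothetical).

class PrototypeAI:
    """Utility function is an explicit, swappable slot: a scientist could
    point at self.utility and say "that's where the goals live"."""
    def __init__(self, utility_function):
        self.utility = utility_function

    def act(self, situation, options):
        return max(options, key=lambda a: self.utility(situation, a))


class HumanlikeAgent:
    """No explicit utility slot: choices emerge from situational heuristics
    weighted by emotion. A utility function (in the utility-function_1
    sense) is only implied by the whole agent and must be inferred."""
    def __init__(self, heuristics, emotional_weights):
        self.heuristics = heuristics                # (situation, action) -> score
        self.emotional_weights = emotional_weights  # action -> multiplier

    def act(self, situation, options):
        return max(options,
                   key=lambda a: self.heuristics.get((situation, a), 0.0)
                                 * self.emotional_weights.get(a, 1.0))
```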
In my opinion, there is always cognitive dissonance in this entire paradigm of utility quotas. You’re making yourself act like two agents with two different moralities who share the same body but get control at different times. There is cognitive dissonance between those two agents. Even if you try to always have one agent in charge, there’s cognitive dissonance with the part you’re denying.
I find this quite a good description, actually. One part is your system 2, “what you would want to do under reflection”; the other is the “primal, animal-like brain”, to use outdated language. I wouldn’t necessarily call the result “cognitive dissonance”, though. If you rationally understand what is going on, and realize that rebelling against your instincts/intuitions/emotional set-up will lead to a worse outcome than trying to reconcile your conflicting motivations, then the latter is literally the most sensible thing for your system 2 to attempt.