After some clarification, Dayan thinks that vigour is not the thing I was looking for.
We discussed this a bit further, and he suggested that the temporal difference (TD) error tracks pretty closely what we mean by happiness/suffering, at least as far as the zero point is concerned. Here’s a paper making the case (though it has limited scope, IMO).
If that’s true, we wouldn’t need, e.g., the theory that the zero point exists to keep neural firing rates close to zero.
The only problem with TD errors seems to be that they don’t account for the difference between wanting and liking. But the function of liking is currently just unresolved, so I came away with the impression that liking vs. wanting, and not the zero point, is the central question.
I’ve seen one paper suggesting that liking is basically the consumption of rewards, though that would bring us back to the question of the zero point. But we didn’t find that theory satisfying; e.g., food is just a proxy for survival. And as the paper I linked shows, happiness can follow TD errors even when no rewards are consumed.
Dayan mentioned that liking may even be an epiphenomenon of some things that are going on in the brain when we eat food, have sex, etc., similar to how the specific flavour of pleasure we get from listening to music is such an epiphenomenon. I don’t know if that would mean that liking has no function.

Any thoughts?
Interesting. :)

Daswani and Leike (2015) also define (p. 4) happiness as the temporal difference error (in an MDP), and for model-based agents, the definition is, in my interpretation, basically the common Internet slogan that “happiness = reality − expectations”. However, the authors point out (p. 2) that pleasure = reward ≠ happiness. This still leaves open the issue of what pleasure is.
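To make that concrete, here is the standard TD error from reinforcement learning, which is the quantity their model-free definition identifies with happiness (the “reality”/“expectations” labels are my gloss on the slogan, not notation from the paper):

$$\delta_t = \underbrace{r_t + \gamma V(s_{t+1})}_{\text{reality}} - \underbrace{V(s_t)}_{\text{expectations}}$$

Here $r_t$ is the reward just received, $V$ is the agent’s learned value estimate, and $\gamma$ is the discount factor. This also gives a natural zero point: $\delta_t = 0$ exactly when things go as well as expected, positive when they go better, and negative when they go worse.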
Personally, I think pleasure is more morally relevant. In Tomasik (2014), I wrote (p. 11):
After training, dopamine spikes when a cue appears signaling that a reward will arrive, not when the reward itself is consumed [Schultz et al., 1997], but we know subjectively that the main pleasure of a reward comes from consuming it, not predicting it. In other words, in equation (1), the pleasure comes from the actual reward r, not from the amount of dopamine δ.
In this post commenting on Daswani and Leike (2015), I said:
I personally don’t think the definition of “happiness” that Daswani and Leike advance is the most morally relevant one, but the authors make an interesting case for their definition. I think their definition corresponds most closely with “being pleased with one’s current state in a high-level sense”. In contrast, I think raw pleasure/pain is most morally significant. As a simple test, ask whether you’d rather be in a state where you’ve been unexpectedly notified that you’ll get a cookie in a few minutes or whether you’d rather be in the state where you actually eat the cookie after having been notified a few minutes earlier. Daswani and Leike’s definition considers being notified about the cookie to be happiness, while I think eating the cookie has more moral relevance.
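To see how the two definitions come apart on the cookie test, here’s a minimal TD(0) sketch; this is entirely my own toy illustration (not code from Daswani and Leike or Schultz et al.), assuming a trivially simple world where an unpredicted cue is followed one step later by the reward:

```python
gamma = 1.0   # discount factor (no discounting, to keep numbers simple)
alpha = 0.1   # learning rate
v_cue = 0.0   # learned value of the state "cue has appeared"

for _ in range(500):
    # At consumption the episode ends, so the TD error there is
    # r + gamma * 0 - v_cue, with r = 1 for eating the cookie.
    delta_consume = 1.0 - v_cue
    v_cue += alpha * delta_consume

# The cue arrives unpredictably, so the value estimate just before it
# is ~0, and the TD error at notification is 0 + gamma * v_cue - 0.
print(f"TD error at notification: {gamma * v_cue:.2f}")  # ~1.00
print(f"TD error at consumption:  {1.0 - v_cue:.2f}")    # ~0.00
```

After learning, essentially the whole TD-error spike (“happiness” on Daswani and Leike’s definition) occurs at the notification, while the reward r = 1, which I consider the morally relevant pleasure, is delivered only when the cookie is actually eaten.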
You wrote:

Dayan mentioned that liking may even be an epiphenomenon of some things that are going on in the brain when we eat food, have sex, etc., similar to how the specific flavour of pleasure we get from listening to music is such an epiphenomenon.
I’m not sure I understand, but I wrote a quick thing here inspired by this comment. Do you think that’s what he meant? If so, may I attribute the idea to him/you? It seems fairly plausible. :) Studying what separates red from blue might help shed light on this topic.