Could you clarify what ‘consequentialist cognition’ and ‘consequentialist behaviour’ mean in this context? Googling hasn’t given any insight.
It’s Yudkowsky’s term for the dangerous bit where the system starts having preferences over future states, rather than just taking the current reward signal and sitting there. It’s crucial to the fast-doom case, but not well explained as far as I can see. David Krueger identified it as a missing assumption under a different name here.
I’m also still a bit confused about what exactly this concept refers to. Is a ‘consequentialist’ basically just an ‘optimiser’ in the sense that Yudkowsky uses in the sequences (e.g. here), a sense that has later been refined by posts like this one (where it’s called ‘selection’) and this one?
In other words, roughly speaking, is a system a consequentialist to the extent that it’s trying to take actions that push its environment towards a certain goal state?
Found the source. There, he says that an “explicit cognitive model and explicit forecasts” about the future are necessary for true consequentialist cognition (CC). He agrees that CC is already common among optimisers (like chess engines); the dangerous kind is consequentialism over broad domains (i.e. where everything in the world is in play as a possible means, whereas the chess engine only considers the set of legal moves as its domain).
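To make the contrast concrete, here is a minimal toy sketch (my own illustration, not anything from the post; all names are hypothetical): a reactive policy that just responds to the current state with a fixed rule, versus a “consequentialist” one that rolls an explicit world model forward and prefers actions whose forecast end state lands closer to its goal.

```python
# Toy sketch of the distinction discussed above (hypothetical code, not from the post):
# a reactive stimulus-response policy vs. a policy with preferences over future
# states, acting on explicit forecasts from an explicit world model.

from itertools import product

# World: an agent at an integer position on a line; actions move it -1, 0, or +1.
ACTIONS = (-1, 0, +1)

def world_model(state: int, action: int) -> int:
    """Explicit forecast: predicted next state given the current state and an action."""
    return state + action

def reactive_policy(state: int) -> int:
    """No model, no forecast: a fixed stimulus-response rule
    (here: drift right whenever the current state is negative, otherwise stay put)."""
    return +1 if state < 0 else 0

def consequentialist_policy(state: int, goal: int, horizon: int = 3) -> int:
    """Preferences over *future* states: enumerate action sequences, forecast each
    one with the world model, and take the first action of the sequence whose
    predicted end state is closest to the goal."""
    best_first_action, best_distance = 0, float("inf")
    for plan in product(ACTIONS, repeat=horizon):
        s = state
        for a in plan:                 # roll the explicit world model forward
            s = world_model(s, a)
        distance = abs(s - goal)       # how far the forecast lands from the goal state
        if distance < best_distance:
            best_first_action, best_distance = plan[0], distance
    return best_first_action

if __name__ == "__main__":
    print(reactive_policy(2))                  # 0: no notion of a goal state at all
    print(consequentialist_policy(2, goal=5))  # +1: steers the environment towards the goal
```

Note that both policies here only “see” positions on a line, i.e. a narrow, chess-engine-like domain; the worrying case in the post is the same forecast-and-steer loop running over a much broader domain, where far more of the world counts as a possible means.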
“Goal-seeking” seems like the previous, less-confusing word for it; I’m not sure why people shifted.
I replaced the original comment with “goal-directed.” Each term has some baggage and isn’t quite right, but on balance I think “goal-directed” is better. I’m not very systematic about this choice; it’s just a reflection of my mood that day.