I do not find the argument against the applicability of the Complete Class theorem in that post convincing. See Charlie Steiner’s reply in the comments.
You just have to separate “how the agent internally represents its preferences” from “what it looks like the agent is doing.” You describe an agent that dodges the money-pump by simply acting consistently with its past choices. Internally, this agent has an incomplete representation of preferences, plus a memory. But externally, it looks as though the agent assigns equal value to whichever of the incomparable options it happened to choose between first.
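For concreteness, here is a minimal sketch (my own illustration, not taken from either post; the option names and preference pairs are made up) of such an agent: incomparable options are a preference gap, and the only extra machinery is a memory of what it has previously turned down.

```python
# Minimal sketch (illustrative only) of an agent with incomplete preferences plus a
# memory, following the policy: never choose an option strictly dispreferred to
# something previously turned down.

class CautiousAgent:
    def __init__(self, strict_prefs):
        # strict_prefs: set of (better, worse) pairs; any pair missing in both
        # directions is a preference gap, i.e. the preferences are incomplete.
        self.strict_prefs = set(strict_prefs)
        self.turned_down = set()  # memory of options passed over in earlier choices

    def strictly_prefers(self, a, b):
        return (a, b) in self.strict_prefs

    def choose(self, options):
        # Policy: rule out anything strictly worse than a previously rejected option.
        viable = [o for o in options
                  if not any(self.strictly_prefers(x, o) for x in self.turned_down)]
        # Among what remains, take any option nothing else strictly beats
        # (for a finite strict partial order this always exists when viable is non-empty).
        maximal = [o for o in viable
                   if not any(self.strictly_prefers(x, o) for x in viable)]
        choice = maximal[0]
        self.turned_down.update(o for o in options if o != choice)
        return choice

# A and B are incomparable (a gap); "A-" is A made slightly worse, e.g. A minus a penny.
agent = CautiousAgent({("A", "A-")})
print(agent.choose(["B", "A"]))    # a gap, so it just takes B; A is now "turned down"
print(agent.choose(["A-", "B"]))   # A- is strictly worse than the turned-down A, so it keeps B
# The agent never ends up with A-, so the money-pump against incompleteness fails,
# yet from the outside it looks as if it simply values A and B equally.
```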
Decision theory is concerned with external behaviour, not internal representations. All of these theorems are talking about whether the agent’s actions can be consistently described as maximising a utility function. They are not concerned whatsoever with how the agent actually mechanically represents and thinks about its preferences and actions on the inside. To decision theory, agents are black boxes. Information goes in, decision comes out. Whatever processes may go on in between are beyond the scope of what the theorems are trying to talk about.
So
>Money-pump arguments for Completeness (understood as the claim that sufficiently-advanced artificial agents will have complete preferences) assume that such agents will not act in accordance with policies like ‘if I previously turned down some option X, I will not choose any option that I strictly disprefer to X.’ But that assumption is doubtful. Agents with incomplete preferences have good reasons to act in accordance with this kind of policy: (1) it never requires them to change or act against their preferences, and (2) it makes them immune to all possible money-pumps for Completeness.
As far as decision theory is concerned, this is a complete set of preferences. Whether the agent makes up its mind as it goes along or has everything it wants written up in a database ahead of time matters not a peep to decision theory. The only thing that matters is whether the agent’s resulting behaviour can be coherently described as maximising a utility function. If it quacks like a duck, it’s a duck.
>The only thing that matters is whether the agent’s resulting behaviour can be coherently described as maximising a utility function.
If you’re only concerned with externals, all behaviour can be interpreted as maximising a utility function. Consider an example: an agent pays $1 to trade vanilla for strawberry, $1 to trade strawberry for chocolate, and $1 to trade chocolate for vanilla. Considering only externals, can this agent be represented as an expected utility maximiser? Yes. We can say that the agent’s preferences are defined over entire histories of the universe, and the history it’s enacting is its most-preferred.
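To make that concrete, here is a toy sketch (my own, with made-up numbers, and assuming utility is linear in dollars so that each $1 paid must be offset by more than one unit of utility): no utility function over flavours can rationalise the cycle, but a utility function over whole histories does so trivially.

```python
# Toy illustration (made-up numbers). The observed behaviour: pay $1 for each trade
# vanilla -> strawberry -> chocolate -> vanilla.
from itertools import permutations

trades = [("vanilla", "strawberry"), ("strawberry", "chocolate"), ("chocolate", "vanilla")]

# 1. No utility function over flavour-snapshots rationalises this: paying $1 to swap
#    old for new requires u(new) > u(old) + 1, and summing the three inequalities
#    around the cycle gives 0 > 3, a contradiction for any real-valued assignment.
#    The brute-force search over a small integer grid below is just a sanity check.
def snapshot_utility_exists(trades, candidate_values=range(-5, 6)):
    flavours = sorted({f for pair in trades for f in pair})
    for values in permutations(candidate_values, len(flavours)):
        u = dict(zip(flavours, values))
        if all(u[new] > u[old] + 1 for old, new in trades):
            return True
    return False

print(snapshot_utility_exists(trades))   # False

# 2. A utility function over entire histories rationalises it trivially: just assign
#    the history the agent actually enacts the top score.
enacted = tuple(trades)
history_utility = lambda history: 1.0 if history == enacted else 0.0
print(history_utility(enacted))          # 1.0 -- the enacted history is "most preferred"
```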
If we want expected-utility-maximisation to rule anything out, we need to say something about the objects of the agent’s preference. And once we do that, we can observe violations of Completeness.
>all behaviour can be interpreted as maximising a utility function.
Yes, it indeed can be. However, the less coherent the agent acts, the more cumbersome it will be to describe it as an expected utility maximiser. Once your utility function specifies entire histories of the universe, its description length goes through the roof. If describing a system as a decision theoretic agent is that cumbersome, it’s probably better to look for some other model to predict its behaviour. A rock, for example, is not well described as a decision theoretic agent. You can technically specify a utility function that does the job, but it’s a ludicrously large one.
The less coherent and smart a system acts, the longer the utility function you need to specify in order to model its behaviour as a decision theoretic agent. In this sense, expected-utility-maximisation does rule things out, though the boundary is not binary. It’s telling you what kind of systems you can usefully model as “making decisions” if you want to predict their actions.
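As a back-of-the-envelope illustration of that description-length point (my own made-up numbers): a utility function over snapshot outcomes needs one value per outcome, whereas one over entire histories needs a value per possible history, which blows up exponentially with the length of the history.

```python
# Back-of-the-envelope sketch with made-up numbers: how large the utility function gets
# when it has to score entire histories rather than snapshot outcomes.
n_outcomes = 10      # distinct snapshot outcomes the agent can face at each step
n_steps = 20         # length of the histories being scored

snapshot_table = n_outcomes              # one value per outcome
history_table = n_outcomes ** n_steps    # one value per possible history

print(snapshot_table)   # 10
print(history_table)    # 100000000000000000000 -- the "ludicrously large" utility function
```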
If you would prefer math that talks about the actual internal structures agents themselves consist of, decision theory is not the right field to look at. It just does not address questions like this at all. Nowhere in the theorems will you find a requirement that an agent’s preferences be somehow explicitly represented in the algorithms it “actually uses” to make decisions, whatever that would mean. It doesn’t know what these algorithms are, and doesn’t even have the vocabulary to formulate questions about them. It’s like saying we can’t use theorems for natural numbers to make statements about counting sheep, because sheep are really made of fibre bundles over the complex numbers, rather than natural numbers. The natural numbers are talking about our count of the sheep, not the physics of the sheep themselves, nor the physics of how we move our eyes to find the sheep. And decision theory is talking about our model of systems as agents that make decisions, not the physics of the systems themselves and how some parts of them may or may not correspond to processes that meet some yet unknown embedded-in-physics definition of “making a decision”.
I think this response misses the wood for the trees here. It’s true that you can fit some utility function to behaviour if you make a more fine-grained outcome-space on which the preferences come out coherent, etc. But doing so removes basically all of the predictive content that Eliezer etc. assume when invoking these theorems.
In particular, the use of these theorems in doomer arguments absolutely does implicitly care about “internal structure” stuff: e.g. one major premise is that non-EU-maximising AIs will reflectively iron out the “wrinkles” in their preferences to better approximate an EU-maximiser, since they will notice that, e.g., their incompleteness leads to exploitability. The OP argument shows that an incomplete-preference agent will be inexploitable by its own lights. The fact that there’s some completely different way to refactor the outcome-space such that, from the outside, it looks like an EU-maximiser is just irrelevant.
>If describing a system as a decision theoretic agent is that cumbersome, it’s probably better to look for some other model to predict its behaviour
This also seems to be begging the question: if I have something I think I can describe as a non-EU-maximising decision-theoretic agent, but which has to be described with an incredibly cumbersome utility function, why do we not just conclude that EU-maximisation is the wrong way to model the agent, rather than throwing out the belief that it should be modelled as an agent? If I have a preferential gap between A and B, and you have to jump through some ridiculous hoops to make this look EU-coherent (“he prefers [A and Tuesday and feeling slightly hungry and saw some friends yesterday and the price of blueberries is <£1 and....] to [B and Wednesday and full and at a party and blueberries >£1 and...]”), it seems like the correct conclusion is not to throw away me being a decision-theoretic agent, but to throw away me being well-modelled as an EU-maximiser.
>The less coherent and smart a system acts, the longer the utility function you need to specify...
These are two very different concepts? (Equating “coherent” with “smart” is again kinda begging the question.) Re: coherence, it’s just tautologous that the more complexly you have to partition up outcome-space to make things look coherent, the more complex the resulting utility function will be. Re: smartness, if we’re operationalising this as “ability to steer the world towards states of higher utility”, then it seems like smartness and utility-function-complexity are by definition independent. Unless you mean more “ability to steer the world in a way that seems legible to us”, in which case it’s again just tautologous.
That all sounds approximately right but I’m struggling to see how it bears on this point:
>If we want expected-utility-maximisation to rule anything out, we need to say something about the objects of the agent’s preference. And once we do that, we can observe violations of Completeness.
Can you explain?