I do not find the argument against the applicability of the Complete Class theorem in that post convincing. See Charlie Steiner’s reply in the comments.
You just have to separate “how the agent internally represents its preferences” from “what it looks like the agent is doing.” You describe an agent that dodges the money-pump by simply acting consistently with past choices. Internally, this agent has an incomplete representation of preferences, plus a memory. But externally it looks like this agent is acting as if it assigns equal value to whatever incomparable options it thought of choosing between first.
Decision theory is concerned with external behaviour, not internal representations. All of these theorems are talking about whether the agent’s actions can be consistently described as maximising a utility function. They are not concerned whatsoever with how the agent actually mechanically represents and thinks about its preferences and actions on the inside. To decision theory, agents are black boxes. Information goes in, decision comes out. Whatever processes may go on in between are beyond the scope of what the theorems are trying to talk about.
So:

> Money-pump arguments for Completeness (understood as the claim that sufficiently-advanced artificial agents will have complete preferences) assume that such agents will not act in accordance with policies like ‘if I previously turned down some option X, I will not choose any option that I strictly disprefer to X.’ But that assumption is doubtful. Agents with incomplete preferences have good reasons to act in accordance with this kind of policy: (1) it never requires them to change or act against their preferences, and (2) it makes them immune to all possible money-pumps for Completeness.
As far as decision theory is concerned, this is a complete set of preferences. Whether the agent makes up its mind as it goes along or has everything it wants written up in a database ahead of time matters not a peep to decision theory. The only thing that matters is whether the agent’s resulting behaviour can be coherently described as maximising a utility function. If it quacks like a duck, it’s a duck.
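The memory-based policy quoted above can be made concrete with a toy model. This is a minimal sketch of my own (the option names and the preference relation are invented for illustration, not taken from the thread): an agent whose preferences are incomplete, but which remembers what it has turned down and never accepts anything strictly worse than a rejected option.

```python
# Toy agent with incomplete preferences plus memory (illustrative sketch).
# Strict preferences: A beats A-, B beats B-; A and B are incomparable.
STRICT = {("A", "A-"), ("B", "B-")}

def strictly_prefers(x, y):
    return (x, y) in STRICT

class CautiousAgent:
    def __init__(self):
        self.turned_down = []  # memory of rejected options

    def choose(self, options):
        # Rule out options strictly worse than another option on offer,
        # or strictly worse than anything previously turned down.
        admissible = [o for o in options
                      if not any(strictly_prefers(p, o) for p in options)
                      and not any(strictly_prefers(t, o) for t in self.turned_down)]
        choice = admissible[0]  # pick arbitrarily among admissible options
        self.turned_down += [o for o in options if o != choice]
        return choice

agent = CautiousAgent()
# Step 1: A and B are incomparable; the agent picks one (here: A) and remembers B.
assert agent.choose(["A", "B"]) == "A"
# Step 2: the would-be money-pump offers B-, strictly worse than the rejected B.
# The memory policy blocks it, so the agent cannot be pumped; externally it
# simply looks like an agent that values A and B equally.
assert agent.choose(["A", "B-"]) == "A"
```

Internally this is an incomplete preference relation plus a memory; externally, exactly as the comment says, its choices look like those of an agent with complete preferences.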
> The only thing that matters is whether the agent’s resulting behaviour can be coherently described as maximising a utility function.
If you’re only concerned with externals, all behaviour can be interpreted as maximising a utility function. Consider an example: an agent pays $1 to trade vanilla for strawberry, $1 to trade strawberry for chocolate, and $1 to trade chocolate for vanilla. Considering only externals, can this agent be represented as an expected utility maximiser? Yes. We can say that the agent’s preferences are defined over entire histories of the universe, and the history it’s enacting is its most-preferred.
If we want expected-utility-maximisation to rule anything out, we need to say something about the objects of the agent’s preference. And once we do that, we can observe violations of Completeness.
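The cyclic-trade example can be checked mechanically. The following sketch is mine and purely illustrative: it verifies by brute force that no utility function over the three flavours rationalises the trades, while a utility function over whole histories does so trivially.

```python
from itertools import permutations

# Observed behaviour: the agent pays $1 each time to move old -> new.
trades = [("vanilla", "strawberry"),
          ("strawberry", "chocolate"),
          ("chocolate", "vanilla")]

def rationalisable_over_flavours(trades):
    """Is there a strict ranking of flavours that makes every trade an upgrade?"""
    flavours = sorted({f for pair in trades for f in pair})
    for ranking in permutations(flavours):
        u = {f: i for i, f in enumerate(ranking)}  # higher index = better
        if all(u[new] > u[old] for old, new in trades):
            return True
    return False

# The required ranking would be cyclic (strawberry > vanilla > chocolate >
# strawberry), so no utility function over flavours exists.
assert not rationalisable_over_flavours(trades)

# Over entire histories, though, "maximisation" is trivial: assign the enacted
# history utility 1 and every other history 0.
enacted = tuple(trades)
u_history = lambda history: 1.0 if history == enacted else 0.0
assert u_history(enacted) > u_history(())  # the enacted history is most-preferred
```

This is exactly the asymmetry the comment points at: over flavours, EU-maximisation rules the behaviour out; over histories, it rules nothing out.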
> all behaviour can be interpreted as maximising a utility function.
Yes, it indeed can be. However, the less coherent the agent acts, the more cumbersome it will be to describe it as an expected utility maximiser. Once your utility function specifies entire histories of the universe, its description length goes through the roof. If describing a system as a decision theoretic agent is that cumbersome, it’s probably better to look for some other model to predict its behaviour. A rock, for example, is not well described as a decision theoretic agent. You can technically specify a utility function that does the job, but it’s a ludicrously large one.
The less coherent and smart a system acts, the longer the utility function you need to specify to model its behaviour as a decision theoretic agent will be. In this sense, expected-utility-maximisation does rule things out, though the boundary is not binary. It’s telling you what kind of systems you can usefully model as “making decisions” if you want to predict their actions.
If you would prefer math that talks about the actual internal structures agents themselves consist of, decision theory is not the right field to look at. It just does not address questions like this at all. Nowhere in the theorems will you find a requirement that an agent’s preferences be somehow explicitly represented in the algorithms it “actually uses” to make decisions, whatever that would mean. It doesn’t know what these algorithms are, and doesn’t even have the vocabulary to formulate questions about them. It’s like saying we can’t use theorems for natural numbers to make statements about counting sheep, because sheep are really made of fibre bundles over the complex numbers, rather than natural numbers. The natural numbers are talking about our count of the sheep, not the physics of the sheep themselves, nor the physics of how we move our eyes to find the sheep. And decision theory is talking about our model of systems as agents that make decisions, not the physics of the systems themselves and how some parts of them may or may not correspond to processes that meet some yet unknown embedded-in-physics definition of “making a decision”.
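The description-length point two paragraphs up can be illustrated with deliberately crude arithmetic (the numbers here are my own toy choices): a coherent agent over n outcomes is captured by one utility value per outcome, while a utility function defined over entire length-T histories needs a value for every one of the n**T possible histories.

```python
# Crude illustration of why incoherent behaviour inflates the model: a utility
# function over outcomes is a table with one entry per outcome, while one over
# full histories needs an entry per possible history.
def utility_table_size(n_outcomes, horizon, over_histories):
    return n_outcomes ** horizon if over_histories else n_outcomes

assert utility_table_size(3, 10, over_histories=False) == 3
assert utility_table_size(3, 10, over_histories=True) == 59049  # 3**10
```

The gap grows exponentially in the horizon, which is the sense in which a history-level “utility function” for a rock or an incoherent agent is ludicrously large.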
I think this response misses the forest for the trees here. It’s true that you can fit some utility function to behaviour if you make a more fine-grained outcome-space on which preferences are now coherent, etc. But this removes basically all of the predictive content that Eliezer etc. assume when invoking these theorems.
In particular, the use of these theorems in doomer arguments absolutely does implicitly care about “internal structure” stuff. E.g. one major premise is that non-EU-maximising AIs will reflectively iron out the “wrinkles” in their preferences to better approximate an EU-maximiser, since they will notice that e.g. their incompleteness leads to exploitability. The OP’s argument shows that an incomplete-preference agent will be inexploitable by its own lights. The fact that there’s some completely different way to refactor the outcome-space such that from the outside it looks like an EU-maximiser is just irrelevant.
>If describing a system as a decision theoretic agent is that cumbersome, it’s probably better to look for some other model to predict its behaviour
This also seems to be begging the question: if I have something I think I can describe as a non-EU-maximising decision-theoretic agent, but which can only be described as an EU-maximiser via an incredibly cumbersome utility function, why do we not just conclude that EU-maximisation is the wrong way to model the agent, rather than throwing out the belief that it should be modelled as an agent at all? If I have a preferential gap between A and B, and you have to jump through some ridiculous hoops to make this look EU-coherent (“he prefers [A and Tuesday and feeling slightly hungry and saw some friends yesterday and the price of blueberries is <£1 and....] to [B and Wednesday and full and at a party and blueberries >£1 and...]”), the correct conclusion is not to stop modelling me as a decision-theoretic agent, but to stop modelling me as an EU-maximiser.
>The less coherent and smart a system acts, the longer the utility function you need to specify...
These are two very different concepts? (Equating “coherent” with “smart” is again kinda begging the question.) Re: coherence, it’s just tautologous that the more complexly you have to partition up outcome-space to make things look coherent, the more complex the resulting utility function will be. Re: smartness, if we’re operationalising this as “ability to steer the world towards states of higher utility”, then smartness and utility-function-complexity are by definition independent. Unless you mean something more like “ability to steer the world in a way that seems legible to us”, in which case it’s again just tautologous.
That all sounds approximately right but I’m struggling to see how it bears on this point:
> If we want expected-utility-maximisation to rule anything out, we need to say something about the objects of the agent’s preference. And once we do that, we can observe violations of Completeness.
The coherence theorem part seems particularly egregious to me given how load-bearing it is to a lot of his major claims. A frustration I have personally is that he often claims no one ever comes to him with good object-level objections to his arguments, but then when they do, as in that thread, he just refuses to engage.
A couple of other examples, both of which have been discussed on LessWrong before:
In Eliezer’s book Inadequate Equilibria, he gives a central anecdote: by reading econ bloggers, he confidently concluded that the Bank of Japan was making mistakes worth trillions of dollars. He further claimed that a change in leadership meant the Bank of Japan soon after pursued his favored policies, immediately leading to “real GDP growth of 2.3%, where the previous trend was for falling RGDP” and validating his analysis.
If true, this is really remarkable. Let me reiterate: he says that just by reading econ blogs, he was able to casually identify an economic policy so consequential that adopting it immediately reversed Japan’s declining GDP.
In fact, one of his central points in the book is not just that he was able to identify this opportunity, but that he could be justifiably confident in his knowledge despite not having any expertise in economic policy. His intention with the book is to explain how and why he can be correct about things like this.
The problem? His anecdote falls apart at the slightest fact check.
- Japan’s GDP was not falling when he says it was.
- There was no discernible change in GDP growth after the change in leadership and enactment of his preferred policies, while he claimed a huge jump.
- This was pointed out in a widely upvoted LessWrong post (300+ karma) early this year.
- Eliezer has yet to respond or correct his book.
He misunderstands what expected utility theorems in economics say. He has written very confidently across many years that they provide proofs about dominated strategies and money pumping (see appendix of link), which they do not.
He did at least reply to this post, but his replies further demonstrate his confusions about expected utility theorems. For example, in his first attempted refutation of the post he confused one property for another.
This contributes to his and others’ beliefs about AI doom by default.
Other people in the Rationality and AI Safety communities have absorbed his misunderstanding and his confidence level in it.
I find this comment much more convincing than the top-level post.
Can you explain?
this link is broken
Thanks, fixed.