I appreciate the whole post. But I personally really enjoyed the appendix. In particular, I found it informative that Yudkowsky can speak/write with that level of authoritativeness, confidence, and disdain for others who disagree, and still be wrong (if this post is right).
I expect someone to write a comment with the details at some point (I am pretty busy right now, so can only give a quick meta-level gloss), but mostly, I feel like in order to argue that something is wrong with these arguments, you have to argue more compellingly against completeness and possible alternative ways to establish Dutch-book arguments.
Also, the title of “there are no coherence arguments” is just straightforwardly wrong. The theorems cited are of course real theorems, they are relevant to agents acting with a certain kind of coherence, and I don’t really understand the semantic argument that is happening where it’s trying to say that the cited theorems aren’t talking about “coherence”, when like, they clearly are.
You can argue that the theorems are wrong, or that the explicit assumptions of the theorems don’t hold, which many people have done, but like, there are still coherence theorems, and IMO completeness seems quite reasonable and the argument here seems very weak (and I would urge the author to create an actual concrete situation that doesn’t seem very dumb in which a highly intelligent, powerful and economically useful system has non-complete preferences).
The whole section at the end feels very confused to me. The author asserts that there is “an error” where people assert that “there are coherence theorems”, but man, that just seems like such a weird thing to argue for. Of course there are theorems that are relevant to the question of agent coherence; all of these seem really quite relevant. They might not prove things in practice, as is true of many theorems.
Like, I feel like with the same type of argument that is made in the post I could write a post saying “there are no voting impossibility theorems” and then go ahead and argue that the assumptions of Arrow’s Impossibility Theorem are not universally proven, and then accuse everyone who ever talked about voting impossibility theorems of making “an error” since “those things are not real theorems”. And I think everyone working on voting-adjacent impossibility theorems would be pretty justifiably annoyed by this.
I’m following previous authors in defining ‘coherence theorems’ as
theorems which state that, unless an agent can be represented as maximizing expected utility, that agent is liable to pursue strategies that are dominated by some other available strategy.
On that definition, there are no coherence theorems. VNM is not a coherence theorem, nor is Savage’s Theorem, nor is Bolker-Jeffrey, nor are Dutch Book Arguments, nor is Cox’s Theorem, nor is the Complete Class Theorem.
there are theorems that are relevant to the question of agent coherence
I’d have no problem with authors making that claim.
I would urge the author to create an actual concrete situation that doesn’t seem very dumb in which a highly intelligent, powerful and economically useful system has non-complete preferences
I’m following previous authors in defining ‘coherence theorems’ as
Can you be concrete about which previous authors’ definition you are using here? A Google search for your definition returns no results but this post, and this is definitely not a definition of “coherence theorems” that I would use.
(1) How we define the term ‘coherence theorems’ doesn’t matter. What matters is that Premise 1 (striking out the word ‘coherence’, if you like) is false.
(2) The way I define the term ‘coherence theorems’ seems standard.
Now making point (1) in more detail:
Reserve the term ‘coherence theorems’ for whatever you like. Premise 1 is false: there are no theorems which state that, unless an agent can be represented as maximizing expected utility, that agent is liable to pursue strategies that are dominated by some other available strategy. The VNM Theorem doesn’t say that, nor does Savage’s Theorem, nor does Bolker-Jeffrey, nor do Dutch Books, nor does Cox’s Theorem, nor does the Complete Class Theorem. That is the error in coherence arguments. Premise 1 is false.
Now for point (2):
I take the Appendix to make plausible enough that my use of the term ‘coherence theorems’ is standard, at least in online discussions. Here are some quotations.
1.
Now, by the general idea behind coherence theorems, since we can’t view this behavior as corresponding to expected utilities, we ought to be able to show that it corresponds to a dominated strategy somehow
2.
Roughly, the general claim these theorems make is that any system either (a) acts like an expected utility maximizer under some probabilistic model, or (b) throws away resources in a pareto-suboptimal manner.
3.
Summary: Violations of coherence constraints in probability theory and decision theory correspond to qualitatively destructive or dominated behaviors.
Again, we see a manifestation of a powerful family of theorems showing that agents which cannot be seen as corresponding to any coherent probabilities and consistent utility function will exhibit qualitatively destructive behavior
4.
One of the most pleasing things about probability and expected utility theory is that there are many coherence arguments that suggest that these are the “correct” ways to reason. If you deviate from what the theory prescribes, then you must be executing a dominated strategy.
5.
‘Coherence arguments’ mean that if you don’t maximize ‘expected utility’ (EU)—that is, if you don’t make every choice in accordance with what gets the highest average score, given consistent preferability scores that you assign to all outcomes—then you will make strictly worse choices by your own lights than if you followed some alternate EU-maximizing strategy (at least in some situations, though they may not arise). For instance, you’ll be vulnerable to ‘money-pumping’—being predictably parted from your money for nothing.
6.
The overall message here is that there is a set of qualitative behaviors and as long as you do not engage in these qualitatively destructive behaviors, you will be behaving as if you have a utility function.
7.
I think that to contain the concept of Utility as it exists in me, you would have to do homework exercises I don’t know how to prescribe. Maybe one set of homework exercises like that would be showing you an agent, including a human, making some set of choices that allegedly couldn’t obey expected utility, and having you figure out how to pump money from that agent (or present it with money that it would pass up).
8.
The view that utility maximizers are inevitable is supported by a number of coherence theories developed early on in game theory which show that any agent without a consistent utility function is exploitable in some sense.
Maybe the term ‘coherence theorems’ gets used differently elsewhere. That is okay. See point (1).
Oh, nice, I do remember really liking that post. It’s a great example, though I think if you bring time and trade-in-time back into this model you do actually get things that are more VNM-shaped again. But overall I am like “OK, I think that post actually characterizes how coherence arguments apply to agents without completeness quite well”, and am also like “yeah, and the coherence arguments still apply quite strongly, because they aren’t as fickle or as narrow as the OP makes them out to be”.
But overall, yeah, I think this post would be a bunch stronger if it used the markets example from John’s post. I like it quite a bit, and I remember using it as an intuition pump in some situations that I somewhat embarrassingly failed to connect to this argument.
You are correct with some of the criticism, but as a side-note, completeness is actually crazy.
All real agents are bounded, and pay non-zero costs for bits, and as a consequence, don’t have complete preferences. Complete agents do not exist in the real world. If they existed, the correct intuitive model of them wouldn’t be ‘rational players’ but ‘utterly scary god, much bigger than the universe they live in’.
The same is true for the other implicit assumption in VNM, which is doing Bayesianism. There exist no Bayesian agents. Any non-trivial Bayesian agent would similarly be a terrifying alien god, much bigger than the universe it lives in.
Each agent has a computable partial preference ordering x≤y that decides if it prefers x to y.
We’d like this partial relation to be complete (i.e., defined for all x,y) and transitive (i.e., x≤y and y≤z implies x≤z).
Now, if the relation is sufficiently non-trivial, it will be expensive to compute for some x,y. So it’s better left undefined...?
If so, I can surely relate to that, as I often struggle to compute my preferences, even if they are theoretically complete. But it seems to me the relation is still defined; it just might not be practical to compute.
It’s also possible to think of it in this way: You start out with a partial preference ordering, and need to calculate one of its complete, transitive extensions. But that is computationally difficult, and not unique either.
I’m unsure what these observations add to the discussion, though.
and I would urge the author to create an actual concrete situation that doesn’t seem very dumb in which a highly intelligent, powerful and economically useful system has non-complete preferences
I’d be surprised if you couldn’t come up with situations where completeness isn’t worth the cost, e.g. something like: to close some preference gaps you’d have to think for 100x as long, but if you close them all arbitrarily then you end up with intransitivity.
This seems like a great point. Completeness requires closing all preference gaps, but if you do that inconsistently and violate transitivity then suddenly you are vulnerable to money-pumping.
Like, I feel like with the same type of argument that is made in the post I could write a post saying “there are no voting impossibility theorems” and then go ahead and argue that the assumptions of Arrow’s Impossibility Theorem are not universally proven, and then accuse everyone who ever talked about voting impossibility theorems of making “an error” since “those things are not real theorems”. And I think everyone working on voting-adjacent impossibility theorems would be pretty justifiably annoyed by this.
I think that there is some sense in which the character in your example would be right, since:
Arrow’s theorem doesn’t bind approval voting.
Generalizations of Arrow’s theorem don’t bind probabilistic results, e.g., each candidate is chosen with some probability corresponding to the number of votes he gets.
Like, suppose you had someone saying there was “a deep core of electoral process” which means that, as electoral processes scale to important decisions, you will necessarily get “highly defective electoral processes”, as illustrated in the classic example of the “dangers of the first-past-the-post system”. Well, in that case it would be reasonable to wonder whether the assumptions of the theorem bind, or whether there is some system like approval voting which is much less shitty than the theorem provers were expecting, because the assumptions don’t hold.
The analogy is imperfect, though, since approval voting is a known decent system, whereas for AI systems we don’t have an example friendly AI.
Also, the title of “there are no coherence arguments” is just straightforwardly wrong. The theorems cited are of course real theorems, they are relevant to agents acting with a certain kind of coherence, and I don’t really understand the semantic argument that is happening where it’s trying to say that the cited theorems aren’t talking about “coherence”, when like, they clearly are.
Well, part of the semantic nuance is that we don’t care as much about the coherence theorems that do exist if they will fail to apply to current and future machines.
IMO completeness seems quite reasonable and the argument here seems very weak (and I would urge the author to create an actual concrete situation that doesn’t seem very dumb in which a highly intelligent, powerful and economically useful system has non-complete preferences).
Here are some scenarios:
Our highly intelligent system notices that to have complete preferences over all trades would be too computationally expensive, and thus is willing to accept some, even a large degree of incompleteness.
The highly intelligent system learns to mimic the values of humans, who turn out to have non-complete preferences, which the agent then mimics.
You train a powerful system to do some stuff, but also to detect when it is out of distribution and in that case do nothing. Assuming you can do that, its preferences are incomplete, since when offered tradeoffs it always takes the default option when out of distribution.
The whole section at the end feels very confused to me. The author asserts that there is “an error” where people assert that “there are coherence theorems”, but man, that just seems like such a weird thing to argue for. Of course there are theorems that are relevant to the question of agent coherence; all of these seem really quite relevant. They might not prove things in practice, as is true of many theorems.
Mmh, then it would be good to differentiate between:
There are coherence theorems that talk about some agents with some properties
There are coherence theorems that prove that AI systems that will soon exist will be optimizing utility functions
You could also say a third thing, which would be: there are coherence theorems that strongly hint that AI systems that will soon exist will be optimizing utility functions. They don’t prove it, but they make it highly probable because of such and such. In which case having more detail on the such and such would deflate most of the arguments in this post, for me.
For instance:
“‘Coherence arguments’ mean that if you don’t maximize ‘expected utility’ (EU)—that is, if you don’t make every choice in accordance with what gets the highest average score, given consistent preferability scores that you assign to all outcomes—then you will make strictly worse choices by your own lights than if you followed some alternate EU-maximizing strategy (at least in some situations, though they may not arise). For instance, you’ll be vulnerable to ‘money-pumping’—being predictably parted from your money for nothing.
This is just false, because it is not taking into account the cost of doing expected value maximization, since giving consistent preferability scores is just very expensive and hard to do reliably. Like, when I poll people for their preferability scores, they give inconsistent estimates. Instead, they could be doing some expected utility maximization, but the evaluation steps are so expensive that I now basically don’t bother with a more hardcore approximation of expected value for individuals, only for large projects and organizations. And even then, I’m still taking shortcuts and monkey-patches, and not doing pure expected value maximization.
“This post gets somewhat technical and mathematical, but the point can be summarised as:
You are vulnerable to money pumps only to the extent to which you deviate from the von Neumann-Morgenstern axioms of expected utility.
In other words, using alternate decision theories is bad for your wealth.”
The “in other words” doesn’t follow, since EV maximization can be more expensive than the shortcuts.
Then there are other parts that give the strong impression that this expected value maximization will be binding in practice:
“Rephrasing again: we have a wide variety of mathematical theorems all spotlighting, from different angles, the fact that a plan lacking in clumsiness, is possessing of coherence.”
“The overall message here is that there is a set of qualitative behaviors and as long as you do not engage in these qualitatively destructive behaviors, you will be behaving as if you have a utility function.”
“The view that utility maximizers are inevitable is supported by a number of coherence theories developed early on in game theory which show that any agent without a consistent utility function is exploitable in some sense.”
Here are some words I wrote that don’t quite sit right but which I thought I’d still share: Like, part of the MIRI beat as I understand it is to hold that there is some shining guiding light, some deep nature of intelligence that models will instantiate and that will make them highly dangerous. But it’s not clear to me whether you will in fact get models that instantiate that shining light. Like, you could imagine an alternative view of intelligence where it’s just useful monkey patches all the way down, and as we train more powerful models, they get more of the monkey patches, but without the fundamentals. The view in between would be that there are some monkey patches, and there are some deep generalizations, but then I want to know whether the coherence systems will bind to those kinds of agents.
No need to respond/deeply engage, but I’d appreciate it if you let me know if the above comments were too nitpicky.
You can argue that the theorems are wrong, or that the explicit assumptions of the theorems don’t hold, which many people have done, but like, there are still coherence theorems, and IMO completeness seems quite reasonable and the argument here seems very weak (and I would urge the author to create an actual concrete situation that doesn’t seem very dumb in which a highly intelligent, powerful and economically useful system has non-complete preferences).
If you want to see an example of this, I suggest John’s post here.
I appreciate the whole post. But I personally really enjoyed the appendix. In particular, I found it informative that Yudkowsky can speak/write with that level of authoritativeness, confidence, and disdain for others who disagree, and still be wrong (if this post is right).
The post does actually seem wrong though.
I expect someone to write a comment with the details at some point (I am pretty busy right now, so can only give a quick meta-level gloss), but mostly, I feel like in order to argue that something is wrong with these arguments, you have to argue more compellingly against completeness and possible alternative ways to establish Dutch-book arguments.
Also, the title of “there are no coherence arguments” is just straightforwardly wrong. The theorems cited are of course real theorems, they are relevant to agents acting with a certain kind of coherence, and I don’t really understand the semantic argument that is happening where it’s trying to say that the cited theorems aren’t talking about “coherence”, when like, they clearly are.
You can argue that the theorems are wrong, or that the explicit assumptions of the theorems don’t hold, which many people have done, but like, there are still coherence theorems, and IMO completeness seems quite reasonable and the argument here seems very weak (and I would urge the author to create an actual concrete situation that doesn’t seem very dumb in which a highly intelligent, powerful and economically useful system has non-complete preferences).
The whole section at the end feels very confused to me. The author asserts that there is “an error” where people assert that “there are coherence theorems”, but man, that just seems like such a weird thing to argue for. Of course there are theorems that are relevant to the question of agent coherence; all of these seem really quite relevant. They might not prove things in practice, as is true of many theorems.
Like, I feel like with the same type of argument that is made in the post I could write a post saying “there are no voting impossibility theorems” and then go ahead and argue that the assumptions of Arrow’s Impossibility Theorem are not universally proven, and then accuse everyone who ever talked about voting impossibility theorems of making “an error” since “those things are not real theorems”. And I think everyone working on voting-adjacent impossibility theorems would be pretty justifiably annoyed by this.
I’m following previous authors in defining ‘coherence theorems’ as
On that definition, there are no coherence theorems. VNM is not a coherence theorem, nor is Savage’s Theorem, nor is Bolker-Jeffrey, nor are Dutch Book Arguments, nor is Cox’s Theorem, nor is the Complete Class Theorem.
I’d have no problem with authors making that claim.
Working on it.
Can you be concrete about which previous authors’ definition you are using here? A Google search for your definition returns no results but this post, and this is definitely not a definition of “coherence theorems” that I would use.
Two points, made in order of importance:
(1) How we define the term ‘coherence theorems’ doesn’t matter. What matters is that Premise 1 (striking out the word ‘coherence’, if you like) is false.
(2) The way I define the term ‘coherence theorems’ seems standard.
Now making point (1) in more detail:
Reserve the term ‘coherence theorems’ for whatever you like. Premise 1 is false: there are no theorems which state that, unless an agent can be represented as maximizing expected utility, that agent is liable to pursue strategies that are dominated by some other available strategy. The VNM Theorem doesn’t say that, nor does Savage’s Theorem, nor does Bolker-Jeffrey, nor do Dutch Books, nor does Cox’s Theorem, nor does the Complete Class Theorem. That is the error in coherence arguments. Premise 1 is false.
Now for point (2):
I take the Appendix to make plausible enough that my use of the term ‘coherence theorems’ is standard, at least in online discussions. Here are some quotations.
Maybe the term ‘coherence theorems’ gets used differently elsewhere. That is okay. See point (1).
Spoiler (don’t read if you want to work on a fun puzzle or test your alignment mettle).
Oh, nice, I do remember really liking that post. It’s a great example, though I think if you bring time and trade-in-time back into this model you do actually get things that are more VNM-shaped again. But overall I am like “OK, I think that post actually characterizes how coherence arguments apply to agents without completeness quite well”, and am also like “yeah, and the coherence arguments still apply quite strongly, because they aren’t as fickle or as narrow as the OP makes them out to be”.
But overall, yeah, I think this post would be a bunch stronger if it used the markets example from John’s post. I like it quite a bit, and I remember using it as an intuition pump in some situations that I somewhat embarrassingly failed to connect to this argument.
I cite John in the post!
Ah, ok. Why don’t you just respond with markets then?
You are correct with some of the criticism, but as a side-note, completeness is actually crazy.
All real agents are bounded, and pay non-zero costs for bits, and as a consequence, don’t have complete preferences. Complete agents do not exist in the real world. If they existed, the correct intuitive model of them wouldn’t be ‘rational players’ but ‘utterly scary god, much bigger than the universe they live in’.
Oh, sorry, totally.
The same is true for the other implicit assumption in VNM, which is doing Bayesianism. There exist no Bayesian agents. Any non-trivial Bayesian agent would similarly be a terrifying alien god, much bigger than the universe it lives in.
Do I understand you correctly here?
Each agent has a computable partial preference ordering x≤y that decides if it prefers x to y.
We’d like this partial relation to be complete (i.e., defined for all x,y) and transitive (i.e., x≤y and y≤z implies x≤z).
Now, if the relation is sufficiently non-trivial, it will be expensive to compute for some x,y. So it’s better left undefined...?
If so, I can surely relate to that, as I often struggle to compute my preferences, even if they are theoretically complete. But it seems to me the relation is still defined; it just might not be practical to compute.
It’s also possible to think of it in this way: You start out with a partial preference ordering, and need to calculate one of its complete, transitive extensions. But that is computationally difficult, and not unique either.
I’m unsure what these observations add to the discussion, though.
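To make the reading above concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the items `a`, `b`, `c`, the encoding of the relation as a set of pairs, and the helper names `is_complete` and `is_transitive` are all hypothetical. It checks the two properties from the comment and shows that one partial ordering can be completed in more than one way.

```python
from itertools import product

# Hypothetical encoding: a pair (x, y) in the set means "x ≤ y".

def is_complete(items, leq):
    """Every pair of items is comparable in at least one direction."""
    return all((x, y) in leq or (y, x) in leq
               for x, y in product(items, repeat=2))

def is_transitive(leq):
    """x ≤ y and y ≤ z together imply x ≤ z."""
    return all((x, z) in leq
               for (x, y) in leq
               for (w, z) in leq
               if y == w)

items = ["a", "b", "c"]
reflexive = {(x, x) for x in items}

# Partial ordering: only a ≤ b is settled; c is left uncompared.
partial = reflexive | {("a", "b")}

# Two different ways of closing the gaps around c.
c_at_bottom = partial | {("c", "a"), ("c", "b")}  # c ≤ a ≤ b
c_at_top = partial | {("a", "c"), ("b", "c")}     # a ≤ b ≤ c

for name, rel in [("partial", partial),
                  ("c_at_bottom", c_at_bottom),
                  ("c_at_top", c_at_top)]:
    print(name, is_complete(items, rel), is_transitive(rel))
```

The partial relation is transitive but not complete, and both extensions are complete and transitive while disagreeing about where `c` belongs. With three items the check is trivial; the cost concern in the comment is about relations far too large to enumerate like this.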
I’d be surprised if you couldn’t come up with situations where completeness isn’t worth the cost, e.g. something like: to close some preference gaps you’d have to think for 100x as long, but if you close them all arbitrarily then you end up with intransitivity.
This seems like a great point. Completeness requires closing all preference gaps, but if you do that inconsistently and violate transitivity then suddenly you are vulnerable to money-pumping.
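A toy sketch of that failure mode, under assumptions of my own: three items, a flat fee of 1 per trade, and a hypothetical `run_money_pump` helper. The cyclic relation stands in for preference gaps that were closed arbitrarily and ended up inconsistent.

```python
def run_money_pump(strictly_prefers, start_item, offers, fee, wealth):
    """Offer items one at a time; the agent pays `fee` to swap whenever
    it strictly prefers the offered item to the one it holds."""
    holding = start_item
    for offered in offers:
        if (offered, holding) in strictly_prefers:
            holding = offered
            wealth -= fee
    return holding, wealth

# A pair (x, y) means "x is strictly preferred to y".
cyclic = {("A", "B"), ("B", "C"), ("C", "A")}      # gaps closed inconsistently
transitive = {("A", "B"), ("B", "C"), ("A", "C")}  # gaps closed consistently

offers = ["C", "B", "A"] * 2  # two laps around the cycle

print(run_money_pump(cyclic, "A", offers, fee=1, wealth=10))
print(run_money_pump(transitive, "A", offers, fee=1, wealth=10))
```

The cyclic agent ends up holding the item it started with and is six units poorer, while the transitive agent declines every offer. This only illustrates the intransitive case; it says nothing about whether an agent that simply leaves the gaps open can be pumped.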
I think that there is some sense in which the character in your example would be right, since:
Arrow’s theorem doesn’t bind approval voting.
Generalizations of Arrow’s theorem don’t bind probabilistic results, e.g., each candidate is chosen with some probability corresponding to the number of votes he gets.
Like, suppose you had someone saying there was “a deep core of electoral process” which means that, as electoral processes scale to important decisions, you will necessarily get “highly defective electoral processes”, as illustrated in the classic example of the “dangers of the first-past-the-post system”. Well, in that case it would be reasonable to wonder whether the assumptions of the theorem bind, or whether there is some system like approval voting which is much less shitty than the theorem provers were expecting, because the assumptions don’t hold.
The analogy is imperfect, though, since approval voting is a known decent system, whereas for AI systems we don’t have an example friendly AI.
Glad that I added the caveat.
Well, part of the semantic nuance is that we don’t care as much about the coherence theorems that do exist if they will fail to apply to current and future machines.
Here are some scenarios:
Our highly intelligent system notices that to have complete preferences over all trades would be too computationally expensive, and thus is willing to accept some, even a large degree of incompleteness.
The highly intelligent system learns to mimic the values of humans, who turn out to have non-complete preferences, which the agent then mimics.
You train a powerful system to do some stuff, but also to detect when it is out of distribution and in that case do nothing. Assuming you can do that, its preferences are incomplete, since when offered tradeoffs it always takes the default option when out of distribution.
Mmh, then it would be good to differentiate between:
There are coherence theorems that talk about some agents with some properties
There are coherence theorems that prove that AI systems that will soon exist will be optimizing utility functions
You could also say a third thing, which would be: there are coherence theorems that strongly hint that AI systems that will soon exist will be optimizing utility functions. They don’t prove it, but they make it highly probable because of such and such. In which case having more detail on the such and such would deflate most of the arguments in this post, for me.
For instance:
This is just false, because it is not taking into account the cost of doing expected value maximization, since giving consistent preferability scores is just very expensive and hard to do reliably. Like, when I poll people for their preferability scores, they give inconsistent estimates. Instead, they could be doing some expected utility maximization, but the evaluation steps are so expensive that I now basically don’t bother with a more hardcore approximation of expected value for individuals, only for large projects and organizations. And even then, I’m still taking shortcuts and monkey-patches, and not doing pure expected value maximization.
The “in other words” doesn’t follow, since EV maximization can be more expensive than the shortcuts.
Then there are other parts that give the strong impression that this expected value maximization will be binding in practice:
Here are some words I wrote that don’t quite sit right but which I thought I’d still share: Like, part of the MIRI beat as I understand it is to hold that there is some shining guiding light, some deep nature of intelligence that models will instantiate and that will make them highly dangerous. But it’s not clear to me whether you will in fact get models that instantiate that shining light. Like, you could imagine an alternative view of intelligence where it’s just useful monkey patches all the way down, and as we train more powerful models, they get more of the monkey patches, but without the fundamentals. The view in between would be that there are some monkey patches, and there are some deep generalizations, but then I want to know whether the coherence systems will bind to those kinds of agents.
No need to respond/deeply engage, but I’d appreciate it if you let me know if the above comments were too nitpicky.
If you want to see an example of this, I suggest John’s post here.