Also, the title of “there are no coherence arguments” is just straightforwardly wrong. The theorems cited are of course real theorems, they are relevant to agents acting with a certain kind of coherence, and I don’t really understand the semantic argument that is happening where it’s trying to say that the cited theorems aren’t talking about “coherence”, when like, they clearly are.
Well, part of the semantic nuance is that we don’t care as much about the coherence theorems that do exist if they fail to apply to current and future machines.
IMO completeness seems quite reasonable to me and the argument here seems very weak (and I would urge the author to create an actual concrete situation that doesn’t seem very dumb in which a highly intelligence, powerful and economically useful system has non-complete preferences).
Here are some scenarios:
Our highly intelligent system notices that having complete preferences over all trades would be too computationally expensive, and so is willing to accept some incompleteness, even a large degree of it.
The highly intelligent system learns to mimic the values of humans, who turn out to have incomplete preferences, and so it ends up with incomplete preferences too.
You train a powerful system to do some stuff, but also to detect when it is out of distribution and in that case do nothing. Assuming you can do that, its preferences are incomplete, since when offered tradeoffs out of distribution it always takes the default option.
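To make the third scenario concrete, here is a minimal Python sketch. Everything in it is hypothetical (the `Agent` class, the `in_distribution` flag, the toy preference pairs); it only shows what "take the default when out of distribution" looks like as a choice rule, assuming an out-of-distribution detector exists at all.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    # Strict preferences stored as (better, worse) pairs. Pairs that are
    # not listed in either direction are incomparable for this agent.
    prefers: set = field(default_factory=set)

    def choose(self, default, offered, in_distribution=True):
        """Return the option the agent ends up holding."""
        # Out of distribution: do nothing, i.e. keep the default.
        if not in_distribution:
            return default
        # Trade only when the offer is strictly preferred to the default.
        if (offered, default) in self.prefers:
            return offered
        # Dispreferred or incomparable: stay put.
        return default


agent = Agent(prefers={("apple", "pear")})
print(agent.choose("pear", "apple"))    # strictly preferred offer: trades
print(agent.choose("apple", "pear"))    # dispreferred offer: keeps default
print(agent.choose("apple", "mango"))   # incomparable: keeps default
print(agent.choose("mango", "apple"))   # incomparable the other way: keeps default
print(agent.choose("pear", "apple", in_distribution=False))  # OOD: keeps default
```

The apple/mango pair is declined in both directions, so the agent's choices never reveal a ranking between the two, and the out-of-distribution case overrides even a preference the agent does have.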
The whole section at the end feels very confused to me. The author asserts that there is “an error” where people assert that “there are coherence theorems”, but man, that just seems like such a weird thing to argue for. Of course there are theorems that are relevant to the question of agent coherence, all of these seem really quite relevant. They might not prove the things in-practice, as many theorems tend to do.
Mmh, then it would be good to differentiate between:
There are coherence theorems that talk about some agents with some properties
There are coherence theorems that prove that the AI systems that will soon exist will be optimizing utility functions
You could also say a third thing, which would be: there are coherence theorems that strongly hint that the AI systems that will soon exist will be optimizing utility functions. They don’t prove it, but they make it highly probable because of such and such. In which case having more detail on the such and such would deflate most of the arguments in this post, for me.
For instance:
“‘Coherence arguments’ mean that if you don’t maximize ‘expected utility’ (EU)—that is, if you don’t make every choice in accordance with what gets the highest average score, given consistent preferability scores that you assign to all outcomes—then you will make strictly worse choices by your own lights than if you followed some alternate EU-maximizing strategy (at least in some situations, though they may not arise). For instance, you’ll be vulnerable to ‘money-pumping’—being predictably parted from your money for nothing.
This is just false, because it does not take into account the cost of doing expected value maximization: giving consistent preferability scores is very expensive and hard to do reliably. Like, when I poll people for their preferability scores, they give inconsistent estimates. They could instead be doing some expected utility maximization, but the evaluation steps are so expensive that I now basically don’t bother with a more hardcore approximation of expected value for individuals, only for large projects and organizations. And even then, I’m still taking shortcuts and using monkey patches, not doing pure expected value maximization.
“This post gets somewhat technical and mathematical, but the point can be summarised as:
You are vulnerable to money pumps only to the extent to which you deviate from the von Neumann-Morgenstern axioms of expected utility.
In other words, using alternate decision theories is bad for your wealth.”
The “in other words” doesn’t follow, since EV maximization can be more expensive than the shortcuts.
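The cost point can be put as toy arithmetic. The sketch below uses made-up numbers and hypothetical names (`pump_loss`, `total_cost`, `eval_cost`); the only claim is structural: an agent with cyclic preferences does lose a fee on every lap of a money pump, but whether removing that vulnerability is good for its wealth depends on what consistency costs and how often a pump actually shows up.

```python
def pump_loss(fee, laps):
    """Money lost by an agent with cyclic preferences (A < B < C < A)
    that pays `fee` for each of the three swaps in one lap."""
    return 3 * fee * laps


def total_cost(decisions, eval_cost, pump_laps, fee, coherent):
    """Total cost over `decisions` choices (lower is better).

    Assumption: a coherent agent pays `eval_cost` on every decision to
    keep its scores consistent and is never pumped; a shortcut agent pays
    nothing to evaluate but loses money on the laps where it is pumped.
    """
    if coherent:
        return decisions * eval_cost
    return pump_loss(fee, pump_laps)


# Hypothetical numbers, chosen only to illustrate the comparison.
decisions = 1000
coherent_cost = total_cost(decisions, eval_cost=0.5, pump_laps=10, fee=1.0, coherent=True)
shortcut_cost = total_cost(decisions, eval_cost=0.5, pump_laps=10, fee=1.0, coherent=False)
print(coherent_cost, shortcut_cost)
```

With these numbers the shortcut agent comes out ahead; raise `pump_laps` or lower `eval_cost` far enough and the comparison flips, which is the sense in which the conclusion depends on costs rather than following from the theorem alone.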
Then there are other parts that give the strong impression that this expected value maximization will be binding in practice:
“Rephrasing again: we have a wide variety of mathematical theorems all spotlighting, from different angles, the fact that a plan lacking in clumsiness, is possessing of coherence.”
“The overall message here is that there is a set of qualitative behaviors and as long you do not engage in these qualitatively destructive behaviors, you will be behaving as if you have a utility function.”
“The view that utility maximizers are inevitable is supported by a number of coherence theories developed early on in game theory which show that any agent without a consistent utility function is exploitable in some sense.”
Here are some words I wrote that don’t quite sit right but which I thought I’d still share: Like, part of the MIRI beat as I understand it is to hold that there is some shining guiding light, some deep nature of intelligence that models will instantiate and that will make them highly dangerous. But it’s not clear to me whether you will in fact get models that instantiate that shining light. Like, you could imagine an alternative view of intelligence where it’s just useful monkey patches all the way down, and as we train more powerful models, they get more of the monkey patches, but without the fundamentals. The view in between would be that there are some monkey patches and some deep generalizations, but then I want to know whether the coherence theorems will bind to those kinds of agents.
No need to respond/deeply engage, but I’d appreciate it if you let me know if the above comments were too nitpicky.
Glad that I added the caveat.