## The Geometric Expectation

## A Suspicious Pattern

There is a pattern that shows up in many of the toys we like to play with around here: the pattern of maximizing the expected logarithm.

Nash bargaining is a method for aggregating preferences without a means to directly compare them. When Nash bargaining, you are maximizing the expected logarithm of utility, where the expectation is over uncertainty about which person you are.

Kelly betting is an extremely useful tool for not putting all your future wealth in one basket. When Kelly betting, you are maximizing the expected logarithm of your wealth.

The log scoring rule is a very natural way to extract beliefs. When maximizing your log score, you are maximizing the expectation of the logarithm of the probability you assign to the right answer. This is one example of a general pattern. Maximizations of expected logarithms show up all over information theory, often phrased as minimizing the negative of the expected logarithm.

Why does maximization of the expected logarithm keep showing up?

One answer is that all of the instances of it showing up are actually related. In my previous two posts, I made some connections between Nash bargaining and Kelly betting. The fact that Kelly betting can be used to model Bayesian updating illustrates its relationship with the information theory applications. To a certain extent, there is really only one instance of this pattern.

However, I think there is another argument for why you should expect this pattern to show up a lot: the pattern is very simple. Simpler than it looks on the surface. It only looks complicated because mathematicians have failed us.

## The Geometric Integral

One of the most underrated concepts in mathematics is the geometric integral, given by $\prod f(x)^{dx} = e^{\int \ln(f(x))\,dx}$. (The fact that I couldn't easily get a LaTeX symbol that looks like an elongated P is a testament to its underratedness.) The geometric integral is just like the standard integral, but everywhere you would add, you multiply instead. Defining it in terms of the standard (arithmetic) integral with logs and exponents is insulting to its nature, and I don't recommend thinking of it that way. (You wouldn't define $x \times y$ as $e^{\ln(x)+\ln(y)}$.) Instead, you should just think of it as the multiplicative version of the integral. Still, the definition via logs and exponentials is the fastest way to get the idea across.
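As a quick numerical sketch (the function, interval, and grid size here are my own illustrative choices), the geometric integral can be approximated by a Riemann product, multiplying together $f(x)^{dx}$ over a fine grid, just as the ordinary integral is approximated by a Riemann sum of $f(x)\,dx$:

```python
import math

def geometric_integral(f, a, b, n=100_000):
    """Approximate the geometric integral of f over [a, b] by a
    Riemann product: the product of f(x)**dx over a midpoint grid,
    the multiplicative analogue of the Riemann sum of f(x)*dx."""
    dx = (b - a) / n
    total = 1.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        total *= f(x) ** dx
    return total

# Example: for f(x) = e^x on [0, 1], the geometric integral is
# e^(integral of ln(e^x) dx) = e^(1/2).
print(geometric_integral(math.exp, 0.0, 1.0))  # ≈ 1.6487
```

Note that the code multiplies directly rather than detouring through logs, in keeping with the multiplicative spirit (at the cost of some numerical robustness for extreme values).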

I think people don’t practice thinking multiplicatively enough, which causes them to throw inherently multiplicative things into logarithms, so they can think about them additively.

I will use the phrase geometric expectation when I take a geometric integral over a probability distribution, and I will use the symbol $\mathbb{G}$. Thus, we will write $\mathbb{G}_{x\sim P}\, f(x) = e^{\mathbb{E}_{x\sim P} \ln f(x)}$.

## Discrete Geometric Expectations

Luckily, most of the time, we will want to talk about discrete geometric expectations, where we can use (possibly infinite) sums rather than integrals and (possibly infinite) products rather than geometric integrals.

Let us gain some intuition for discrete geometric expectations by going through some simple cases. We will start with a uniform distribution on a finite set.

Let $X=\{x_1,\ldots,x_n\}$ be a finite set with $n$ elements. Let $f: X \to \mathbb{R}_{\geq 0}$ be a function that assigns a nonnegative value to each $x_i$. Let $P$ be the uniform probability distribution on $X$ that assigns probability $\frac{1}{n}$ to each element of $X$.

We have that $\mathbb{E}_{x\sim P}\, f(x) = \sum_{x\in X} P(x) f(x) = \frac{\sum_{i=1}^{n} f(x_i)}{n} = \frac{f(x_1)+\cdots+f(x_n)}{n}$. This is just the average, or arithmetic mean, of the $f$ values.

We can compute $\mathbb{G}_{x\sim P}\, f(x)$ using the above formula $\mathbb{G}_{x\sim P}\, f(x) = e^{\mathbb{E}_{x\sim P} \ln f(x)}$. Here, we get

$$\mathbb{G}_{x\sim P}\, f(x) = e^{\mathbb{E}_{x\sim P} \ln f(x)} = e^{\frac{\ln f(x_1)+\cdots+\ln f(x_n)}{n}} = \sqrt[n]{e^{\ln f(x_1)} \cdots e^{\ln f(x_n)}} = \sqrt[n]{f(x_1) \cdots f(x_n)}.$$

Thus, the geometric expectation of the uniform distribution is just the geometric mean of the f values. Hence the name.
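To make this concrete, here is a small check with made-up $f$ values on a three-element set; the direct $n$th root of the product and the $e$-to-the-average-log formula give the same geometric mean:

```python
import math

values = [1.0, 4.0, 16.0]  # illustrative f values on a 3-element set
n = len(values)

arithmetic_mean = sum(values) / n               # (1 + 4 + 16) / 3 = 7.0
geometric_mean = math.prod(values) ** (1 / n)   # (1 * 4 * 16)^(1/3) = 4.0

# The exp-of-expected-log formula gives the same number:
via_logs = math.exp(sum(math.log(v) for v in values) / n)
print(geometric_mean, via_logs)  # both ≈ 4.0
```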

The infinite non-uniform discrete case is not much more difficult. If $X$ is a finite or countably infinite set, $f: X \to \mathbb{R}_{\geq 0}$ assigns a nonnegative value to each $x\in X$, and $P$ is a probability distribution on $X$, then $\mathbb{E}_{x\sim P}\, f(x) = \sum_{x\in X} P(x) f(x)$, and

$$\mathbb{G}_{x\sim P}\, f(x) = e^{\mathbb{E}_{x\sim P} \ln f(x)} = e^{\sum_{x\in X} P(x) \ln f(x)} = \prod_{x\in X} e^{P(x) \ln f(x)} = \prod_{x\in X} f(x)^{P(x)}.$$

These two values can be thought of as a weighted arithmetic mean and weighted geometric mean respectively.

When taking the geometric expectation of $f$ with respect to $P$, you just take the product over all $x\in X$ of $f(x)^{P(x)}$. You are multiplying together all the $f$ values, but the exponent $P(x)$ is saying that values with less probability get less weight (or less “power”).
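A tiny sketch with an invented three-outcome distribution makes the weighting visible; the half-probability outcome contributes its $f$ value with twice the “power” of the quarter-probability outcomes:

```python
import math

# Illustrative distribution P and values f on a three-element set.
P = {"a": 0.5, "b": 0.25, "c": 0.25}
f = {"a": 2.0, "b": 8.0, "c": 1.0}

# Weighted geometric mean: the product of f(x)**P(x).
geo = math.prod(f[x] ** P[x] for x in P)

# Matches e^{E[ln f]}, the definition via the arithmetic expectation:
via_logs = math.exp(sum(P[x] * math.log(f[x]) for x in P))
print(geo, via_logs)  # both ≈ 2.3784, i.e. 2**1.25
```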

## Maximizing the Geometric Expectation

Maximization is invariant under applying a monotonic function. Thus

$$\operatorname*{argmax}_{y\in Y}\, \mathbb{E}_{x\sim P} \ln(f(x,y)) = \operatorname*{argmax}_{y\in Y}\, e^{\mathbb{E}_{x\sim P} \ln(f(x,y))} = \operatorname*{argmax}_{y\in Y}\, \mathbb{G}_{x\sim P}\, f(x,y).$$

So every time we maximize an expectation of a logarithm, we are really just maximizing the geometric expectation.

Rather than saying “maximize the geometric expectation”, I will just say “geometrically maximize”. For example, when Kelly betting, we are just geometrically maximizing wealth. Note that the unit on the geometric expectation of wealth is dollars. The unit on the expected logarithm of dollars is… confusing? It is log dollars, but like, you add it instead of multiplying? I don’t know how it works. What even is a log dollar?
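Here is a minimal sketch of that framing, with made-up numbers: an even-money bet on a coin with known bias $p = 0.6$. Choosing what fraction of wealth to bet by geometrically maximizing the wealth multiplier recovers the usual Kelly fraction $2p - 1$:

```python
# A coin pays even money and lands heads with probability p (toy numbers).
# Betting fraction b of wealth multiplies wealth by (1 + b) on heads and
# (1 - b) on tails, so the geometric expectation of the wealth multiplier
# is (1 + b)**p * (1 - b)**(1 - p).
p = 0.6

def geometric_wealth(b):
    return (1 + b) ** p * (1 - b) ** (1 - p)

# Grid-search the fraction that geometrically maximizes wealth.
fractions = [i / 1000 for i in range(1000)]  # 0.000 .. 0.999
best = max(fractions, key=geometric_wealth)
print(best)  # 0.2, the Kelly fraction 2p - 1 for an even-money bet
```

Notice that no logarithm appears anywhere: the geometric expectation of the wealth multiplier is itself a (unitless) multiplier, which is easier to interpret than a “log dollar”.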

The geometric expectation just makes more sense than the expected logarithm. It is a real thing with a real meaning. However, when we put the geometric expectation inside of a maximization, and we don’t naturally think in terms of geometric expectations, we are tempted to take a logarithm of the whole thing (which we can do because the maximization eats the monotonic function), and we end up maximizing the expected logarithm.

## Geometric Rationality

When Kelly betting, you are really just geometrically maximizing wealth.

When Nash bargaining, you are really just geometrically maximizing expected utility with respect to your uncertainty about your identity. In defense of Nash bargaining, it is normally presented as maximizing the product of the utilities. However, if you don’t already have the concept of geometric expectation, it is tempting to convert it to an expected logarithm so you can handle the weighted case and think of it as being about uncertainty behind the veil of ignorance. (Also, it is really more like the square root of the product of the utilities than the product of the utilities.)

When maximizing log score, you are really just geometrically maximizing the probability you assign your observation.
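As a sketch (the observation counts are invented): if you repeatedly observe a binary outcome and report a probability $q$, then geometrically maximizing the probability assigned to your observations, which is equivalent to maximizing average log score, pushes $q$ to the empirical frequency:

```python
import math

counts = {"heads": 6, "tails": 4}  # hypothetical observed outcomes
n = sum(counts.values())

def geo_prob(q_heads):
    """Geometric expectation, over the empirical distribution of the
    observations, of the probability assigned to the observation."""
    q = {"heads": q_heads, "tails": 1 - q_heads}
    return math.prod(q[o] ** (counts[o] / n) for o in counts)

grid = [i / 1000 for i in range(1, 1000)]  # 0.001 .. 0.999
best = max(grid, key=geo_prob)
print(best)  # 0.6, the empirical frequency of heads
```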

I will informally use the phrase “geometric rationality” to refer to techniques that tend to geometrically maximize natural features (of the world or the self). I want to draw attention to the hypothesis that humans evolved to be naturally inclined towards geometric rationality over arithmetic rationality, and that around here, the local memes have moved us too far off this path.