[Stats4EA] Expectations are not Outcomes

This is the first in what might become a bunch of posts picking out issues from statistics and probability of relevance to EA. The format will be informal and fairly bite-size. None of this will be original, hopefully.

Expectations are not outcomes

Here we attempt to trim back the intuition that an expected value can be safely thought of as a representative value of the random variable.

Situation 1

A Rademacher random variable X takes the value 1 with probability ¹⁄₂ and otherwise −1. Its expectation is zero. We will almost surely never see any value other than −1 or 1.

This means that the expected value might not even be a number the distribution could produce. We might not even be able to get arbitrarily close to it.

Imagine walking up to a table in a casino and betting that the next roll of a die will be ⁷⁄₂.
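Both examples can be checked exactly with a few lines of code. This is only a sketch: it assumes a fair six-sided die, and the helper `expectation` and the dictionaries are names made up for illustration.

```python
from fractions import Fraction

def expectation(dist):
    """Expected value of a finite distribution given as {value: probability}."""
    return sum(Fraction(value) * prob for value, prob in dist.items())

# Rademacher variable: -1 or 1, each with probability 1/2.
rademacher = {-1: Fraction(1, 2), 1: Fraction(1, 2)}
# Assumption: the die is a fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}

for name, dist in [("Rademacher", rademacher), ("fair die", die)]:
    mean = expectation(dist)
    # Distance from the expectation to the nearest value that can actually occur.
    gap = min(abs(value - mean) for value in dist)
    print(f"{name}: mean = {mean}, possible outcome? {mean in dist}, nearest outcome is {gap} away")
```

In both cases the mean is not a possible outcome, and no outcome comes closer to it than a fixed gap: 1 for the Rademacher variable and ¹⁄₂ for the die.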

Situation 2

Researchers create a natural language simulation model. Upon receiving a piece of text as stimulus it outputs a random short story. What is the expectation of the story?

Let’s think about the first word. There will be some implied probability distribution over a dictionary. Its expectation is some fractional combination of every word in the dictionary. Whatever that means, and whatever it is useful for, it is not the start of a legible story—and should not be used as such.
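One way to make "fractional combination" concrete is to encode each word as a one-hot vector and average those vectors. The sketch below assumes that encoding, which is a choice made here rather than anything the model dictates. The four-word dictionary and its probabilities are invented for illustration.

```python
# Hypothetical toy dictionary and first-word probabilities (made up for illustration).
vocab = ["once", "the", "it", "rain"]
probs = [0.4, 0.3, 0.2, 0.1]

def one_hot(index, size):
    """Vector with a 1.0 in position `index` and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(size)]

# Expectation of the encoded first word: the probability-weighted average of the encodings.
expected_word = [
    sum(p * one_hot(k, len(vocab))[i] for k, p in enumerate(probs))
    for i in range(len(vocab))
]

# Does the expectation coincide with the encoding of any actual word?
is_a_word = any(expected_word == one_hot(k, len(vocab)) for k in range(len(vocab)))

print(dict(zip(vocab, expected_word)))
print("is an actual word:", is_a_word)
```

Under this encoding the "expected word" is just the probability vector itself: a bit of every word and no word in particular.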

What is the expected length of the story? What would a solution to that problem mean? Could one, for example, print the expected story?

Situation 3

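Some distributions with very fat tails have no expectation at all. For instance, the Cauchy distribution has an undefined expectation. The average of any number of independent standard Cauchy draws is distributed exactly like a single draw, so collecting more data never settles on a value.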
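To see what "undefined" means here, consider the mean of a standard Cauchy variable computed over a truncated range [−a, b]. It has the closed form (ln(1 + b²) − ln(1 + a²)) / 2π, and its limit depends on how a and b are sent to infinity. The sketch below evaluates that formula and then tracks a running sample mean. The function names and the seed are arbitrary choices for illustration.

```python
import math
import random

def truncated_mean(a, b):
    """Integral of x * f(x) over [-a, b], with f(x) = 1 / (pi * (1 + x^2))."""
    return (math.log1p(b * b) - math.log1p(a * a)) / (2 * math.pi)

def cauchy_draw(rng):
    """One standard Cauchy draw via the inverse CDF, tan(pi * (u - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))

# Same distribution, two ways of letting the truncation grow.
symmetric = [truncated_mean(a, a) for a in (1e2, 1e4, 1e6)]
lopsided = [truncated_mean(a, 2 * a) for a in (1e2, 1e4, 1e6)]
print("symmetric truncation:", symmetric)
print("lopsided truncation: ", lopsided)

# Running sample mean; the seed is arbitrary and only makes the run repeatable.
rng = random.Random(0)
total = 0.0
for n in range(1, 100_001):
    total += cauchy_draw(rng)
    if n in (100, 1_000, 10_000, 100_000):
        print(f"mean of first {n} draws: {total / n}")
```

Symmetric truncation gives exactly zero at every width, while letting the right edge run twice as far as the left gives a limit of ln(2)/π, about 0.22. Since the answer depends on the route taken, there is no single number that deserves to be called the expectation.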

Implication

It is tempting to freely substitute an expectation in as a representative of a random variable. Suppose we used the following procedure in a blanket fashion:

1. We are faced with a decision depending on an uncertain outcome.
2. We take the expected value of the outcome.
3. We use the expectation as a scenario to plan around.

Step three is unsafe in principle—even if sometimes not in practice.
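As a toy illustration of how step three can go wrong, take the Rademacher variable from Situation 1 and a made-up betting game: guess a value, win 1 if the guess matches the outcome and 0 otherwise. The payoff rule and the list of allowed guesses are assumptions invented for this sketch.

```python
from fractions import Fraction

# Rademacher outcome: -1 or 1, each with probability 1/2.
outcomes = {-1: Fraction(1, 2), 1: Fraction(1, 2)}
# Hypothetical set of guesses we are allowed to make.
actions = [-1, 0, 1]

def payoff(action, outcome):
    """Hypothetical game: win 1 if the guess matches the outcome, else 0."""
    return 1 if action == outcome else 0

def expected_payoff(action):
    """Average the payoff over the whole distribution of outcomes."""
    return sum(p * payoff(action, x) for x, p in outcomes.items())

mean = sum(p * x for x, p in outcomes.items())

# Step three: treat the expectation as the scenario and plan for it.
plan_from_scenario = max(actions, key=lambda a: payoff(a, mean))
# Alternative: evaluate each plan against the distribution itself.
plan_from_distribution = max(actions, key=expected_payoff)

print("plan for the 'expected' scenario:", plan_from_scenario,
      "-> promised payoff", payoff(plan_from_scenario, mean),
      ", actual expected payoff", expected_payoff(plan_from_scenario))
print("plan from the distribution:", plan_from_distribution,
      "-> expected payoff", expected_payoff(plan_from_distribution))
```

Planning around the expectation picks the guess 0, which looks like a sure win in the imagined scenario and never wins in reality. Guessing either −1 or 1 wins half the time.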

If there is a next time (the length of this series is currently fractional) I hope to touch on some scenarios less easily dismissed as the concerns of a pedant.