Why Cost-Effectiveness ≠ Effectiveness/Cost

Disclaimer

Although I do have a lot of experience in math, this article has not yet been thoroughly peer-reviewed. Check with someone with expertise on the topic before using this formula.

(So far, it has been peer-reviewed by a math teacher with wide-ranging mathematical expertise.)

Notes

In this article, I will use “expected value” and “impact” interchangeably.

TL;DR

Cost-effectiveness is actually the average increase in effectiveness per additional unit of cost, not total effectiveness divided by total cost, and the two calculations generally give different results.

But why?

To show why this is, we’ll enlist the help of our friend Todd, a pigeon and statistician in training.

Image generated by DALL-E, edited by me. Say “Hi,” Todd!

Todd wants to estimate the expected happiness per rock (the currency the pigeons use) of donating to Jeremy’s Flavored Crumb Stand. Todd decides to calculate the expected impact of one extra rock donated with the following formula.

$\frac{\text{Expected impact of charity, given } N \text{ dollars donated}}{N \text{ dollars donated}}$

Here’s the data.

^[1] This doesn’t seem very sensible. This estimate is way too high! The projected change in impact (what we’re interested in) is much higher than all of the actual changes in impact!

Todd explains his problem to his statistics professor, Taylor. Taylor explains to Todd, “Since we’re trying to find $E (V_{h} (N + 1) - V_{h} (N))$ ^[2], and $E [X] = \sum_{x} x \cdot P_{X} (x)$ ^[3]^[4], the cost-effectiveness of donating rocks to Jeremy’s Flavored Crumb Stand is $\sum_{n} (V_{h} (n + 1) - V_{h} (n)) \cdot P_{N} (n)$ ^[5]^[6].” This leads to a much more sensible output, and the new estimate looks like this.

After making some assumptions, Todd calculated his new estimate like this.

Calculations (Feel free to skip this part)

$\sum_{n} (V_{h} (n + 1) - V_{h} (n)) \cdot P_{N} (n)$

$= (V_{h} (0 + 1) - V_{h} (0)) \cdot P_{N} (0)$

$+ (V_{h} (1 + 1) - V_{h} (1)) \cdot P_{N} (1)$

$+ (V_{h} (2 + 1) - V_{h} (2)) \cdot P_{N} (2)$

$+ (V_{h} (3 + 1) - V_{h} (3)) \cdot P_{N} (3)$

$+ (V_{h} (4 + 1) - V_{h} (4)) \cdot P_{N} (4)$

$+ (V_{h} (5 + 1) - V_{h} (5)) \cdot P_{N} (5)$

$= (4.4 - 4) \cdot \frac{1}{5}$

$+ (4.8 - 4.4) \cdot \frac{1}{5}$

$+ (4.85$ ^[7] $- 4.8) \cdot \frac{1}{5}$

$+ (5.3 - 4.85$ ^[7] $) \cdot 0$

$+ (6.5 - 5.3) \cdot \frac{1}{5}$

$+ (7.7$ ^[7] $- 6.5) \cdot \frac{1}{5}$

$= [(0.4) \cdot \frac{1}{5}] + [(0.4) \cdot \frac{1}{5}] + [(0.05) \cdot \frac{1}{5}] + [(0.45) \cdot 0] + [(1.2) \cdot \frac{1}{5}] + [(1.2) \cdot \frac{1}{5}]$

$= 0.65$ .
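The sum above can also be checked mechanically. The following is a minimal Python sketch, not part of Todd's homework: the helper name `expected_marginal_impact` is hypothetical, the values are copied from the calculation above (including Todd's assumed values from footnote 7), and exact fractions are used only to avoid floating-point rounding.

```python
from fractions import Fraction

# V_h(n): value generated when the stand receives n rocks.
# V_h(3) and V_h(6) are Todd's assumed values (footnote 7).
V_h = {
    0: Fraction("4"), 1: Fraction("4.4"), 2: Fraction("4.8"), 3: Fraction("4.85"),
    4: Fraction("5.3"), 5: Fraction("6.5"), 6: Fraction("7.7"),
}

# P_N(n): probability that the stand receives n rocks, as used in the calculation.
P_N = {
    0: Fraction(1, 5), 1: Fraction(1, 5), 2: Fraction(1, 5),
    3: Fraction(0), 4: Fraction(1, 5), 5: Fraction(1, 5),
}


def expected_marginal_impact(value, prob):
    """Return the sum over n of (value[n + 1] - value[n]) * prob[n]."""
    return sum((value[n + 1] - value[n]) * p for n, p in prob.items())


estimate = expected_marginal_impact(V_h, P_N)
print(float(estimate))
```

Swapping in different assumed values for `V_h` or `P_N` shows how sensitive the estimate is to Todd's assumptions.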

Findings

This is a much better result. He did make some assumptions, but the assumptions he made don’t seem too unreasonable, so he’s happy with his results.

Note that you can intuitively think of this result as the change in the value of $a$ (where $a$ is the expected impact of Jeremy’s Flavored Crumb Stand) when we shift the probabilities to the left by one.

On the right, there is the original probability distribution, and on the left, there is the shifted probability distribution.
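The shifting intuition can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's own code: it reuses the values from the calculation section, the names `expected_value` and `P_shifted` are hypothetical, and "shifted" here means the distribution of $N + 1$ (each probability moved one step along the rock axis).

```python
from fractions import Fraction

# Values and probabilities from the calculation section (Todd's assumptions included).
V_h = {n: Fraction(v) for n, v in enumerate(["4", "4.4", "4.8", "4.85", "5.3", "6.5", "7.7"])}
P_N = {
    0: Fraction(1, 5), 1: Fraction(1, 5), 2: Fraction(1, 5),
    3: Fraction(0), 4: Fraction(1, 5), 5: Fraction(1, 5),
}

# Distribution of N + 1: every probability moves one step along the rock axis.
P_shifted = {n + 1: p for n, p in P_N.items()}


def expected_value(value, prob):
    """Return the sum over n of value[n] * prob[n]."""
    return sum(value[n] * p for n, p in prob.items())


# Change in expected impact between the shifted and the original distribution.
difference = expected_value(V_h, P_shifted) - expected_value(V_h, P_N)

# The term-by-term sum used in the calculation section.
marginal = sum((V_h[n + 1] - V_h[n]) * p for n, p in P_N.items())

print(difference == marginal)
```

Both routes give the same number because the sum of differences simply regroups the two expected values term by term.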

Is the expected value of [donating twice as much] twice as big?

First of all, note that we can express the impact of donating one rock to Jeremy’s Flavored Crumb Stand as $E (V_{h} (N + 1) - V_{h} (N))$ . Similarly, we can express the impact of donating $a$ rocks as $E (V_{h} (N + a) - V_{h} (N)) .$ Now, we just need to know whether $E (V_{h} (N + a) - V_{h} (N)) = a \cdot E (V_{h} (N + 1) - V_{h} (N)) .$ It turns out that this is not always the case. To see why, consider the following example.

Suppose $V_{h} (1) = 1, V_{h} (2) = 1, V_{h} (3) = 3,$ and $P_{N} (1) = 1.$ Take $a = 2.$ Then:

$E (V_{h} (N + 1) - V_{h} (N)) = 0$ , since $N$ can only be 1, $V_{h} (1 + 1) = 1$ , and $V_{h} (1) = 1$ .
$E (V_{h} (N + 2) - V_{h} (N)) = 2$ , since $N$ can only be 1, $V_{h} (1 + 2) = 3$ , and $V_{h} (1) = 1$ .
$2 \cdot 0 \neq 2$ , so $E (V_{h} (N + 2) - V_{h} (N)) \neq 2 \cdot E (V_{h} (N + 1) - V_{h} (N)) .$
Therefore, $E (V_{h} (N + a) - V_{h} (N)) = a \cdot E (V_{h} (N + 1) - V_{h} (N))$ does not hold in general.
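The counterexample can be reproduced with a few lines of Python. This is a minimal sketch; the helper name `expected_gain` is hypothetical and simply evaluates $E (V_{h} (N + a) - V_{h} (N))$ for a discrete distribution.

```python
def expected_gain(value, prob, a):
    """Return E[V_h(N + a) - V_h(N)] for a discrete distribution prob over N."""
    return sum((value[n + a] - value[n]) * p for n, p in prob.items())


# The example from the text: N is always 1.
V_h = {1: 1, 2: 1, 3: 3}
P_N = {1: 1}

one_rock = expected_gain(V_h, P_N, 1)   # (1 - 1) * 1
two_rocks = expected_gain(V_h, P_N, 2)  # (3 - 1) * 1

print(one_rock, two_rocks, two_rocks == 2 * one_rock)
```

Here doubling the one-rock estimate gives 0, while two extra rocks are actually worth 2.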

Despite this, on average, $E (V_{h} (N + a) - V_{h} (N)) = a \cdot E (V_{h} (N + 1) - V_{h} (N))$ , if we assume that data is, on average, linear. That is, writing $h (x) = f^{''} (x)$ , we assume $E (h (x)) = 0$ : data is skewed neither to the more negative side nor to the more positive side, so graphs of data are, on average, in the shape of a line. When $f (x)$ is linear, by definition, $f (x) = m x + b$ for some $m$ and $b$ .

Furthermore, for every function of the form $f (x) = m x + b$ and for all $x$ ,

$f (x + a) - f (x) = (m (x + a) + b) - (m x + b) = m x + m a + b - m x - b = m a .$

Therefore, for all linear functions $f (x)$ , $a (f (x + 1) - f (x)) = f (x + a) - f (x),$ since $f (x + 1) - f (x) = m$ , $m \cdot a = m a$ , and $f (x + a) - f (x) = m a$ .
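As a quick numerical sanity check of the linear case, here is a small Python sketch. The slope and intercept are hypothetical values chosen only for illustration, and checking a handful of points is of course not a proof.

```python
from fractions import Fraction


def make_linear(m, b):
    """Return the linear function f(x) = m * x + b."""
    return lambda x: m * x + b


# Hypothetical slope and intercept, chosen only for illustration.
m, b = Fraction(3, 4), Fraction(5)
f = make_linear(m, b)

# For each x and a, check a * (f(x + 1) - f(x)) == f(x + a) - f(x) == m * a.
checks = [
    a * (f(x + 1) - f(x)) == f(x + a) - f(x) == m * a
    for x in range(-3, 4)
    for a in range(0, 6)
]

print(all(checks))
```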

It’s important to note that, while $E (V_{h} (N + a) - V_{h} (N)) = a \cdot E (V_{h} (N + 1) - V_{h} (N))$ holds on average, when $V_{h} (x)$ isn’t linear, $a \cdot E (V_{h} (N + 1) - V_{h} (N))$ isn’t a perfect estimate for $E (V_{h} (N + a) - V_{h} (N)) .$

Sidenote

Similarly, $\frac{\text{effectiveness}}{\text{cost}}$ is a decent approximation for cost-effectiveness, since, given the way most charities operate, the impact of each “rock” donated is roughly independent of the total amount the charity would have received had that “rock” not been donated (i.e., if you were to graph impact on the y-axis against the amount donated on the x-axis, the resulting shape would look like a line).

All functions that make this line shape can, by definition, be written in the form $y = m x + b$ , where:

$x$ is the total number of “rocks” donated.
$y$ is the impact.
$m$ can be thought of as “impact per rock.”
$b$ can be thought of as “impact that the charity had regardless of donations” (e.g., if the co-founders were nice to their parents regardless of that year’s donations).

We want to know “impact per rock.”

Therefore, $\frac{y}{x} = m + \frac{b}{x}$ . What we want to know is $m$ , and as $x$ gets larger and larger, $m + \frac{b}{x}$ trends towards $m$ , since $\frac{b}{x}$ trends towards $0$ .
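To see how quickly $\frac{y}{x}$ approaches $m$ , here is a short Python sketch. The values of `m` and `b` are hypothetical, picked only to make the trend visible.

```python
# Hypothetical values, chosen only for illustration.
m = 0.5   # impact per rock
b = 20.0  # impact the charity has regardless of donations


def impact(x):
    """Total impact y = m * x + b for x rocks donated."""
    return m * x + b


# Effectiveness divided by cost for increasingly large donation totals.
ratios = {x: impact(x) / x for x in (10, 100, 1_000, 10_000, 100_000)}

for x, ratio in ratios.items():
    print(f"x = {x:>7}: y/x = {ratio:.4f}, gap to m = {ratio - m:.4f}")
```

The gap between $\frac{y}{x}$ and $m$ is exactly $\frac{b}{x}$ , so it shrinks by a factor of ten each time $x$ grows by a factor of ten.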

Congrats! You made it to the end of this article! 🥳!

Now, Todd can finally sit, having finished all his winter break homework. ^[8]

(also, I wanted to make a picture of Todd eating a crumb, but this picture of Todd eating a hotdog turned out way cuter. 🐦‍⬛🌭 )

Image generated by DALL-E, edited by me.

If you have any questions, comments, suggestions, corrections, or feedback, please feel free to put them in the comments!

^[1] The Avg. is short for the average (arithmetic mean) yearly impact.
^[2] Where $E (x)$ is the expected value of $x$ , $N$ is the number of rocks Jeremy’s Flavored Crumb Stand gets in any given year, and $V_{h} (N)$ is the value generated by Jeremy’s Flavored Crumb Stand when it gets $N$ rocks.
^[3] $P_{X} (x)$ is the probability that some random variable $X$ is equal to $x$ .
^[4] (This is only true for the discrete case. In the continuous case, the formula is $\int_{- \infty}^{\infty} x \cdot P_{X} (x) d x$ , where $P_{X}$ is a probability density. But that’s a different topic.)
^[5] We can see why this is true if we input $X = V_{h} (N + 1) - V_{h} (N)$ .
^[6] When $V_{h} (\text{some number})$ is known, but $V_{h} (\text{some number} + 1)$ isn’t known, use some estimate for $V_{h} (\text{some number} + 1)$ (e.g., $V_{h} (\text{some number}) + 0.375$ ). The same goes for when $P_{N} (n)$ is unknown (that is, we estimate $P_{N} (n)$ ).
^[7] Todd assumed that $P_{N} (0) = \frac{1}{5}, P_{N} (1) = \frac{1}{5}, P_{N} (2) = \frac{1}{5}, P_{N} (4) = \frac{1}{5}, P_{N} (5) = \frac{1}{5}, V_{h} (3) = 4.85,$ and $V_{h} (6) = 7.7 .$
^[8] I imagine that pigeons get only one piece of homework for winter break.
