Why Cost-Effectiveness ≠ Effectiveness/Cost
Disclaimer
Although I do have a lot of experience in math, this article has not yet been thoroughly peer-reviewed. Check with someone with expertise on the topic before using this formula.
(So far, it has been peer-reviewed by a math teacher with wide-ranging mathematical expertise.)
Notes
For this article, I will be using “Expected value” and “impact” interchangeably.
TL;DR
Cost-effectiveness is actually the average increase in effectiveness per unit increase in cost, not effectiveness divided by cost, and the two calculations can give quite different answers.
But why?
To show why, we’ll enlist the help of our friend Todd, a pigeon and statistician in training.
Todd wants to estimate the expected happiness per rock (the currency the pigeons use) of donating to Jeremy’s Flavored Crumb Stand. Todd decides to calculate the expected impact of one extra rock donated with the following formula.
$$\frac{\text{Expected impact of charity, given } N \text{ rocks donated}}{N \text{ rocks donated}}$$
Here’s the data.
[1] This doesn’t seem very sensible. The estimate is far too high! The projected change in impact (which is what we’re interested in) is much higher than every one of the actual changes in impact!
Todd explains his problem to his statistics professor, Taylor. Taylor tells him, “Since we’re trying to find $E[V_h(N+1) - V_h(N)]$ [2], and $E[X] = \sum_x x \cdot P_X(x)$ [3][4], the cost-effectiveness of donating rocks to Jeremy’s Flavored Crumb Stand is $\sum_n \big(V_h(n+1) - V_h(n)\big) \cdot P_N(n)$ [5][6].” This leads to a much more sensible output.
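Taylor’s formula is just a probability-weighted sum of one-rock increases, so it fits in a few lines of code. Here is a minimal sketch, not the author’s method. The function name `marginal_impact` is hypothetical, as is the choice to store $V_h$ and $P_N$ in dictionaries keyed by the donation total $n$. The toy numbers are made up purely for illustration.

```python
from fractions import Fraction


def marginal_impact(v, p):
    """Return the sum over n of (v[n + 1] - v[n]) * p[n].

    v: dict mapping a donation total n to the value V_h(n).
    p: dict mapping a donation total n to the probability P_N(n).
    Every n in p needs both v[n] and v[n + 1] to be present.
    """
    return sum((v[n + 1] - v[n]) * prob for n, prob in p.items())


# Hypothetical toy example: N is 0 or 1 with equal probability.
v_toy = {0: Fraction(0), 1: Fraction(2), 2: Fraction(3)}
p_toy = {0: Fraction(1, 2), 1: Fraction(1, 2)}
print(marginal_impact(v_toy, p_toy))  # 3/2
```

Exact fractions are used only so the result prints without floating-point rounding. Plain floats work the same way.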
After making some assumptions, Todd calculated his new estimate as follows.
Calculations (Feel free to skip this part)
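$$
\begin{aligned}
\sum_n \big(V_h(n+1) - V_h(n)\big) \cdot P_N(n)
&= \big(V_h(0+1) - V_h(0)\big) \cdot P_N(0) \\
&\quad + \big(V_h(1+1) - V_h(1)\big) \cdot P_N(1) \\
&\quad + \big(V_h(2+1) - V_h(2)\big) \cdot P_N(2) \\
&\quad + \big(V_h(3+1) - V_h(3)\big) \cdot P_N(3) \\
&\quad + \big(V_h(4+1) - V_h(4)\big) \cdot P_N(4) \\
&\quad + \big(V_h(5+1) - V_h(5)\big) \cdot P_N(5) \\
&= (4.4 - 4) \cdot \tfrac{1}{5} \\
&\quad + (4.8 - 4.4) \cdot \tfrac{1}{5} \\
&\quad + (4.85\ \text{[7]} - 4.8) \cdot \tfrac{1}{5} \\
&\quad + (5.3 - 4.85\ \text{[7]}) \cdot 0 \\
&\quad + (6.5 - 5.3) \cdot \tfrac{1}{5} \\
&\quad + (7.7\ \text{[7]} - 6.5) \cdot \tfrac{1}{5} \\
&= (0.4) \cdot \tfrac{1}{5} + (0.4) \cdot \tfrac{1}{5} + (0.05) \cdot \tfrac{1}{5} + (0.45) \cdot 0 + (1.2) \cdot \tfrac{1}{5} + (1.2) \cdot \tfrac{1}{5} \\
&= \tfrac{3.25}{5} \\
&= 0.65.
\end{aligned}
$$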
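To double-check the arithmetic, the same sum can be recomputed exactly with Python’s standard `fractions` module. This is a sketch. The values are the ones in Todd’s calculation, including his assumed $V_h(3) = 4.85$ and $V_h(6) = 7.7$ and $P_N(3) = 0$ from the fourth term. The variable names are hypothetical.

```python
from fractions import Fraction as F

# Todd's values V_h(n); 4.85 and 7.7 are his assumed values (footnote [7]).
v_h = {0: F("4"), 1: F("4.4"), 2: F("4.8"), 3: F("4.85"),
       4: F("5.3"), 5: F("6.5"), 6: F("7.7")}
# P_N(n): 1/5 for every n except n = 3, which has probability 0.
p_n = {0: F(1, 5), 1: F(1, 5), 2: F(1, 5), 3: F(0), 4: F(1, 5), 5: F(1, 5)}

# Probability-weighted sum of one-rock increases.
estimate = sum((v_h[n + 1] - v_h[n]) * p_n[n] for n in p_n)
print(estimate, float(estimate))  # 13/20 0.65
```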
Findings
This is a much better result. He did make some assumptions, but the assumptions he made don’t seem too unreasonable, so he’s happy with his results.
Note that you can intuitively think of this result as the change in the expected impact of Jeremy’s Flavored Crumb Stand when every possible donation total is shifted up by one rock.
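The shift intuition can be checked numerically. By linearity of expectation, $E[V_h(N+1)] - E[V_h(N)]$ equals $\sum_n \big(V_h(n+1) - V_h(n)\big) P_N(n)$. Below is a minimal sketch reusing Todd’s numbers; the variable names are hypothetical.

```python
from fractions import Fraction as F

v_h = {0: F("4"), 1: F("4.4"), 2: F("4.8"), 3: F("4.85"),
       4: F("5.3"), 5: F("6.5"), 6: F("7.7")}
p_n = {0: F(1, 5), 1: F(1, 5), 2: F(1, 5), 3: F(0), 4: F(1, 5), 5: F(1, 5)}

# E[V_h(N + 1)]: expected value when every donation total is one rock higher.
e_shifted = sum(v_h[n + 1] * p for n, p in p_n.items())
# E[V_h(N)]: expected value with the original donation totals.
e_original = sum(v_h[n] * p for n, p in p_n.items())

print(e_shifted, e_original, e_shifted - e_original)  # 113/20 5 13/20
```

The difference matches the 13/20 (0.65) from the step-by-step sum.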
Is the expected value of donating twice as much twice as big?
First of all, note that we can express the impact of donating one rock to Jeremy’s Flavored Crumb Stand as $E[V_h(N+1) - V_h(N)]$. Similarly, we can express the impact of donating $a$ rocks as $E[V_h(N+a) - V_h(N)]$. Now, we just need to know whether $E[V_h(N+a) - V_h(N)] = a \cdot E[V_h(N+1) - V_h(N)]$. It turns out that this is not always the case. To see why, consider the following example.
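Let $V_h(1) = 1$, $V_h(2) = 1$, $V_h(3) = 3$, and $P_N(1) = 1$, and take $a = 2$. Then:
$E[V_h(N+1) - V_h(N)] = 0$, since $N$ can only be 1, $V_h(1+1) = 1$, and $V_h(1) = 1$.
$E[V_h(N+2) - V_h(N)] = 2$, since $N$ can only be 1, $V_h(1+2) = 3$, and $V_h(1) = 1$.
$2 \cdot 0 = 0 \neq 2$, so $E[V_h(N+2) - V_h(N)] \neq 2 \cdot E[V_h(N+1) - V_h(N)]$.
Therefore, in general, $E[V_h(N+a) - V_h(N)] \neq a \cdot E[V_h(N+1) - V_h(N)]$.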
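The counterexample is small enough to verify mechanically. Here is a minimal sketch. The helper `expected_gain` is a hypothetical name for $E[V_h(N+a) - V_h(N)]$, with $V_h$ and $P_N$ stored as dictionaries.

```python
def expected_gain(v, p, a):
    """E[V_h(N + a) - V_h(N)] for dicts v (values) and p (probabilities)."""
    return sum((v[n + a] - v[n]) * prob for n, prob in p.items())


v_h = {1: 1, 2: 1, 3: 3}
p_n = {1: 1}  # N is always 1

one_rock = expected_gain(v_h, p_n, 1)
two_rocks = expected_gain(v_h, p_n, 2)
print(two_rocks, 2 * one_rock)  # 2 0
```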
Despite this, on average, $E[V_h(N+a) - V_h(N)] = a \cdot E[V_h(N+1) - V_h(N)]$, since, on average, data is linear. That is, $E[f''(x)] = 0$: if $f''(x) = h(x)$, then $E[h(x)] = 0$ (i.e., the curvature of the data is skewed neither toward the negative side nor toward the positive side), so graphs of data are, on average, shaped like a line. And when $f(x)$ is linear, by definition, $f(x) = mx + b$ for some constants $m$ and $b$.
Furthermore, for every linear function $f(x) = mx + b$ and every $x$,
$$f(x+a) - f(x) = \big(m(x+a) + b\big) - (mx + b) = \big((mx + ma) + b\big) - (mx + b) = mx + ma + b - mx - b = ma.$$
Therefore, for all linear functions $f(x)$, $a\big(f(x+1) - f(x)\big) = f(x+a) - f(x)$, since $f(x+1) - f(x) = m$, $m \cdot a = ma$, and $f(x+a) - f(x) = ma$.
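As a sanity check on this identity, here is a minimal sketch that tests $a\big(f(x+1) - f(x)\big) = f(x+a) - f(x)$ on a grid of linear functions. The helper names `make_linear` and `identity_holds` are hypothetical, and the grid values are arbitrary. Exact fractions keep the equality test free of rounding error.

```python
from fractions import Fraction as F
from itertools import product


def make_linear(m, b):
    """Return the linear function f(x) = m*x + b."""
    return lambda x: m * x + b


def identity_holds(f, x, a):
    """Check whether a * (f(x + 1) - f(x)) == f(x + a) - f(x)."""
    return a * (f(x + 1) - f(x)) == f(x + a) - f(x)


# Arbitrary (hypothetical) slopes, intercepts, starting points and step sizes.
values = [F(-3), F(0), F(1, 2), F(7)]
all_ok = all(
    identity_holds(make_linear(m, b), x, a)
    for m, b, x, a in product(values, repeat=4)
)
print(all_ok)  # True
```

A grid check like this is not a proof; the algebra above is. It just catches slips in the derivation.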
It’s important to note that, while $E[V_h(N+a) - V_h(N)] = a \cdot E[V_h(N+1) - V_h(N)]$ holds on average, when $V_h(x)$ isn’t linear, $a \cdot E[V_h(N+1) - V_h(N)]$ isn’t a perfect estimate for $E[V_h(N+a) - V_h(N)]$.
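To see how far off the linear estimate can be, here is a minimal sketch with a made-up curved value function. The choices $V_h(n) = n^2$ and $N$ uniform on $\{0, 1, 2\}$ are hypothetical and do not come from Todd’s data.

```python
from fractions import Fraction as F


def v_h(n):
    """Hypothetical non-linear value curve: V_h(n) = n**2."""
    return n * n


def expected_gain(v, p, a):
    """E[V_h(N + a) - V_h(N)] for a value function v and a probability dict p."""
    return sum((v(n + a) - v(n)) * prob for n, prob in p.items())


# Hypothetical distribution: N is 0, 1 or 2 with equal probability.
p_n = {0: F(1, 3), 1: F(1, 3), 2: F(1, 3)}

one_rock = expected_gain(v_h, p_n, 1)
for a in (2, 3):
    exact = expected_gain(v_h, p_n, a)
    linear_estimate = a * one_rock
    print(a, exact, linear_estimate)
```

With these numbers, the loop prints `2 8 6` and `3 15 9`. Because this $V_h$ curves upward, $a \cdot E[V_h(N+1) - V_h(N)]$ undershoots the true expected gain, and the gap grows with $a$.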
Sidenote
Similarly, $\frac{\text{effectiveness}}{\text{cost}}$ is a decent approximation for cost-effectiveness. Given the way most charities operate, the impact of each “rock” donated is roughly independent of the total amount the charity would have received if that “rock” had not been donated. (That is, if you graphed impact on the y-axis against the number of rocks donated on the x-axis, the shape would look like a line.)
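All functions that make this line shape can, by definition, be written in the form $y = mx + b$, where:
$x$ is the total number of donated “rocks.”
$y$ is the impact.
$m$ can be thought of as “impact per rock.”
$b$ can be thought of as “impact that the charity achieves regardless of donations” (e.g., if the co-founders were nice to their parents regardless of that year’s donations).
We want to know the “impact per rock,” $m$.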
Therefore, $\frac{y}{x} = m + \frac{b}{x}$. What we want to know is $m$, and as $x$ gets larger and larger, $m + \frac{b}{x}$ tends toward $m$, since $\frac{b}{x}$ tends toward 0.
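Here is a minimal numeric sketch of this limit, assuming hypothetical values $m = \tfrac{1}{2}$ impact per rock and $b = 10$ units of impact achieved regardless of donations. It shows effectiveness divided by cost closing in on $m$ as the number of donated rocks grows.

```python
from fractions import Fraction as F

m = F(1, 2)   # hypothetical impact per rock
b = F(10)     # hypothetical impact achieved regardless of donations

gaps = []
for x in (10, 100, 1_000, 10_000):
    y = m * x + b              # total impact when x rocks are donated
    average = y / x            # effectiveness / cost
    gaps.append(average - m)   # this gap equals b / x
    print(x, float(average))
```

This prints averages of 1.5, 0.6, 0.51 and 0.501. The gap $\frac{b}{x}$ shrinks tenfold each time $x$ grows tenfold, which is why $\frac{\text{effectiveness}}{\text{cost}}$ approximates $m$ well only when $b$ is small relative to total donations.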
Congrats! You made it to the end of this article! 🥳!
Now, Todd can finally sit, having finished all his winter break homework. [8]
(Also, I wanted to make a picture of Todd eating a crumb, but this picture of Todd eating a hotdog turned out way cuter. 🐦‍⬛🌭)
If you have any questions, comments, suggestions, corrections, or feedback, please feel free to put them in the comments!
“Avg.” is short for the average (arithmetic mean) yearly impact.
Here, $E[X]$ is the expected value of $X$, $N$ is the number of rocks Jeremy’s Flavored Crumb Stand gets in any given year, and $V_h(N)$ is the value generated by Jeremy’s Flavored Crumb Stand when it gets $N$ rocks.
$P_X(x)$ is the probability that some random variable $X$ is equal to $x$.
We can see why this is true if we substitute $X = V_h(N+1) - V_h(N)$.
When $V_h(n)$ is known but $V_h(n+1)$ isn’t, use an estimate for $V_h(n+1)$ (e.g., $V_h(n) + 0.375$). The same goes for when $P_N(n)$ is unknown: we estimate $P_N(n)$.
Todd assumed that $P_N(0) = \tfrac{1}{5}$, $P_N(1) = \tfrac{1}{5}$, $P_N(2) = \tfrac{1}{5}$, $P_N(4) = \tfrac{1}{5}$, $P_N(5) = \tfrac{1}{5}$, $V_h(3) = 4.85$, and $V_h(6) = 7.7$.
I imagine that pigeons get only one piece of homework for winter break.