Do the expected values of the output probability distributions equal the point estimates that GiveWell gets from their non-probabilistic estimates? If not, how different are they?
More generally, are there any good write-ups about when and how the expected value of a model with multiple random variables differs from the same model filled out with the expected value of each of its random variables?
(I didn’t find the answer skimming through, but it might be there already—sorry!)
Short version:
Do the expected values of the output probability distributions equal the point estimates that GiveWell gets from their non-probabilistic estimates?
No, but they’re close.
More generally, are there any good write-ups about when and how the expected value of a model with multiple random variables differs from the same model filled out with the expected value of each of its random variables?
Don’t know of any write-ups unfortunately, but the linearity of expectation means the two are equal whenever the model is linear; for non-linear models they generally differ.
Long version:
When I run the Python versions of the models with point estimates, I get:
Charity              Value/$
GiveDirectly         0.0038
END                  0.0211
DTW                  0.0733
SCI                  0.0370
Sightsavers          0.0394
Malaria Consortium   0.0316
HKI                  0.0219
AMF                  0.0240
The (mostly minor) deviations from the official GiveWell numbers are due to:
Different handling of floating point numbers between Google Sheets and Python
Rounded/truncated inputs
A couple of models calculated the net present value of an annuity based on payments at the end of each period instead of the beginning; I never got around to implementing the beginning-of-period version (the two conventions differ only by a constant factor, as sketched after this list)
Unknown errors
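For what it’s worth, the two annuity conventions differ only by a factor of (1 + r): payments at the beginning of each period (an annuity-due) are worth (1 + r) times the same payments at the end (an ordinary annuity). A minimal sketch of the relationship (this helper is illustrative, not code from the actual models):

```python
def npv_annuity(payment, rate, n_periods, due=False):
    """Net present value of a level annuity.

    due=False: payments at the end of each period (ordinary annuity).
    due=True:  payments at the beginning of each period (annuity-due),
               which is just the ordinary value scaled by (1 + rate).
    """
    pv = payment * (1 - (1 + rate) ** -n_periods) / rate
    return pv * (1 + rate) if due else pv

print(npv_annuity(100, 0.04, 10))            # end of period:       ~811.09
print(npv_annuity(100, 0.04, 10, due=True))  # beginning of period: ~843.53
```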
When I calculate the expected values of the output probability distributions given the uniform input uncertainty (the same ±20% 90% CI on every input), I get:
Charity              Value/$
GiveDirectly         0.0038
END                  0.0204
DTW                  0.0715
SCI                  0.0354
Sightsavers          0.0383
Malaria Consortium   0.0300
HKI                  0.0230
AMF                  0.0231
I would generally call these values pretty close.
It’s worth noting, though, that the procedure I used to add uncertainty to inputs doesn’t produce input distributions that have the original point estimate as their expected value. By creating a 90% CI at ±20% of the original value, the CI is centered on the point estimate, but since log-normal distributions aren’t symmetric, the expected value is not precisely at the point estimate. That explains some of the discrepancy.
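To make that concrete, here’s a minimal sketch (the exact fitting procedure in the actual code may differ) showing that a log-normal whose 5th and 95th percentiles sit at ±20% of a point estimate has a mean about 1.3% below that estimate:

```python
import numpy as np
from scipy import stats

x = 1.0  # the original point estimate (illustrative)

# Fit a log-normal whose 90% CI is [0.8*x, 1.2*x].
z = stats.norm.ppf(0.95)  # ~1.645, z-score of the 95th percentile
mu = (np.log(0.8 * x) + np.log(1.2 * x)) / 2
sigma = (np.log(1.2 * x) - np.log(0.8 * x)) / (2 * z)

# The mean of a log-normal is exp(mu + sigma^2 / 2), not exp(mu).
print(np.exp(mu + sigma**2 / 2))  # ~0.9873: about 1.3% below the point estimate
```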
The rest of the discrepancy is presumably from the non-linearity of the models (e.g. there are some logarithms in the models). In general, the linearity of expectation means that the expected value of a linear model of multiple random variables is exactly equal to the linear model of the expected values. For non-linear models, no such rule holds; for convex or concave models, Jensen’s inequality even tells you the direction of the bias. (The relatively modest discrepancy between the point estimates and the expected values suggests that the models are “mostly” linear.)
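A quick Monte Carlo illustration of both halves of that claim (illustrative distributions, not the actual model inputs):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=0.5, size=1_000_000)

# Linear model: expectation commutes exactly (up to sampling noise).
print(np.mean(2 * x + 1), 2 * np.mean(x) + 1)  # ~3.266 vs ~3.266

# Non-linear model: E[log(X)] != log(E[X]). log is concave, so by
# Jensen's inequality E[log(X)] <= log(E[X]).
print(np.mean(np.log(x)), np.log(np.mean(x)))  # ~0.0 vs ~0.125
```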
Fabulous! This is extremely good to know and it’s also quite a relief!
Yes, how does the posterior mode differ from GiveWell’s point estimates, and how does this vary as a function of the input uncertainty (confidence interval length)?
(I think my other two recent comments sort of answer each of your questions.)