This is great! Your super-simple code is helping me learn the basics of Squiggle. Thanks for including it.
I currently use point estimates for cost-effectiveness analyses but you are convincing me that using distributions would be better. A couple of thoughts/questions:
When is a point-estimate approach much worse than a distribution approach? For example, what is it about the Fermi paradox that leads to such differing results between the methods?
If I suspect a lognormal distribution, is there an easy way to turn discrete estimates into an aggregate estimate? Suppose I want to estimate how many eggs my pet toad will lay. One source suggests it’s 50, another says 5000. The lognormal seems like the right model here, but it takes me several minutes with pen, paper and calculator to estimate the mean of the underlying lognormal distribution. Ideally I’d like a ‘mean’ that approximates this for me, but neither arithmetic, geometric nor harmonic means seem appropriate.

I welcome help from anyone!!
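For reference, the pen-and-paper calculation I have in mind looks like this, if 50 and 5000 are read as the 5th and 95th percentiles of the lognormal (that reading is an assumption):

$$\mu = \tfrac{1}{2}(\ln 50 + \ln 5000) \approx 6.21, \qquad \sigma = \frac{\ln 5000 - \ln 50}{2 \times 1.645} \approx 1.40,$$

$$\text{mean} = e^{\mu + \sigma^2/2} \approx e^{7.19} \approx 1330.$$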
> it takes me several minutes with pen, paper and calculator to estimate the mean of the underlying lognormal distribution
You can arrive at a lognormal from the 90% c.i. in Squiggle, e.g., by writing `50 to 5000`, and it will give you the mean automatically. You could also have a mixture of distributions, e.g., `mx([dist_approach_1, dist_approach_2], [0.5, 0.5])`.
My preferred approach would be to look at the distributions produced by the two approaches, and then write a lognormal given what I think are plausible lower and upper bounds.
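A minimal sketch of both options in Squiggle; the two per-source intervals below are made up for illustration:

```squiggle
// Option 1: one lognormal from a single 90% c.i.
// Squiggle reports its mean (and other statistics) automatically.
eggs = 50 to 5000
mean_eggs = mean(eggs) // roughly 1300 for this interval

// Option 2: a 50/50 mixture of one distribution per source.
dist_approach_1 = 30 to 100
dist_approach_2 = 3000 to 8000
aggregate = mx([dist_approach_1, dist_approach_2], [0.5, 0.5])
mean(aggregate)
```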
> is there an easy way to turn discrete estimates into an aggregate estimate
Use processes which produce distributional estimates from the beginning :)
> When is a point-estimate approach much worse than a distribution approach? For example, what is it about the Fermi paradox that leads to such differing results between the methods?
The general answer, for me, is that the true shape of the uncertainty is distributional, so approaches which don’t capture that shape will produce worse answers.
The specific answer is that a point estimate like “probability of life per planet” doesn’t capture our uncertainty about the number of planets with life, i.e., we aren’t certain that there are exactly X life-bearing planets out of N, which would give a probability of X/N. The same applies to the other factors. I’m answering this quickly; let me know if it makes sense.
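A toy version of that point, with made-up numbers (a sketch, not the actual Fermi calculation): multiplying two best guesses is not the same as taking the mean of the product of two distributions.

```squiggle
// Point-estimate version: multiply two best guesses.
planets_guess = 1e9
p_life_guess = 1e-6
point_product = planets_guess * p_life_guess // = 1000

// Distributional version: the same central guesses, but with
// made-up 90% intervals around them.
planets = 1e8 to 1e10
p_life = 1e-9 to 1e-3
expected_life_bearing = planets * p_life
mean(expected_life_bearing) // orders of magnitude above 1000
```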
Thanks for the reply and sorry for the long delay! I decided to dive in and write a post about it.
- I check when using distributions is much better than point estimates: it’s when the ratio between the upper and lower confidence bounds is high, i.e., in situations of high uncertainty like the probability-of-life example you mentioned (see the sketch below).
- I test your intuition that using a lognormal is usually better than a normal (and end up agreeing with you).
- I check whether the lognormal distribution can be used to find a more reliable mean of two point estimates, but conclude that it’s no good.
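A minimal sketch of that first check, with made-up intervals; the median of `a to b` is the geometric midpoint `sqrt(a*b)`, so the drift of the mean away from it grows with the ratio of the bounds:

```squiggle
// How far the lognormal mean drifts from the median (= geometric
// midpoint) as the ratio between the bounds grows.
narrow = 10 to 100    // upper/lower ratio of 10
wide = 10 to 100000   // upper/lower ratio of 10^4
drift_narrow = mean(narrow) / sqrt(10 * 100)   // roughly 1.3
drift_wide = mean(wide) / sqrt(10 * 100000)    // roughly 50
[drift_narrow, drift_wide]
```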
Thanks!