I realize that the model is not an important or decision-relevant part of this piece, but I also think this piece is really useful as a potential template for future longtermist research, so I wanted to point out some ways the model could be improved to help future model builders. Marie knows all this feedback and agrees with it, but it’s not a priority to update the model in the post. (Disclosure: I am Marie’s boss’s boss.)
Anyways, here are a few pieces of feedback:
1.) The Guesstimate uses 50% as a chance of failure and implements this by just multiplying the distribution by 0.5. This gets an accurate mean, but not an accurate 70% CI: the 70% CI actually has to include 0, since 50% of the total output is failure and thus 0. The correct way to handle this is to use a mixture distribution (or, more precisely, a zero-inflated distribution) rather than mere multiplication; there's a short sketch of the difference right after this list.
2.) The ranges for percentages include negative numbers, but negative values are not plausible in this context. There are more elegant ways to handle this, but for ease of use I just recommend clipping.
3.) The model is very sensitive to the absolute pandemic risk. Building a CI around the possible ranges in Michael’s database is very sensible, but a normally distributed range will end up centered on the arithmetic mean of the ranges, which may be biased too high. I think we should take the geometric mean of these percentages instead, which suggests using a lognormal distribution (there's a quick sketch of this after the list too).
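To make the difference in point 1 concrete, here's a minimal sketch with made-up numbers (not the values from the post). Both versions have roughly the same mean, but only the zero-inflated version produces a 70% CI whose lower bound is 0:

import squigglepy as sq

base = sq.lognorm(1, 10)  # an arbitrary illustrative distribution

multiplied = base * 0.5                      # correct mean, misleading CI
zero_inflated = sq.zero_inflated(0.5, base)  # 50% chance of exactly 0

print(sq.get_mean_and_ci(multiplied @ 10000, digits=1, credibility=70))
print(sq.get_mean_and_ci(zero_inflated @ 10000, digits=1, credibility=70))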
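Similarly, for point 3, here's a quick sketch using the absolute-risk endpoints from the model below (treated as a 90% CI, Squigglepy's default). The arithmetic midpoint of the range is roughly 60x higher than its geometric mean, and a normal vs. lognormal fit inherits that difference:

import numpy as np
import squigglepy as sq

low, high = 0.000002, 0.03  # absolute-risk range from the model below

print((low + high) / 2)     # arithmetic midpoint, ~0.015
print(np.sqrt(low * high))  # geometric mean, ~0.00024

# A normal treating [low, high] as a 90% CI centers on the arithmetic midpoint;
# a lognormal treating it as a 90% CI puts its median at the geometric mean
print(np.median(sq.norm(low, high) @ 10000))
print(np.median(sq.lognorm(low, high) @ 10000))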
Combining these three pieces of feedback into a Squigglepy model, I get the following code:
import numpy as np
import squigglepy as sq
from squigglepy.numbers import K, M

# 50% chance the intervention succeeds (feedback #1: model failure as a
# zero-inflated mixture rather than multiplying the distribution by 0.5)
p_success = 0.5

# Lognormals clipped at their lower bounds (feedback #2 and #3)
ppe_reduces_relative_risk = sq.lognorm(0.005, 0.05, lclip=0.005)
absolute_risk = sq.lognorm(0.000002, 0.03, lclip=0.000002)

# Convert to x-risk basis points; the reduction is zero with probability 1 - p_success
absolute_reduction_in_basis_points = sq.zero_inflated(1 - p_success, ppe_reduces_relative_risk * absolute_risk * 10*K)
print('Absolute reduction (x-risk basis points) 70% CI: {}'.format(sq.get_mean_and_ci(absolute_reduction_in_basis_points @ 10000, digits=1, credibility=70)))

# Cost in $M per basis point of x-risk reduction
cost = sq.lognorm(10*M, 50*M, lclip=10*M)
cost_per_xrisk_bp_in_m = (cost / M) / absolute_reduction_in_basis_points
cost_samples = cost_per_xrisk_bp_in_m @ 10000

# Failed runs give a reduction of 0 and hence an infinite cost per basis point,
# so report P(waste) separately and condition the CI on success
p_waste = np.mean([c == np.inf for c in cost_samples])
otherwise_mean_ci = sq.get_mean_and_ci([c for c in cost_samples if c != np.inf], digits=1, credibility=70)
print('Cost per x-risk basis points ($M): {}% chance of being a waste... conditional on working, 70% CI is {}'.format(int(round(p_waste * 100)), otherwise_mean_ci))
The output for this model is:
Absolute reduction (x-risk basis points) 70% CI: {'mean': 1.5, 'ci_low': 0.0, 'ci_high': 0.2}
Cost per x-risk basis points ($M): 50% chance of being a waste... conditional on working, 70% CI is {'mean': 10328.2, 'ci_low': 25.7, 'ci_high': 12860.2}
(You need to display the final output as conditional on working, or otherwise there will be a bunch of infinite values and the mean will be infinity, which is not very helpful.)
P.S. I saw 70% CI: {'mean': 1.5, 'ci_low': 0.0, 'ci_high': 0.2} and was surprised to see the mean so far outside the 70% CI, but this is right: it just means the distribution is very heavily right-skewed.
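If it helps to see how that can happen, here's a minimal sketch with an arbitrary heavy-tailed distribution (purely illustrative, not the model above):

import squigglepy as sq

# 50% chance of exactly 0 plus a long right tail
skewed = sq.zero_inflated(0.5, sq.lognorm(0.1, 100))
print(sq.get_mean_and_ci(skewed @ 10000, digits=1, credibility=70))
# Most samples are 0 or small, but the tail typically pulls the mean well above
# the 85th percentile, so the mean lands outside the 70% CI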
I get USD 12 million per basis point in expectation. I used makedistribution.com to find beta distributions with the appropriate 70% CIs.
Link to model
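In case it's useful, here's a rough sketch (my own, not how the linked model was built) of finding beta parameters that match a target 70% CI with scipy instead of makedistribution.com; the interval and the helper name fit_beta_to_ci are illustrative, not taken from the post:

from scipy import stats, optimize

def fit_beta_to_ci(low, high, credibility=0.70):
    # Find (alpha, beta) whose central `credibility` interval is approximately [low, high]
    tail = (1 - credibility) / 2

    def loss(params):
        a, b = params
        q_low, q_high = stats.beta.ppf([tail, 1 - tail], a, b)
        return (q_low - low) ** 2 + (q_high - high) ** 2

    result = optimize.minimize(loss, x0=[2, 2], bounds=[(1e-3, None), (1e-3, None)])
    return result.x

alpha, beta = fit_beta_to_ci(0.05, 0.30)  # illustrative 70% CI for a percentage
print(alpha, beta, stats.beta.ppf([0.15, 0.85], alpha, beta))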