Further support comes from the fact that the harder-to-quantify downsides to the intervention (e.g. lower freedom of choice) are comparatively insignificant.
I’d like to start off by thanking you for actually trying to estimate this, rather than just sweeping it under the rug. I really appreciate this, and think it reflects very poorly on other EA researchers that they don’t do this. This is a major step towards better capturing the true moral tradeoffs.
Unfortunately, that is the last good thing I have to say about this, because your actual methodology seems quite terrible.
You essentially asked 16 of your friends if it was a bad idea, the sort of sample size that should raise major red flags. There was no attempt to sample people disproportionately affected by the ban.
You ignore the fact that some of the responses make zero sense, like the person who, if I understand them correctly, thinks that a sugary drink ban would reduce their total lifetime utility by 2% but is not willing to spend a single dollar to avoid this colossal hit.
Despite this, if you take a simple average of their responses, the policy looks like a bad idea.
Instead of reporting this fact, you used a bizarre function to show the ‘average’ was low: an average of the arithmetic mean of weighted arithmetic means for some respondents, the geometric mean of weighted arithmetic means for others, and the geometric mean of weighted geometric means for the rest.
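For readers who find that description hard to parse, here is a rough, purely illustrative sketch of the kind of piecewise aggregator being described. The group assignments, weights, and numbers below are hypothetical, not the report's actual data or formula; the point is only to make the structure concrete.

```python
import numpy as np

def weighted_arith(x, w):
    # Weighted arithmetic mean of one respondent's answers.
    return np.average(x, weights=w)

def weighted_geo(x, w):
    # Weighted geometric mean: exp of the weighted mean of logs
    # (collapses to 0 if any answer is 0).
    return np.exp(np.average(np.log(x), weights=w))

# Hypothetical (answers, weights) pairs, one per respondent.
group_a = [([2.0, 3.0], [1, 1]), ([4.0, 6.0], [2, 1])]   # arithmetic mean of weighted arithmetic means
group_b = [([1.0, 5.0], [1, 3]), ([0.5, 2.0], [1, 1])]   # geometric mean of weighted arithmetic means
group_c = [([0.1, 0.4], [1, 1]), ([3.0, 9.0], [1, 2])]   # geometric mean of weighted geometric means

a = np.mean([weighted_arith(x, w) for x, w in group_a])
b = np.exp(np.mean([np.log(weighted_arith(x, w)) for x, w in group_b]))
c = np.exp(np.mean([np.log(weighted_geo(x, w)) for x, w in group_c]))

headline = np.mean([a, b, c])  # the final 'average' of the three group aggregates
print(a, b, c, headline)
```

Every branch point here (which respondents go in which group, which mean applies where, how the weights are chosen) is a free parameter of exactly the kind discussed below.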
The weights appear designed to reduce the estimate, because they are (I think) based on the size of the standard errors, which naturally scale with estimate size.
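To see why that matters, here is a toy illustration (again, not the actual data) of what happens if the weights are inverse to the standard error while the standard error scales roughly with the estimate itself: the weighted mean is dragged down towards the harmonic mean, i.e. towards the smallest answers.

```python
import numpy as np

# Hypothetical respondent estimates of the utility cost (arbitrary units).
estimates = np.array([0.5, 2.0, 5.0, 20.0])

# Suppose each standard error is proportional to the estimate
# (big answers come with big error bars), and weights are 1/SE.
standard_errors = 0.5 * estimates
weights = 1.0 / standard_errors

plain_mean = estimates.mean()                            # 6.875
weighted_mean = np.average(estimates, weights=weights)   # ~1.45, the harmonic mean here

print(plain_mean, weighted_mean)
```

So even before any geometric-mean machinery, weighting of this kind systematically shrinks the headline cost relative to a simple average.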
The number of degrees of freedom in how you defined this formula is very large compared to your number of data points. I would be very surprised if you pre-registered this formula.
What’s more, aside from looking p-hacked, the formula is totally absurd. Here is a simple example:
In the final column, cell I18, you use the arithmetic mean for 4 of the respondents as a hack to get around the fact that it would be embarrassing to report the geometric mean, because it is zero.
But this makes your function non-monotonic! If we take the four respondents who gave literally zero as their answer (I5, I13, I16 and I17) and replace ‘0’ with 0.0000000001, the utility cost of the program has surely increased (some people now dislike it more, and no-one dislikes it less). But because those answers are now non-zero, the geometric mean applies (at least I assume it does; the logic is hardcoded, which is never a good sign), which means the bottom-line number actually falls, by 96%.
So basically the existence of a single person who disvalues something by a sufficiently small but positive amount can outweigh any number of other people saying it is a major cost.
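A minimal sketch of the non-monotonicity, using made-up numbers rather than the spreadsheet's (so the exact percentage differs, but the mechanism is the same): nudging a zero answer up to a tiny positive number flips the aggregator from an arithmetic mean to a geometric mean, and the headline number collapses.

```python
import numpy as np

def aggregate(responses):
    """Mimic the apparent rule: arithmetic mean if any response is zero,
    geometric mean otherwise (my reading of the hardcoded logic;
    an assumption, not the actual formula)."""
    responses = np.asarray(responses, dtype=float)
    if np.any(responses == 0):
        return responses.mean()
    return np.exp(np.mean(np.log(responses)))

before = [0.0, 5.0, 8.0, 12.0]    # one respondent reports zero cost
after = [1e-10, 5.0, 8.0, 12.0]   # that respondent now reports a tiny positive cost

print(aggregate(before))  # 6.25   (arithmetic mean)
print(aggregate(after))   # ~0.015 (geometric mean), a drop of over 99%
```

Every respondent is at least as badly off in the second scenario, yet the reported cost falls by two orders of magnitude.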