Thanks a bunch for your question, Matt. I can speak to the philosophical side of this; Laura has some practical comments below. I do think you’re right that—and in fact our team discussed the possibility that—we ought to be treating the welfare range estimates as correlated variables. However, we weren’t totally sure that that’s the best way forward, as it may treat the models with more deference than makes sense.
Here’s the rough thought. We need to distinguish between (a) philosophical theories about the relationship between the proxies and welfare ranges and (b) models that attempt to express the relationship between proxies and welfare range estimates. We assume that there’s some correct theory about the relationship between the proxies and welfare ranges, but while there might be a best model for expressing the relationship between proxies and welfare range estimates, we definitely don’t assume that we’ve found it. In part, this is because of ordinary points about uncertainty. Additionally, it’s because the philosophical theories underdetermine the models: lots of models are compatible with any given philosophical theory; so, we just had to choose representative possibilities. (The 1-point-per-proxy and aggregation-by-addition approaches, for instance, are basically justified by appeal to simplicity and ignorance. But, of course, the philosophical theory behind them is compatible with many other scoring and aggregation methods.) So, there’s a worry that if we set things up the way you’re describing, we’re treating the models as though they were the philosophical theories, whereas it might make more sense not to do that and then make other adjustments for practical purposes in specific decision contexts if we’re worried about this.
Laura’s practical notes on this:
A change like the one you’re suggesting would likely decrease the variance in the estimates of f(): if you instead assume the welfare ranges are independent variables, you get samples where, say, the undiluted experiences model dominates the welfare range for shrimp while the neuron count model dominates the welfare range for pigs. A quick practical way of dealing with this would be to discard values of f() below the 2.5th percentile and above the 97.5th percentile.
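A minimal sketch of that trimming step, assuming the f() estimates are already sitting in a NumPy array (the array and the lognormal stand-in below are just placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for Monte Carlo samples of f(); in practice these would come
# out of the actual welfare range simulation.
f_samples = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

# Keep only the samples between the 2.5th and 97.5th percentiles.
lo, hi = np.percentile(f_samples, [2.5, 97.5])
trimmed = f_samples[(f_samples >= lo) & (f_samples <= hi)]
```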
Or, even better: sort each species’ welfare ranges from least to greatest, then use the pair of ith-indexed welfare ranges for the ith estimate of f(). Since each welfare model is given the same weight, I predict this will most accurately match up welfare range values from the same welfare model (e.g., the first 11% of each sorted list will be neuron count welfare ranges, and so on).
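And a sketch of that sorted-pairing idea, again with made-up distributions and a toy weighted-sum f() standing in for whatever function you’re actually computing:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for independently sampled welfare ranges; the distributions
# here are made up purely for illustration.
pig_wr = rng.lognormal(mean=-1.0, sigma=1.0, size=10_000)
chicken_wr = rng.lognormal(mean=-1.5, sigma=1.0, size=10_000)

# Sort each species' samples and pair them by rank, so the ith estimate
# of f() uses the ith-smallest welfare range from each species.
pig_sorted = np.sort(pig_wr)
chicken_sorted = np.sort(chicken_wr)

c1, c2 = 1.0, 2.0  # illustrative coefficients for a toy f()
f_estimates = c1 * pig_sorted + c2 * chicken_sorted
```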
Ultimately, however, given all the uncertainty in whether our models are accurately tracking reality, it might not be advisable to reduce the variance as such.
Thanks, this is great information! The concern you raised regarding distinguishing between philosophical theories and models makes a lot of sense. With that said, I don’t currently feel super satisfied with the practical steps you suggested.
On the first note, the impact of the correlation depends on the structure of f. Suppose I’m trying to estimate the total harms of eating chicken/pork, so we have something like y = c1 * (welfare range of pigs) + c2 * (welfare range of chickens). In this case, treating the welfare ranges of chickens and pigs as correlated will increase the variance of y. On the flip side, if we’re trying to estimate the welfare impact of switching from eating chicken to eating pork, we have something like y = c3 * (welfare range of chickens) − c4 * (welfare range of pigs). In that case, treating the welfare ranges of pigs and chickens as correlated will decrease the variance of y. Trying to address this in an ad-hoc manner seems like it’s pretty challenging.
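To make that concrete, here’s a toy Monte Carlo check (the lognormal distributions and unit coefficients are made up purely for illustration): correlation inflates the variance of the sum and shrinks the variance of the difference.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

def toy_variances(rho):
    # Draw correlated standard normals and map them to lognormal
    # "welfare ranges" (the distributions are made up for illustration).
    cov = [[1.0, rho], [rho, 1.0]]
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    pigs, chickens = np.exp(z[:, 0]), np.exp(z[:, 1])
    total_harm = pigs + chickens   # ~ c1*pigs + c2*chickens with c1 = c2 = 1
    switch = chickens - pigs       # ~ c3*chickens - c4*pigs with c3 = c4 = 1
    return np.var(total_harm), np.var(switch)

print("independent:", toy_variances(0.0))
print("correlated: ", toy_variances(0.8))
# The variance of the sum goes up with correlation; the variance of the
# difference goes down.
```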
On the second note, I think that’s basically treating the welfare capacities of e.g. pigs and chickens as perfectly correlated with one another. That seems extreme to me, since I think a substantial portion of the uncertainty in the welfare ranges is coming from uncertainty about which traits each species has, not about which philosophical theory of welfare is correct.
I come away still thinking that the procedure I suggested seems like the most workable of the approaches mentioned so far. To add a little more rigor, here are some examples of plotting the welfare range estimates of chickens and pigs against one another with the different methods (uncorrelated sampling from the respective mixture distributions, sampling from the ordered distributions, and pair-wise sampling from the constituent models). In addition, there are some plots showing the impact of the different sampling methods on some toy analyses of the welfare impact of eating chicken/pork and of switching from eating chicken to eating pork (note that the actual numbers are not intended to be very representative). You can see that the trimming approach only makes sense in the second case, and that paired sampling from the constituent models produces distributions in between those for the uncorrelated case and those for the ordered case.
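For concreteness, here’s roughly what I mean by pair-wise sampling from the constituent models: on each Monte Carlo draw, pick one welfare model (according to the model weights) and sample both species’ welfare ranges from that same model. The model names, weights, and distributions below are placeholders rather than the actual ones from the welfare range work:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000

# Placeholder constituent models: model name -> (pig sampler, chicken sampler).
# Names, weights, and distributions are stand-ins, not the real ones.
models = {
    "neuron count": (
        lambda size: rng.normal(0.10, 0.02, size),
        lambda size: rng.normal(0.05, 0.01, size),
    ),
    "undiluted experience": (
        lambda size: rng.normal(1.5, 0.3, size),
        lambda size: rng.normal(2.0, 0.4, size),
    ),
    "equality": (
        lambda size: np.ones(size),
        lambda size: np.ones(size),
    ),
}
weights = np.full(len(models), 1.0 / len(models))  # equal model weights

# For each Monte Carlo draw, pick one model and sample both species from it,
# so each pair of welfare ranges comes from the same constituent model.
names = list(models)
picks = rng.choice(len(names), size=n, p=weights)
pig_wr = np.empty(n)
chicken_wr = np.empty(n)
for i, name in enumerate(names):
    mask = picks == i
    pig_sampler, chicken_sampler = models[name]
    pig_wr[mask] = pig_sampler(mask.sum())
    chicken_wr[mask] = chicken_sampler(mask.sum())
```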
Note that when using the pair-wise sampling from constituent models approach, pigs and chickens are more strongly correlated with one another than many other pairs of species are. Here is what the correlation between chickens and shrimp looks like, for example:
Hey, thanks for this detailed reply!
When I said “practical”, I more meant “simple things that people can do without needing to download and work directly with the code for the welfare ranges.” In this sense, I don’t entirely agree that your solution is the most workable of them (assuming independence probably would be). But I agree—pairwise sampling is the best method if you have the access and ability to manipulate the code! (I also think that the perfect correlation you graphed makes the second suggestion probably worse than just assuming perfect independence, so thanks!)
Yeah, that makes complete sense; it was a pain to get the pairwise sampling working.