I think a weighted geometric mean is unprincipled and won’t reflect expected value maximization (if w is meant to be a probability). It’s equivalent to weighting by the following, where X is the ratio of moral weights (or maybe conditional on being positive):

geomean(X) = e^E[log(X)]

The expectation is in the exponent, but taking expectations is supposed to be the last thing we do, after aggregation, if we’re maximizing an expected value.
It’s not clear if it would be a good approximation of more principled approaches, but it seems like a compromise between the human-relative and animal-relative approaches and should (always?) give intermediate moral weights.
It, like the unmodified human-relative and animal-relative solutions, also hides the differences between types of uncertainty. For example, I think conscious subsystems should be treated separately, like the number of moral patients.
Also, you shouldn’t be taking the square root in the weighted geometric mean. You need the exponents to sum to 1, not 0.5.
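For concreteness, a weighted geometric mean with exponents summing to 1 can be sketched as follows (the moral-weight values and the probability w = 0.7 are hypothetical):

```python
import math

def weighted_geomean(values, weights):
    """Weighted geometric mean: prod(x_i ** w_i), with the weights
    normalised so that the exponents sum to 1 (not 0.5)."""
    total = sum(weights)
    norm = [w / total for w in weights]
    return math.exp(sum(w * math.log(x) for w, x in zip(norm, values)))

# Two hypothetical moral-weight estimates, combined with w = 0.7 on the first:
print(weighted_geomean([0.01, 1.0], [0.7, 0.3]))  # equals 0.01**0.7 * 1.0**0.3
```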
EDIT: And you need to condition on both humans and the other animal having nonzero moral weight before taking the weighted geometric mean, or else you’ll get zero, infinite, or undefined weighted geometric means. If you take the expected value of the conditional weighted geomean, you would have something like

g(X) = E[e^E[log(X) | X > 0, X < ∞]]

but then g(X) * g(X⁻¹) > 1 (and probably at least one of the two should be infinite, anyway), so you have a two envelopes problem again.
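The two-envelopes point, that g(X) * g(X⁻¹) > 1 whenever the conditional geomeans disagree across models, follows from AM-GM and can be checked numerically (the conditional geomeans below are made up for illustration):

```python
# Hypothetical conditional geomeans of the moral-weight ratio X under
# three equally likely models (values chosen purely for illustration).
cond_geomeans = [0.01, 1.0, 100.0]
p = 1 / len(cond_geomeans)

g_X = sum(p * g for g in cond_geomeans)           # E over models of geomean(X)
g_Xinv = sum(p * (1 / g) for g in cond_geomeans)  # E over models of geomean(1/X)

print(g_X * g_Xinv)  # exceeds 1 whenever the conditional geomeans differ
```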
Thanks for the reply!
I agree it is unprincipled, and I strongly endorse expected value maximisation in principle, but maybe using the geometric mean is still a good method in practice?
The mean ignores information from extremely low predictions, and overweights high outliers.
The weighted/unweighted geometric mean performed better than the weighted/unweighted mean on Metaculus’ questions.
Samotsvety aggregated predictions from 7 forecasters[1] that differed a lot from one another, using the geometric mean after removing the lowest and highest values.
Thanks! Corrected.
I think the welfare range outputted by any given model should always be positive.
For the question “What is the unconditional probability of London being hit with a nuclear weapon in October?”, the 7 forecasts were 0.01, 0.00056, 0.001251, 10^-8, 0.000144, 0.0012, and 0.001. The largest of these is 1 M (= 0.01/10^-8) times the smallest.
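Using the 7 forecasts above, the outlier sensitivity of the two aggregation methods can be checked directly:

```python
import math

# The 7 Samotsvety forecasts quoted above.
forecasts = [0.01, 0.00056, 0.001251, 1e-8, 0.000144, 0.0012, 0.001]

mean = sum(forecasts) / len(forecasts)
geomean = math.exp(sum(math.log(p) for p in forecasts) / len(forecasts))

print(f"mean    = {mean:.6f}")     # dominated by the 0.01 high outlier
print(f"geomean = {geomean:.6f}")  # pulled down by the 10^-8 low outlier
```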
I would want to know more about what our actual targets should plausibly be before making any such claim. I’m not sure we can infer much from your examples. Maybe an analogy is that we’re aggregating predictions of different perspectives, though?
Other animals could fail to be conscious, and so have welfare ranges of 0.
I agree it would be good to know which aggregation methods perform better under different conditions and performance targets. The geometric mean is better than the mean, in the sense of achieving lower Brier and log scores, across Metaculus’ questions. However, it might be that this would not hold for a set of questions whose predictions are distributed more like the welfare ranges of the 12 models considered by Rethink Priorities. I would even be open to using different aggregation methods depending on the species, since the distribution of the 12 mean welfare ranges of each model varies across species.
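As a sketch of how such a comparison works, here is the Brier score computed for two aggregates on a few hypothetical resolved questions (all numbers are made up; this does not reproduce the Metaculus result):

```python
def brier(p, outcome):
    """Brier score of a single probabilistic forecast (lower is better)."""
    return (p - outcome) ** 2

# Hypothetical resolved questions: (mean aggregate, geometric-mean aggregate,
# binary outcome). Numbers are invented purely to illustrate the scoring.
questions = [(0.30, 0.20, 0), (0.10, 0.04, 0), (0.50, 0.30, 0), (0.70, 0.60, 1)]

mean_score = sum(brier(m, o) for m, _, o in questions) / len(questions)
geo_score = sum(brier(g, o) for _, g, o in questions) / len(questions)
print(mean_score, geo_score)
```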
If the forecasts come from “all-considered views of experts”, which I think is what you are calling “different perspectives”, Jaime Sevilla suggests using the geometric mean of odds if poorly calibrated outliers can be removed, or the median otherwise. For the case of welfare ranges, I do not think one can say there are poorly calibrated outliers. So, if one interpreted each of the 12 models as one forecaster[1], I guess Jaime would suggest determining the cumulative distribution function (CDF) of the welfare range from the geometric mean of the odds of the CDFs of the welfare ranges of the 12 models, as Epoch did for judgment-based AI timelines. I think using the geometric mean is also fine, as it performed marginally better than the geometric mean of odds in Metaculus’ questions.
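A minimal sketch of this pooling procedure, assuming hypothetical values for the 12 models’ CDFs at a single welfare-range cutoff:

```python
import math

def pool_geomean_odds(probs):
    """Pool probabilities by taking the geometric mean of their odds,
    then converting back to a probability."""
    log_odds = [math.log(p / (1 - p)) for p in probs]
    mean_log_odds = sum(log_odds) / len(log_odds)
    odds = math.exp(mean_log_odds)
    return odds / (1 + odds)

# Hypothetical values of the 12 models' CDFs at one welfare-range cutoff:
cdf_values = [0.2, 0.5, 0.5, 0.6, 0.6, 0.7, 0.7, 0.8, 0.8, 0.9, 0.9, 0.95]
pooled = pool_geomean_odds(cdf_values)
print(pooled)  # pooled P(welfare range <= cutoff)
```

Pooling each point of the CDFs this way keeps the result nondecreasing, since the geometric mean of odds is monotone in each input.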
Jaime agrees with using the mean if the forecasts come from “models with mutually exclusive assumptions”:

If you are not aggregating all-considered views of experts, but rather aggregating models with mutually exclusive assumptions, use the mean of probabilities.

However:
Models can have more or less mutually exclusive assumptions. The less they do, the more it makes sense to rely on the median, geometric mean, or geometric mean of odds instead of the mean.
There is not a strong distinction between all-considered views and the outputs of quantitative models, as the judgements of people are models themselves. Moreover, one should presumably prefer the all-considered views of the modellers over the models, as the former account for more information.
Somewhat relatedly, Rethink recommends using the median (not mean) welfare ranges.
Sorry for not being clear. I agree with the above if lack of consciousness is defined as having a null welfare range. However:
In practice, consciousness has to be operationalised as satisfying certain properties to a desired extent.
I do not think one can say that, conditional on such properties not being satisfied to the desired extent, the welfare range is 0.
So I would say one should put no probability mass on a null welfare range, and that the CDF of the welfare range should be continuous[2]. In general, I assume zeros and infinities do not exist in the real world, even though they are useful in maths and physics to think about limiting processes.
This sounds like a moral parliament in some way?
Side note. I sometimes link to concepts I know you are aware of, but readers may not be.
In addition, I think the CDF of the welfare range should be smooth such that the probability density function (PDF) of the welfare range is continuous.
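One family of distributions satisfying both conditions is the lognormal, which has strictly positive support (so no mass on a null welfare range) and a smooth CDF with a continuous PDF; a sketch with hypothetical parameters:

```python
import math

def lognormal_cdf(x, mu=-2.0, sigma=1.5):
    """CDF of a lognormally distributed welfare range (parameters are
    hypothetical). Support is strictly positive, so P(W = 0) = 0, and
    the CDF is smooth with a continuous PDF."""
    if x <= 0:
        return 0.0
    return 0.5 * (1 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2))))

print(lognormal_cdf(0))     # 0.0: no probability mass at a null welfare range
print(lognormal_cdf(0.14))  # roughly the median, since e^mu is about 0.135
```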