I would want to know more about what our actual targets should plausibly be before making any such claim. I'm not sure we can infer much from your examples.
I agree it would be good to know which aggregation methods perform better under different conditions, and performance targets. The geometric mean is better than the mean, in the sense of achieving a lower Brier and log score, for all Metaculus' questions. However, this might not hold for a set of questions whose predictions are distributed more like the welfare ranges of the 12 models considered by Rethink Priorities. I would even be open to using different aggregation methods depending on the species, since the distribution of the 12 mean welfare ranges of each model varies across species.
Maybe an analogy is that we're aggregating predictions of different perspectives, though?
If the forecasts come from "all-considered views of experts", which I think is what you are calling "different perspectives", Jaime Sevilla suggests using the geometric mean of odds if poorly calibrated outliers can be removed, or the median otherwise. For the case of welfare ranges, I do not think one can say there are poorly calibrated outliers. So, if one interpreted each of the 12 models as one forecaster[1], I guess Jaime would suggest determining the cumulative distribution function (CDF) of the welfare range from the geometric mean of the odds of the CDFs of the welfare ranges of the 12 models, as Epoch did for judgment-based AI timelines. I think using the geometric mean is also fine, as it performed marginally better than the geometric mean of odds in Metaculus' questions.
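As a minimal sketch of what these pooling rules do at a single point of the CDF, here is a comparison in Python. The four CDF values are made up for illustration (they are not taken from Rethink Priorities' models), and the function names are my own.

```python
import math
from statistics import mean, median


def geometric_mean(values):
    """Geometric mean of strictly positive numbers."""
    return math.exp(mean([math.log(v) for v in values]))


def geometric_mean_of_odds(probs):
    """Pool probabilities strictly between 0 and 1 via the geometric mean of their odds."""
    pooled_odds = geometric_mean([p / (1 - p) for p in probs])
    # Convert the pooled odds back into a probability.
    return pooled_odds / (1 + pooled_odds)


# Hypothetical values of F_i(x) at one welfare range x, one per model (illustrative only).
cdf_values = [0.1, 0.2, 0.5, 0.9]

pooled = {
    "mean": mean(cdf_values),
    "median": median(cdf_values),
    "geometric mean": geometric_mean(cdf_values),
    "geometric mean of odds": geometric_mean_of_odds(cdf_values),
}
for name, value in pooled.items():
    print(f"{name}: {value:.3f}")
```

Repeating this over a grid of welfare ranges would give a pooled CDF. Each of these rules is non-decreasing in every input, so the pooled curve stays non-decreasing. Note that the geometric mean of odds is only defined for CDF values strictly between 0 and 1.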
Jaime agrees with using the mean if the forecasts come from "models with mutually exclusive assumptions":
If you are not aggregating all-considered views of experts, but rather aggregating models with mutually exclusive assumptions, use the mean of probabilities.
However:
Models can have more or less mutually exclusive assumptions. The less they do, the more it makes sense to rely on the median, geometric mean, or geometric mean of odds instead of the mean.
There is not a strong distinction between all-considered views and the outputs of quantitative models, as the judgements of people are models themselves. Moreover, one should presumably prefer the all-considered views of the modellers over the models, as the former account for more information.
Somewhat relatedly, Rethink recommends using the median (not mean) welfare ranges.
Other animals could fail to be conscious, and so have welfare ranges of 0.
Sorry for not being clear. I agree with the above if lack of consciousness is defined as having a null welfare range. However:
In practice, consciousness has to be operationalised as satisfying certain properties to a desired extent.
I do not think one can say that, conditional on such properties not being satisfied to the desired extent, the welfare range is 0.
So I would say one should put no probability mass on a null welfare range, and that the CDF of the welfare range should be continuous[2]. In general, I assume zeros and infinities do not exist in the real world, even though they are useful in maths and physics to think about limiting processes.
In addition, I think the CDF of the welfare range should be smooth such that the probability density function (PDF) of the welfare range is continuous.
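To make the distinction concrete, here is how a point mass at zero would show up in the CDF. The symbols $p_0$ (probability of a null welfare range) and $G$ (CDF of the welfare range conditional on it being positive, with $G(0) = 0$) are my own notation, introduced only for illustration.

```latex
F(x) =
\begin{cases}
0, & x < 0, \\
p_0 + (1 - p_0)\, G(x), & x \ge 0.
\end{cases}
```

If $p_0 > 0$, $F$ jumps from 0 to $p_0$ at $x = 0$, so it is discontinuous there. The view above corresponds to $p_0 = 0$ with $G$ continuously differentiable, so that the PDF $f = F'$ is continuous.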
This sounds like a moral parliament in some way?
Side note. I sometimes link to concepts I know you are aware of, but readers may not be.