Hi Bob,
Could you clarify how you aggregated the welfare range distributions from the 8 models you considered? I understand you gave the same weight to all of these 8 models, but I did not find the aggregation method here.
I would obtain the final cumulative distribution function (CDF) of the welfare range by aggregating the CDFs of the 8 models with the geometric mean of odds, as Epoch did to aggregate judgement-based AI timelines. I think Jaime Sevilla would suggest using the mean in this case:
“If you are not aggregating all-considered views of experts, but rather aggregating models with mutually exclusive assumptions, use the mean of probabilities.”
However, I would say the 8 welfare range models are closer to the “all-considered views of experts” than to “models with mutually exclusive assumptions”. In addition:
The mean ignores information from extremely low predictions, and overweights outliers.
The weighted/unweighted geometric mean of odds (and also the geometric mean) performed better than the weighted/unweighted mean on Metaculus' questions.
Samotsvety aggregated widely differing predictions from 7 forecasters[1] using the geometric mean after removing the lowest and highest values (and the geometric mean is more similar to the geometric mean of odds than to the mean).
[1] For the question “What is the unconditional probability of London being hit with a nuclear weapon in October?”, the 7 forecasts were 0.01, 0.00056, 0.001251, 10^-8, 0.000144, 0.0012, and 0.001. The largest of these is 1 M (= 0.01/10^-8) times the smallest.
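To make the comparison between the aggregation methods concrete, here is a minimal Python sketch that pools the 7 forecasts listed above in four ways: the mean, the geometric mean, the geometric mean of odds, and the geometric mean after dropping the lowest and highest values. The function names are my own, and this is only an illustration of the formulas, not the code Samotsvety, Epoch or Metaculus used.

```python
import math


def mean(probs):
    """Arithmetic mean of probabilities."""
    return sum(probs) / len(probs)


def geometric_mean(probs):
    """Geometric mean of probabilities (all must be > 0)."""
    return math.exp(sum(math.log(p) for p in probs) / len(probs))


def geometric_mean_of_odds(probs):
    """Pool via the geometric mean of odds, then convert back to a probability.

    Requires 0 < p < 1 for every input.
    """
    log_odds = [math.log(p / (1 - p)) for p in probs]
    pooled_odds = math.exp(sum(log_odds) / len(log_odds))
    return pooled_odds / (1 + pooled_odds)


def drop_extremes(probs):
    """Remove the single lowest and single highest value."""
    return sorted(probs)[1:-1]


# The 7 forecasts quoted above for the London question.
forecasts = [0.01, 0.00056, 0.001251, 1e-8, 0.000144, 0.0012, 0.001]

pooled = {
    "mean": mean(forecasts),
    "geometric mean": geometric_mean(forecasts),
    "geometric mean of odds": geometric_mean_of_odds(forecasts),
    "geometric mean, extremes dropped": geometric_mean(drop_extremes(forecasts)),
}

for name, value in pooled.items():
    print(f"{name}: {value:.3g}")
```

With these inputs, the geometric mean and the geometric mean of odds nearly coincide, while the mean is roughly 10 times as large because the 0.01 forecast dominates it. For the welfare ranges, the same geometric_mean_of_odds function could in principle be applied at each welfare range value to the 8 models' CDF values at that point, assuming those values lie strictly between 0 and 1.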