Could you clarify how you aggregated the welfare range distributions from the 8 models you considered? I understand you gave the same weight to all of these 8 models, but I did not find the aggregation method here.
I think Jaime Sevilla would suggest using the mean in this case:
If you are not aggregating all-considered views of experts, but rather aggregating models with mutually exclusive assumptions, use the mean of probabilities.
However, I wonder whether the 8 welfare range models are closer to the “all-considered views of experts” than to “models with mutually exclusive assumptions”, in which case Jaime would recommend using the geometric mean of odds. In addition:
The mean ignores information from extremely low predictions, and overweights outliers.
The weighted/unweighted geometric mean of odds (and also the geometric mean) performed better than the weighted/unweighted mean on Metaculus' questions.
Samotsvety aggregated widely differing predictions from 7 forecasters[1] using the geometric mean after removing the lowest and highest values (and the geometric mean is more similar to the geometric mean of odds than to the mean).
For the question “What is the unconditional probability of London being hit with a nuclear weapon in October?”, the 7 forecasts were 0.01, 0.00056, 0.001251, 10^-8, 0.000144, 0.0012, and 0.001. The largest of these is 1 million (= 0.01/10^-8) times the smallest.
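To illustrate how much the aggregation method matters for these 7 forecasts, here is a minimal Python sketch, assuming equal weights for all forecasters. The function names are my own, and the results are not meant to reproduce any figure published by Samotsvety.

```python
import math

# The 7 forecasts quoted above for the London question.
forecasts = [0.01, 0.00056, 0.001251, 1e-8, 0.000144, 0.0012, 0.001]

def arithmetic_mean(ps):
    return sum(ps) / len(ps)

def geometric_mean(ps):
    # Exponential of the average log-probability.
    return math.exp(sum(math.log(p) for p in ps) / len(ps))

def geometric_mean_of_odds(ps):
    # Average the log-odds, then convert the pooled odds back to a probability.
    mean_log_odds = sum(math.log(p / (1 - p)) for p in ps) / len(ps)
    odds = math.exp(mean_log_odds)
    return odds / (1 + odds)

def trimmed(ps):
    # Drop the single lowest and the single highest value.
    return sorted(ps)[1:-1]

mean_all = arithmetic_mean(forecasts)
geo_all = geometric_mean(forecasts)
geo_odds_all = geometric_mean_of_odds(forecasts)
geo_trimmed = geometric_mean(trimmed(forecasts))

for name, value in [
    ("mean", mean_all),
    ("geometric mean", geo_all),
    ("geometric mean of odds", geo_odds_all),
    ("geometric mean, trimmed", geo_trimmed),
]:
    print(f"{name}: {value:.3g}")
```

The mean comes out at about 0.002, dominated by the single 0.01 forecast, whereas the geometric mean and the geometric mean of odds are both about 0.0002, roughly 10 times lower. Trimming the extremes before taking the geometric mean gives about 0.0007.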
Then, we created a mixture model to aggregate the welfare range distributions across all models to factor in our uncertainty about which model is correct. Specifically, for a given organism and model, we modeled each distribution as a normal distribution with a 90% interval with lower and upper bounds equal to the fifth- and ninety-fifth percentile welfare ranges. Each of the eight models was assigned an equal probability of being correct. Then, we sampled 10,000 welfare ranges from this mixture model and stored the resulting 5th-, 50th-, and 95th-percentile welfare ranges in a data frame.
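The procedure described above can be sketched as follows, using only the Python standard library. This is my reading of the description, not the authors' code: the (5th, 95th) percentile bounds below are made-up placeholders rather than values from the report, and the helper names are hypothetical.

```python
import random
from statistics import NormalDist, quantiles

# Hypothetical (5th, 95th) percentile welfare ranges for one organism under
# 8 models. These numbers are placeholders, not values from the report.
model_bounds = [
    (0.0, 0.2), (0.1, 0.5), (0.05, 0.3), (0.2, 0.9),
    (0.0, 0.1), (0.3, 1.2), (0.1, 0.4), (0.02, 0.6),
]

Z95 = NormalDist().inv_cdf(0.95)  # about 1.645

def normal_from_90ci(lower, upper):
    """Normal distribution whose 5th/95th percentiles are lower/upper."""
    mu = (lower + upper) / 2
    sigma = (upper - lower) / (2 * Z95)
    return NormalDist(mu, sigma)

def sample_mixture(bounds, n, rng):
    """Draw n samples from an equal-weight mixture of the fitted normals."""
    components = [normal_from_90ci(lo, hi) for lo, hi in bounds]
    samples = []
    for _ in range(n):
        comp = rng.choice(components)  # each model is equally likely
        samples.append(rng.gauss(comp.mean, comp.stdev))
    return samples

rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = sample_mixture(model_bounds, 10_000, rng)

# quantiles(..., n=100) returns the 99 percentile cut points.
cuts = quantiles(samples, n=100)
p5, p50, p95 = cuts[4], cuts[49], cuts[94]
print(p5, p50, p95)
```

Sampling from an equal-weight mixture like this is equivalent to taking the arithmetic mean of the 8 probability densities, so high-welfare-range models can dominate the upper percentiles. Note also that a normal fitted this way can produce negative samples when the lower bound is near zero.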
Hi @Bob Fischer,
I think you aggregated the welfare range distributions with the mean, since an equal-weight mixture model averages the 8 distributions.