One thing that might help would be “meta-forecasting”. We could later have some expert forecasters predict the accuracy of average statements made by different groups in different domains. I’d predict that they would have given pretty poor scores to most of these groups.
I agree with your meta-meta-forecast.