If I were to summarise your post in another way, it would be this:
The biggest problem with pooling is that a point estimate isn’t the end goal. In most applications you care about some transform of the estimate, and in general you’re better off keeping all of the information (i.e. your new prior, the full distribution) rather than just a point estimate of it.
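To make the transform point concrete (the numbers here are my own illustration, not from the post): for a nonlinear transform \(f\), pooling first and then transforming is not the same as transforming first and then pooling. With two experts at \(p_1 = 0.01\) and \(p_2 = 0.5\), and \(f = \operatorname{logit}\):

$$ f\!\left(\tfrac{p_1 + p_2}{2}\right) = \operatorname{logit}(0.255) \approx -1.07, \qquad \tfrac{f(p_1) + f(p_2)}{2} = \tfrac{\operatorname{logit}(0.01) + \operatorname{logit}(0.5)}{2} \approx -2.30. $$

If you've already collapsed to a point estimate, you've silently committed to one of these; keeping the full distribution lets you compute whichever transform the application actually needs.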
I disagree with you that the most natural prior is “mixture distribution over experts”. (Although I wonder how much that actually ends up mattering in the real world.)
I also think something “interesting” is being said here about the performance of estimates in the real world. If I had to explain why mean log-odds does well empirically, I would say it means that “mixture distribution over experts” is not a great prior. But then, someone with my priors would say that...
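For what it’s worth, here’s a minimal sketch of the two pooling rules being compared, with equal weights and two made-up expert probabilities (the function names and numbers are mine, not from the post):

```python
import numpy as np

def pool_linear(probs):
    # Arithmetic mean of probabilities: the point estimate implied by
    # an equal-weight "mixture distribution over experts" prior.
    return float(np.mean(probs))

def pool_mean_log_odds(probs):
    # Mean log-odds, i.e. the geometric mean of odds, mapped back
    # to a probability.
    p = np.asarray(probs, dtype=float)
    logits = np.log(p / (1 - p))
    return float(1 / (1 + np.exp(-logits.mean())))

experts = [0.01, 0.5]               # hypothetical expert forecasts
print(pool_linear(experts))         # 0.255
print(pool_mean_log_odds(experts))  # ~0.091
```

The two rules agree when the experts agree and pull apart when they disagree strongly, which is exactly where the choice of prior is doing the work.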