Thanks for the post!
Have you considered using z-scores:
z(A, X) = (g(A, X) - μ_A) / σ_A,
where g(A, X) is the geometric mean of text X according to researcher A, and μ_A and σ_A are the mean and standard deviation of all of researcher A's geometric means?
This would ensure that each researcher has the same weight in the combined scores, since each researcher's z-scores would sum to zero (and have unit variance).
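A minimal sketch of the z-score idea, assuming each researcher's geometric means are already computed and stored as a dict from text to value. The function names and the ratings below are hypothetical, and summing z-scores across researchers is one possible way to combine them. It uses the population standard deviation; the sample standard deviation would also leave each researcher's z-scores summing to zero.

```python
from statistics import mean, pstdev

def z_scores(geo_means):
    """Standardize one researcher's geometric means (dict: text -> value)."""
    mu = mean(geo_means.values())
    sigma = pstdev(geo_means.values())  # population standard deviation
    return {text: (g - mu) / sigma for text, g in geo_means.items()}

def combine(per_researcher):
    """Sum the z-scores across researchers for each text."""
    combined = {}
    for geo_means in per_researcher.values():
        for text, z in z_scores(geo_means).items():
            combined[text] = combined.get(text, 0.0) + z
    return combined

# Hypothetical geometric means for two researchers on three texts.
# A's values span three orders of magnitude, B's are all close together,
# yet after standardizing, both contribute on the same scale.
ratings = {
    "A": {"X": 10.0, "Y": 100.0, "Z": 1000.0},
    "B": {"X": 2.0, "Y": 1.0, "Z": 3.0},
}

combined = combine(ratings)
print(combined)
```

Each researcher's z-scores sum to zero (up to floating-point error) and have unit standard deviation, so no single researcher's scale dominates the combined scores.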
So right now (i.e., in the next version), I am normalizing by the share of total impact that each project takes according to each researcher, which feels like a more adequate normalization. But thanks for the tip.
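A minimal sketch of share-of-total-impact normalization, assuming each researcher's impact estimates are positive and stored as a dict from project to value. The helper names and numbers are hypothetical, and averaging the shares across researchers is an assumed way of combining them, chosen so the combined scores also sum to 1.

```python
def shares(impacts):
    """Each project's share of one researcher's total estimated impact.

    Assumes all impacts are positive, so the total is nonzero.
    """
    total = sum(impacts.values())
    return {project: v / total for project, v in impacts.items()}

def combine_shares(per_researcher):
    """Average the shares across researchers (each researcher weighs 1/n)."""
    n = len(per_researcher)
    combined = {}
    for impacts in per_researcher.values():
        for project, s in shares(impacts).items():
            combined[project] = combined.get(project, 0.0) + s / n
    return combined

# Hypothetical impact estimates for two researchers on three projects.
estimates = {
    "A": {"X": 10.0, "Y": 100.0, "Z": 1000.0},
    "B": {"X": 2.0, "Y": 1.0, "Z": 3.0},
}

combined = combine_shares(estimates)
print(combined)
```

Because each researcher's shares sum to 1, every researcher contributes the same total weight regardless of the absolute scale of their estimates.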
One problem I'm having is that the means are not always positive, so the geometric mean isn't always well defined. I'm solving this by taking the mixture of the distributions rather than working with the means, but it's not entirely trivial (e.g., it's somewhat computationally expensive).
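A rough sketch of both points, not the author's actual implementation. The geometric mean is undefined once any value is zero or negative, while an equal-weight mixture of the researchers' distributions is defined for any sign. Here the mixture is approximated by Monte Carlo sampling, which is where the computational cost comes from. The normal distributions and their parameters are hypothetical stand-ins for each researcher's impact distribution.

```python
import math
import random

def geometric_mean(values):
    """Only defined for strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean undefined for non-positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

def sample_mixture(samplers, n, rng):
    """Draw n samples from an equal-weight mixture of the given samplers."""
    return [rng.choice(samplers)(rng) for _ in range(n)]

# Hypothetical per-researcher impact distributions for one text.
# The second researcher's mean impact is negative.
samplers = [
    lambda rng: rng.gauss(5.0, 2.0),
    lambda rng: rng.gauss(-1.0, 3.0),
]

rng = random.Random(0)
draws = sample_mixture(samplers, 100_000, rng)
mixture_mean = sum(draws) / len(draws)
print(mixture_mean)
```

The mean of an equal-weight mixture is simply the average of the component means (2.0 here), so sampling is only really needed for other statistics of the mixture, such as quantiles or a probability of negative impact.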
That also makes sense to me: since each researcher's shares of total impact add up to 1, each researcher has the same weight in the combined scores (as with the z-scores).