Your 4 cluster headache groups contribute about equally to the total number of cluster headaches if you multiply group size by # of CHs. (The top 2% actually contribute a bit less.) That's my entire point. I'm not sure if you disagree?
Hey Soeren!
I would disagree, for the following reason. For a group to contribute equally, its average multiplied by its size must come out to the same value as for the other groups. While it is true that people at the 50th percentile get 1/10 the number of CHs of people at the 90th percentile (and ~1/50 of the 99th), these percentiles do not define groups. What we need to look at instead is the cumulative distribution function:
The bottom 50% accounts for 3.17% of incidents
The bottom 90% accounts for 30% of incidents
The bottom 95% accounts for 43% of incidents
What I am getting at is that for a given percentile, the contribution from the group "this percentile and lower" will be a lot smaller than the value at that percentile multiplied by the fraction of participants at or below that level. This is because the distribution is very skewed: below any given percentile, the values fall off quickly.
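To make this concrete, here is a minimal simulation sketch. The lognormal shape and the parameters below are my own assumptions, chosen only so the percentile ratios roughly match the ones quoted above (90th/50th ≈ 10x); they are not fitted to the actual survey data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical heavy-tailed distribution of CH incidents per person per year.
# mean=3.0, sigma=1.8 are illustrative assumptions, not fitted values.
incidents = np.sort(rng.lognormal(mean=3.0, sigma=1.8, size=100_000))
total = incidents.sum()
n = incidents.size

for p in (0.50, 0.90, 0.95):
    cutoff = int(n * p)
    actual_share = incidents[:cutoff].sum() / total        # true share of all incidents
    naive_share = incidents[cutoff - 1] * cutoff / total   # percentile value * group size
    print(f"bottom {p:.0%}: actual {actual_share:.1%} vs naive {naive_share:.1%}")
```

With these made-up parameters the bottom 50% comes out near the ~3% quoted above, while the naive percentile-value-times-group-size estimate overshoots badly; that gap is exactly the skew described in the previous paragraph.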
Another way of looking at this is by assuming that each percentile has a corresponding value (in the example, "number of CHs per year") proportional to the rarity of that percentile or above. For simplicity, let's say we have a step function where each time we halve the remaining group, the value doubles for those just above the cut-off:
0 to 50% have 1/year
50 to 75% have 2/year
75 to 87.5% have 4/year
and so on...
Here each group contributes equally (size × # of CHs is the same for each group). Counter-intuitively, this does not imply that the extremes account for a small amount. On the contrary, it implies that the average is infinite (cf. the St. Petersburg paradox): even though the average below any given percentile is always finite (e.g. between 0 and 40% it's 1/year), the average (and total contribution) above that percentile is always infinite. In this idealized case it will always be true that "the bulk is concentrated in a tiny percentile" (and indeed you can make that percentile as small as you want and still get infinitely more above it than below it).
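Writing out the arithmetic for the groups as defined above makes both claims explicit (equal contributions per group, and a divergent total above any cut-off):

```latex
% Group k (k = 1, 2, 3, ...): population share 2^{-k}, rate 2^{k-1} CHs/year.
% (Group 1 is the 0--50% range at 1/year, group 2 is 50--75% at 2/year, ...)
\[
  \text{contribution of group } k
  = \underbrace{2^{-k}}_{\text{size}} \times \underbrace{2^{k-1}}_{\text{rate}}
  = \tfrac12 \quad \text{for every } k,
\]
\[
  \text{mean rate} = \sum_{k=1}^{\infty} 2^{-k}\, 2^{k-1}
  = \sum_{k=1}^{\infty} \tfrac12 = \infty,
  \qquad
  \sum_{k=K}^{\infty} \tfrac12 = \infty \ \text{ for any cut-off } K.
\]
```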
The empirical distribution is not so skewed that we need to worry about infinity. But we do need to worry about the 57% accounted for by the top 5% (i.e. 100% minus the 43% from the bottom 95%).
That's fair; I made a mathematical error there. The cluster headache math convinces me that a large chunk of the total suffering there goes to a few people, due to lopsided frequencies. Do you have other examples? I particularly felt that the relative frequency of extreme compared to less extreme pain wasn't well supported.