This is very interesting, thanks for doing this work.
I would note that members of a cluster headache subreddit are unlikely to be representative of the broader population that generated the 1/1000 figure. Presumably they experience a disproportionately large number of headaches.
I completely agree that the members of a cluster headache subreddit or Facebook group are not necessarily representative, and in fact are quite likely not representative at all.
I think the conclusion that the distribution is long-tailed still holds regardless. My reasoning is this: even if the probability of participating in the survey increased exponentially with the number of CHs one experiences per year (or as a sigmoid, since a probability has to saturate eventually), you would still not be able to make a Gaussian distribution look like a log-normal. The reason is that a Gaussian density falls off as the exponential of the negative *squared* distance from the mean, whereas the response bias would grow only as the exponential of the distance itself. The squared term wins, so the product is just another Gaussian with the same width and its mean shifted to the right: the tail starts later but still tapers off rather quickly, which is not a long tail. For the sample to exhibit a long tail, the probability of participating would have to grow faster than any exponential in the number of CHs per year (roughly like the exponential of the *squared* count), at which point we very quickly run out of possible participants.
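To make the exponential case concrete, here is a minimal numerical sketch. The numbers are made up for illustration and are not taken from the survey, and `k` is a hypothetical rate at which the chance of responding grows with the number of CHs per year. It multiplies a Gaussian density by `exp(k * x)`, renormalises, and compares the result with a plain Gaussian of the same width whose mean is moved to `mu + k * sigma**2`.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def reweighted_pdf(xs, mu, sigma, k):
    """Gaussian density times an exponential response weight exp(k * x),
    renormalised on the evenly spaced grid xs."""
    dx = xs[1] - xs[0]
    raw = [gaussian_pdf(x, mu, sigma) * math.exp(k * x) for x in xs]
    total = sum(raw) * dx
    return [r / total for r in raw]

# Hypothetical values, chosen only for illustration.
mu, sigma, k = 10.0, 3.0, 0.5

# Evenly spaced grid from -50 to 100, wide enough to hold both curves.
xs = [i * 0.01 for i in range(-5000, 10001)]

biased = reweighted_pdf(xs, mu, sigma, k)
shifted = [gaussian_pdf(x, mu + k * sigma ** 2, sigma) for x in xs]

# If exponential selection only shifts the mean, the two curves coincide.
max_gap = max(abs(b - s) for b, s in zip(biased, shifted))
print(f"largest gap between biased and shifted Gaussian: {max_gap:.2e}")
```

Under these assumptions the gap should be at the level of floating-point noise, meaning exponential over-sampling of frequent sufferers moves the bell curve to the right without fattening its tail.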
That said, I do agree that the frequency is likely over-estimated, but I would argue, for the reasons above, that such over-estimation can’t account for the long tail.