I also conduct research on the generalizability issue, but from a different perspective. In my view, any attempt to measure effect heterogeneity (and by extension, research generalizability) is scale dependent. It is very difficult to tease apart genuine effect heterogeneity from the appearance of heterogeneity due to using an inappropriate scale to measure the effects.
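To make the scale dependence concrete, here is a minimal sketch with made-up numbers. The two groups and their risks are hypothetical, chosen only for illustration, and the code uses the standard risk difference, risk ratio and odds ratio. It does not implement the new measure described in the paper.

```python
# Hypothetical risks, chosen only to illustrate scale dependence.
# Each group maps to (risk if unexposed, risk if exposed).
groups = {"A": (0.10, 0.20), "B": (0.20, 0.40)}


def risk_difference(p0, p1):
    """Absolute change in risk due to exposure."""
    return p1 - p0


def risk_ratio(p0, p1):
    """Relative change in risk due to exposure."""
    return p1 / p0


def odds_ratio(p0, p1):
    """Ratio of the odds of the outcome, exposed versus unexposed."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


for name, (p0, p1) in groups.items():
    print(
        f"group {name}: "
        f"RD={risk_difference(p0, p1):.3f} "
        f"RR={risk_ratio(p0, p1):.3f} "
        f"OR={odds_ratio(p0, p1):.3f}"
    )
```

With these numbers the risk ratio is 2 in both groups, so the effect looks homogeneous on that scale, while the risk difference (0.1 versus 0.2) and the odds ratio (2.25 versus about 2.67) both suggest heterogeneity. The same data therefore support opposite conclusions about generalizability depending on the scale chosen.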
To get around this, I have constructed a new scale for measuring effects, which I believe is more natural than the alternative measures. My work on this is available on arXiv at https://arxiv.org/abs/1610.00069 . The paper has been accepted for publication in the journal Epidemiologic Methods, and I plan to post a full explanation of the idea here and on Less Wrong when it is published (presumably a couple of weeks from now).
I would very much appreciate feedback on this work, and as always, I operate according to Crocker’s Rules.
Thank you! I will think about whether I can come up with a catchier name for future publications (and about whether the benefits outweigh the costs of rebranding).
If anyone has suggestions for a better name (for an effect measure that intuitively measures the probability that the exposure switches a person’s outcome state), please let me know!