Even though this study had insufficient statistical power to detect an effect, it is easy to understate how much of an improvement it is over the prior studies.
This study is the largest to date, more than 3x the size of the second-largest study. It is also the first of its kind to have an equal-sized control group, one 14x larger than any control group previously studied.
Kudos for this long-term effort and the significant improvements.
[ X, Y, Z ] Does that matter?
Given the problems you’re having with statistical power you may do better by creating an index outcome variable that takes several signals into account. For example, the GiveDirectly evaluations combine a variety of well-being measures into a single index to replace massive underpowered multiple testing.
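To make the index idea concrete, here is a minimal sketch of one common way to build such an index: standardize each outcome against the control group, then average the z-scores per participant. This is an assumption about how an index could be built, not a description of the exact procedure GiveDirectly used. The function name `standardized_index` and the toy outcome names and values are hypothetical.

```python
from statistics import mean, stdev

def standardized_index(outcomes, control_mask):
    """Average of per-outcome z-scores, standardized against the control group.

    outcomes: dict mapping outcome name -> list of values (one per participant),
              all oriented so that a higher value means "better".
    control_mask: list of bools, True where the participant is in the control group.
    """
    n = len(control_mask)
    z_columns = []
    for values in outcomes.values():
        control_values = [v for v, is_c in zip(values, control_mask) if is_c]
        mu, sigma = mean(control_values), stdev(control_values)
        # Express every participant's value in control-group standard deviations.
        z_columns.append([(v - mu) / sigma for v in values])
    # One index value per participant: the mean of that participant's z-scores.
    return [mean(col[i] for col in z_columns) for i in range(n)]

# Hypothetical toy data: two outcomes, two control and two treated participants.
toy_outcomes = {
    "outcome_a": [1.0, 2.0, 3.0, 4.0],
    "outcome_b": [10.0, 20.0, 30.0, 40.0],
}
toy_control = [True, True, False, False]
toy_index = standardized_index(toy_outcomes, toy_control)
```

The treatment effect is then tested once, on the index, rather than once per outcome. By construction the index has mean zero in the control group, so a treated-group mean can be read directly in control-group standard deviations.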
To me, the bigger problem is one of bias: we’ve already analyzed the results, and we’re operating under motivated continuation, where we continue only if we don’t find the effect we’re looking for and stop only once the effect is found. We need to precommit to a stopping point.
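To see why this matters, here is a rough simulation sketch, under simplifying assumptions that are mine and not the study's: there is no true effect, the data are standard normal, and significance is checked with a two-sided z-test at the 5% level. It compares testing once at a precommitted sample size against checking after every batch and stopping at the first significant result. The function name and parameters are hypothetical.

```python
import random
from math import sqrt

def false_positive_rate(n_sims, batch_size, max_batches, peek, seed=0):
    """Share of null simulations that end with a 'significant' result.

    peek=True:  test after every batch and stop at the first significant result.
    peek=False: test only once, at the precommitted final sample size.
    """
    rng = random.Random(seed)
    z_crit = 1.96  # two-sided 5% threshold for a z-test with known variance 1
    hits = 0
    for _ in range(n_sims):
        total, n = 0.0, 0
        significant = False
        for b in range(max_batches):
            for _ in range(batch_size):
                total += rng.gauss(0.0, 1.0)  # true effect is zero
            n += batch_size
            z = total / sqrt(n)  # z-statistic for the running mean
            is_last = (b == max_batches - 1)
            if (peek or is_last) and abs(z) > z_crit:
                significant = True
                break
        hits += significant
    return hits / n_sims

fixed_rate = false_positive_rate(2000, batch_size=10, max_batches=10, peek=False)
peeking_rate = false_positive_rate(2000, batch_size=10, max_batches=10, peek=True)
```

Under these assumptions the precommitted design should stay near the nominal 5% false positive rate, while stopping at the first significant look pushes the rate well above it, even though no effect exists.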
You might want to read this, this and this.