It was commendable to seek advice, but I fear in this case the recommendation you got doesn't hit the mark.
I don't see the use of "act (as if)" as helping much. First, it is not clear what it means to be "wrong about" "acting as if the null hypothesis is false"; but however one cashes this out, I don't think it avoids the problem of the absent prior. Even if we say "we will follow the policy of rejecting the null whenever p < alpha", knowing the error rate of this policy overall still demands a "grand prior" of something like "how likely is a given (/randomly selected?) null hypothesis we are considering to be true?"
Perhaps what Lakens has in mind is that, as we expand the set of null hypotheses we are testing to some very large set, the prior becomes maximally uninformative (and so the policy's overall error rate converges to alpha, the significance threshold), but this is deeply uncertain to me. Besides, we want to know (and a reader might reasonably interpret the rider as being about) the likelihood of this policy getting the wrong result for the particular null hypothesis under discussion.
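To make the worry concrete, here's a quick simulation sketch (mine, not Lakens's; the priors, effect size, and sample size are made up purely for illustration): the share of "reject whenever p < alpha" decisions that reject a true null depends entirely on which grand prior one assumes.

```python
# Toy simulation (illustrative only): how often does the policy
# "reject the null whenever p < alpha" reject a *true* null, as a share
# of all its rejections? The answer depends on the assumed "grand prior",
# i.e. the fraction of tested nulls that are actually true.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, n_tests = 0.05, 50, 100_000
effect = 0.5  # assumed (made-up) effect size when the null is false

for prior_true in (0.2, 0.5, 0.9):  # hypothetical grand priors
    null_true = rng.random(n_tests) < prior_true
    means = np.where(null_true, 0.0, effect)
    samples = rng.normal(means[:, None], 1.0, size=(n_tests, n))
    # two-sided one-sample t-test of "mean = 0" for each simulated study
    p = stats.ttest_1samp(samples, 0.0, axis=1).pvalue
    rejected = p < alpha
    false_share = (rejected & null_true).sum() / rejected.sum()
    print(f"P(null true) = {prior_true}: "
          f"{false_share:.1%} of rejections reject a true null")
```

Alpha is fixed at 0.05 throughout, yet in this toy setup the share of rejections that are mistaken varies by more than an order of magnitude across the three priors; without some view on that prior, "the policy's error rate" is underdetermined.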
--
As I fear this thread demonstrates, p-values are a subject which tends to get more opaque the more one tries to make them clear (only typically rivalled by "confidence interval"). They're also generally much lower yield than most other bits of statistical information (i.e. we generally care a lot more about narrowing down the universe of possible hypotheses by effect size etc. rather than simply excluding one). The write-up should be welcomed for providing higher yield bits of information (e.g. effect sizes with CIs, regression coefficients, etc.) where it can.
Most statistical work never bothers to crisply explain exactly what it means by "significantly different (P = 0.03)" or similar, and I think it is defensible to leave it at that rather than wading into the treacherous territory of trying to give a clear explanation (notwithstanding the fact that the typical reader will misunderstand what this means). My attempt would be not to provide an "in-line explanation", but to offer an explanatory footnote (maybe after the first p-value), something like this:
Our data suggests a trend/association between X and Y. Yet we could also explain this as a matter of luck: even though in reality X and Y are not correlated [or whatever], it may be that we just happened to sample people where those high in X also tended to be high in Y, in the same way a fair coin might happen to give more heads than tails when we flip it a number of times.
A p-value tells us how surprising our results would be if they really were just a matter of luck: strictly, it is the probability of our study giving results as or more unusual than our data if the "null hypothesis" (in this case, there is no correlation between X and Y) were true. So a p-value of 0.01 means our data is in the top 1% of unusual results, a p-value of 0.5 means our data is in the top half of unusual results, and so on.
A p-value doesn't say all that much by itself; crucially, it doesn't tell us the probability of the null hypothesis itself being true. For example, a p-value of 0.01 doesn't mean there's a 99% probability the null hypothesis is false. A coin being flipped 10 times and landing heads on all of them is in the top percentile (indeed, roughly the top 0.1%) of unusual results presuming the coin is fair (the "null hypothesis"), but we might have reasons, even after seeing only heads from flipping it 10 times, to believe it is probably fair anyway (maybe we made it ourselves with fastidious care, maybe it's being simulated on a computer and we've audited the code, or whatever). At the other extreme, a p-value of 1.0 doesn't mean we know for sure the null hypothesis is true: although seeing 5 heads and 5 tails from 10 flips is the least unusual result given the null hypothesis (and so all possible results are "as or more unusual" than what we've seen), it could be that the coin is unfair and we just didn't see it.
What we can use a p-value for is as a rule of thumb for which apparent trends are worth considering further. If the p-value is high, the "just a matter of luck" explanation for the trend between X and Y is credible enough that we shouldn't over-interpret it; on the other hand, a low p-value makes the apparent trend between X and Y an unusual result if it really were just a matter of luck, and so we might consider alternative explanations (e.g. our data wouldn't be such an unusual finding if there really were some factor that causes those high in X to also be high in Y).
"High" and "low" are matters of degree, but one usually sets a "significance threshold" to make the rule of thumb concrete: when a p-value is higher than this threshold, we dismiss an apparent trend as just a matter of luck; if it is lower, we deem it significant. The standard convention is for this threshold to be p = 0.05.
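For what it's worth, a quick sketch using scipy's binomial test bears out the coin arithmetic in the draft footnote above (whether 10/10 heads sits at roughly the top 0.1% or 0.2% depends on whether "unusual" is counted one-sided or two-sided):

```python
from scipy.stats import binomtest

# 10 heads from 10 flips of a coin presumed fair (the null hypothesis):
print(binomtest(10, n=10, p=0.5, alternative="greater").pvalue)  # 1/1024, ~0.001
print(binomtest(10, n=10, p=0.5).pvalue)                         # two-sided, ~0.002
# 5 heads and 5 tails is the least unusual result, so every possible outcome
# is "as or more unusual" and the two-sided p-value is 1.0:
print(binomtest(5, n=10, p=0.5).pvalue)
```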