Good work. A minor point:
I think the riders used when discussing significant results, along the lines of “being wrong 5% of the time in the long run”, sometimes don’t make sense. Compare:
To:
Although commonly the significance threshold is equated with the ‘type 1 error rate’, which in turn is equated with ‘the chance of falsely rejecting the null hypothesis’, this is mistaken (1). P values are not estimates of the likelihood of the null hypothesis, but of the observation (as or more extreme) conditioned on that hypothesis. P(Null|significant result) requires one to specify a prior. Likewise, type 1 errors are best thought of as the ‘risk’ of the test giving the wrong indication, rather than the risk of you making the wrong judgement. (There are also some remarks on family-wise error rates versus false discovery rates, which can be neglected.)
So the first quote is sort-of right (although assuming the null then talking about the probability of being wrong may confuse rather than clarify), but the second one isn’t: you may (following standard statistical practice) reject the null hypothesis given P < 0.05, but this doesn’t tell you there is a 5% chance of the null being true when you do so.
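A quick simulation makes the gap concrete (the prior and power figures below are purely illustrative assumptions, not estimates for any real study):

```python
import numpy as np

rng = np.random.default_rng(0)
n_tests = 100_000
prior_null = 0.8   # illustrative assumption: 80% of tested nulls are true
power = 0.7        # illustrative assumption: power against real effects
alpha = 0.05

null_true = rng.random(n_tests) < prior_null
# Each test rejects with probability alpha when the null is true,
# and with probability equal to the power when it is false.
p_reject = np.where(null_true, alpha, power)
significant = rng.random(n_tests) < p_reject

# P(reject | null true) is ~5% by construction...
print(significant[null_true].mean())   # ~0.05
# ...but P(null true | reject) depends on the prior and power, not just alpha.
print(null_true[significant].mean())   # ~0.22
```

So under these made-up numbers, rejecting whenever p < .05 means being wrong in roughly a fifth of rejections, not 5% of them.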
Hi, thanks.
I agree that “If I have observed p < .05, what is the probability that the null hypothesis is true?” is a different question from “If the null hypothesis is true, what is the probability of observing this (or more extreme) data?”. Only the latter question is answered by a p-value (the former needs some Bayesian-style subjective prior). I haven’t yet seen a clear consensus on how to report this in a way that is easy for the lay reader.
The phrases I employed (highlighted in your comment) were suggested in writing by Daniel Lakens, although I added a caveat about the null in the second quote which is perhaps incorrect. His defence of the phrase “we can act as if the null hypothesis is false, and we would not be wrong more than 5% of the time in the long run” is the specific use of the word ‘act’, “which does not imply anything about whether this specific hypothesis is true or false, but merely states that if we act as if the null-hypothesis is false any time we observe p < alpha, we will not make an error more than alpha percent of the time”. I would be very interested if you have suggestions of a similar standard phrasing which captures both the probability of observing data (not a hypothesis) and is somewhat easy for a non-stats reader to grasp.
As an aside, what is your opinion on reporting p values greater than the relevant alpha level? I’ve read Daniel Lakens suggesting that if you have p > .05 one could write something like “because given our sample size of 50 per group, and our alpha level of 0.05, only observed differences more extreme than 0.4 could be statistically significant, and our observed mean difference was 0.35, we could not reject the null hypothesis”. This seems a bit wordy for any lay reader, but would it be worth including in a footnote?
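For what it’s worth, the 0.4 in that quoted phrasing can be reproduced with a short calculation; this is a sketch assuming the example refers to a two-sided, equal-variance two-sample t-test with differences expressed in SD units:

```python
import numpy as np
from scipy import stats

n = 50          # per-group sample size, as in the quoted example
alpha = 0.05
df = 2 * n - 2  # degrees of freedom for an equal-variance two-sample t-test

# Smallest standardized mean difference that could reach p < alpha (two-sided):
# the critical t value times the standard error of the difference, sqrt(2/n).
t_crit = stats.t.ppf(1 - alpha / 2, df)
min_sig_diff = t_crit * np.sqrt(2 / n)
print(round(min_sig_diff, 2))   # 0.4
```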
It was commendable to seek advice, but I fear in this case the recommendation you got doesn’t hit the mark.
I don’t see the use of ‘act (as if)’ as helping much. Firstly, it is not clear what it means to be ‘wrong’ when ‘acting as if the null hypothesis is false’; and I don’t think, however one cashes this out, it avoids the problem of the absent prior. Even if we say “We will follow the policy of rejecting the null whenever p < alpha”, knowing the error rate of this policy overall still demands a ‘grand prior’ of something like “how likely is a given (/randomly selected?) null hypothesis we are considering to be true?”
Perhaps what Lakens has in mind is that, as we expand the set of null hypotheses we are testing to some very large set, the prior becomes maximally uninformative (and so the error rate converges to the significance threshold), but this is deeply uncertain to me. Besides, we want to know (and a reader might reasonably interpret the rider as being about) the likelihood of this policy getting the wrong result for the particular null hypothesis under discussion.
--
As I fear this thread demonstrates, p values are a subject which tends to get more opaque the more one tries to explain them clearly (rivalled in this only by ‘confidence interval’). They’re also generally much lower-yield than most other bits of statistical information (i.e. we generally care a lot more about narrowing down the universe of possible hypotheses by effect size etc. than about simply excluding one). The write-up should be welcomed for providing higher-yield bits of information (e.g. effect sizes with CIs, regression coefficients, etc.) where it can.
Most statistical work never bothers to crisply explain exactly what it means by ‘significantly different (P = 0.03)’ or similar, and I think it is defensible to leave it at that rather than wading into the treacherous territory of trying to give a clear explanation (notwithstanding the fact the typical reader will misunderstand what this means). My attempt would be not to provide an ‘in-line explanation’, but offer an explanatory footnote (maybe after the first p value), something like this:
Thanks for this!
To clarify, this refers to those who actually changed causes, specifically, right?
For the “Mean cause rating and sub grouping” table, it would be helpful to have the total ratings too, for comparison, so we can see how veg*ns, meat eaters, men and women differ from the average.
Hi,
On your first point, yes, you are correct. Among those who prioritized Global Poverty or Animal Welfare and changed causes, pluralities changed to AI.
On your second point, I’ve now added a column in the group membership and demographics tables that shows the average for the sample as a whole. I hope this helps.
Thanks!
Thanks for this. I like the ribbon plots!
Did you by chance look at cause prio differences between countries and see anything interesting? I dimly remember there used to be a trend along the lines of a bit more animal welfare in continental Europe, global poverty in the UK, and x-risk in the US.
Hi, thanks! We will explore cause prioritization and the geographic distribution of EAs in a forthcoming post. We tried to keep a narrower focus in this post, on involvement in EA and just a few demographics, as we did in last year’s post.