The p-value critique doesn’t apply to many scientific fields. As far as I can tell, it mostly applies to social science and maybe epidemiological research. In basic biological research, a paper wouldn’t be published in a good journal on the basis of a single p-value. In fact, many papers don’t have any p-values. When p-values are presented, they’re often so low (10^-15) that they’re unnecessary confirmations of a clearly visible effect. (Silly, in my opinion.) Most papers rely on many experiments, which ideally provide multiple lines of evidence. It’s also common to propose a mechanism that’s plausible given the existing literature. In some cases, you can see the fingerprints of skeptical reviewers. For example, when I see “to exclude the possibility that”, I assume that this experiment was added later at the demand of a reviewer. Published biology is often wrong, but for subtler reasons.
“The p-value critique doesn’t apply to many scientific fields.”
I agree with this, or at least agree that the critique is vastly weaker when overwhelming data are available to pin down results.
“As far as I can tell, it mostly applies to social science and maybe epidemiological research. ”
I disagree with this.
For instance, p-value issues have been catastrophic in quantitative genetics. The vast bulk of candidate-gene research in genetics was non-replicable p-hacking of radically underpowered studies. Schizophrenia candidate genes, for example, replicate at chance levels in massive replication efforts, despite whole literatures of p-hacked, publication-bias-artifact studies. The field moved to requiring genome-wide significance of 5*10^-8 (i.e. a Bonferroni correction for multiple testing across all measured variants). Results from huge genome-wide association studies (GWAS) that meet that criterion replicate reliably.
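For concreteness, here is a minimal sketch of where that threshold comes from, assuming the conventional figure of roughly one million effectively independent common variants (an assumption on my part, not something stated above):

```python
# Minimal sketch of where the 5*10^-8 genome-wide significance threshold comes
# from: a Bonferroni correction of the conventional alpha = 0.05 across roughly
# one million effectively independent common variants (the "one million" figure
# is the usual convention, assumed here rather than derived).
alpha = 0.05
n_independent_variants = 1_000_000

genome_wide_threshold = alpha / n_independent_variants
print(genome_wide_threshold)  # 5e-08
```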
ETA: It isn’t basic biological research, but medical and drug trials routinely have severe p-hacking issues. And there have been a lot of reproducibility problems reported with, e.g. preclinical cancer research, often lacking slam dunk evidence. The Reproducibility Project: Cancer is working on that.
Medical studies take up the bulk of biomedical research funds, and Eliezer’s example is at the intersection of medicine and nutrition.
ETA2: I don’t think issues of p-hacking would be solved just by using Bayesian statistics: people can instead selectively report Bayes factors, i.e. posterior hacking. It’s the selective use of analytic and reporting degrees of freedom that’s central. Here’s Daryl Bem and coauthors’ Bayesian meta-analysis purporting to show psi in Bem’s p-hacked experiments.
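A quick sketch of what posterior hacking can look like, using an assumed normal-normal Bayes factor and simulated null studies (all numbers are hypothetical, chosen only to illustrate the selective-reporting point):

```python
# Minimal sketch (all numbers assumed) of "posterior hacking": selectively
# reporting Bayes factors. Every simulated study below has a true effect of
# exactly zero, yet the subset of studies whose Bayes factor happens to favour
# H1 looks like evidence for an effect. The problem is the selective reporting,
# not the choice of frequentist vs. Bayesian summary.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n_studies, n_per_study = 1000, 20
sigma = 1.0   # known within-study SD (assumption)
tau = 0.5     # prior SD of the effect under H1 (assumption)

def bayes_factor_10(xbar, n):
    """Normal-normal Bayes factor: H1: mu ~ N(0, tau^2) vs. H0: mu = 0."""
    se = sigma / np.sqrt(n)
    marginal_h1 = norm.pdf(xbar, loc=0.0, scale=np.sqrt(tau**2 + se**2))
    marginal_h0 = norm.pdf(xbar, loc=0.0, scale=se)
    return marginal_h1 / marginal_h0

bfs = np.array([
    bayes_factor_10(rng.normal(0.0, sigma, n_per_study).mean(), n_per_study)
    for _ in range(n_studies)
])

print("mean BF over all studies:        ", bfs.mean())            # close to 1
print("fraction of studies with BF > 3: ", (bfs > 3).mean())
print("mean BF among only those studies:", bfs[bfs > 3].mean())   # looks like evidence
```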
medical and drug trials routinely have severe p-hacking issues. And there have been a lot of reproducibility problems reported with, e.g. preclinical cancer research, often lacking slam dunk evidence.
Due to my medical problems I have been reading the medical literature for 25 years, and indeed it is a catastrophe of p-hacking and the like, incompetent statistical analysis, and very often a basic misunderstanding of what p-values mean. You routinely see researchers claiming “no effect” when the p-value is slightly over 0.05.
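A small simulation (with an assumed effect size and sample size) shows why p slightly over 0.05 is not evidence of no effect:

```python
# Minimal sketch (all numbers assumed): why p slightly above 0.05 is not
# "no effect". With a real but modest standardized effect and small groups,
# a two-sample t-test misses the 0.05 cutoff in most simulated trials, purely
# because the study is underpowered.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
true_effect = 0.4   # assumed standardized effect size
n_per_arm = 25      # assumed group size
n_sims = 5000

pvals = np.array([
    ttest_ind(rng.normal(true_effect, 1.0, n_per_arm),
              rng.normal(0.0, 1.0, n_per_arm)).pvalue
    for _ in range(n_sims)
])

# Most simulated trials fail to reach p < 0.05 despite the effect being real.
print("power at alpha = 0.05:", (pvals < 0.05).mean())
```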
Usually, medical papers are misleading in some serious way. The best you can hope for is that they waste the vast majority of the value in the data.
People who read only abstracts and think they are learning something are deluding themselves. You have to go through the methods section carefully, and even then not all the shenanigans are disclosed, and you have to look very closely at the sponsorship of the parties to the study (researchers, journal editors, institutions, etc.) to pick up the extreme biases that result from sponsorship.
I consider GWAS applied, not basic, because it’s not mechanistic. Most biologists I’ve spoken to have a fairly poor opinion of GWAS, as do I. Much of the biological research that gets funded is basic.