Yarrow Bouchard 🔸 comments on Beware of non-evidence-based argumentation

Yarrow Bouchard 🔸 29 Jan 2026 3:12 UTC
5 points
1 ∶ 1
People who have radical anti-institutionalist views often take reasonable criticisms of institutions and use them to argue for their preferred radical alternative. There are many reasonable criticisms of liberal democracy; these are eagerly seized on by Marxist-Leninists, anarchists, and right-wing authoritarians to insist that their preferred political system must be better. But of course this conclusion does not necessarily follow from those criticisms, even if the criticisms are sound. The task for the challenger is to support the claim that their preferred system is robustly superior, not simply that liberal democracy is flawed.

The same is true for radical anti-institutionalist views on institutional science (which the LessWrong community often espouses, or at least whenever it suits them). Pointing out legitimate failures in institutional science does not necessarily support the radical anti-institutionalists’ conclusion that peer-reviewed journals, universities, and government science agencies should be abandoned in favour of blogs, forums, tweets, and self-published reports or pre-prints. On what basis can the anti-institutionalists claim that this is a robustly superior alternative and not a vastly inferior one?

To be clear, I interpret you as making a moderate anti-institutionalist argument, not a radical one. But the problem with the reasoning is the same in either case, which is why I’m using the radical arguments for illustration. The guardrails in academic publishing sometimes fail, as in the case of research misconduct or in well-intentioned, earnestly conducted research that doesn’t replicate, as you mentioned. But is this an argument for kicking down all guardrails? Shouldn’t it be the opposite? Doesn’t this just show us that deeply flawed research can slip under the radar? Shouldn’t this underscore the importance of savvy experts doing close, critical readings of research to find flaws? Shouldn’t the replication crisis remind us of the importance of replication (which has always been a cornerstone of institutional science)? Why should the replication crisis be taken as license to give up on institutions and processes that attempt to enforce academic rigour, including replication?

In the case of both AI 2027 and the METR graph, half of the problem is the underlying substance — the methodology, the modelling choices, the data. The other half of the problem is the presentation. Both have been used to make bold, sweeping, confident claims. Academic journals referee both the substance and the presentation of submitted research; they push back on authors trying to use their data or modelling to make conclusions that are insufficiently supported.

In this vein, one of the strongest critiques of AI 2027 is that it is an exercise in judgmental forecasting, in which the authors make intuitive, subjective guesses about the future trajectory of AI research and technology development. There’s nothing inherently wrong with a judgmental forecasting exercise, but I don’t think the presentation of AI 2027 makes it clear enough that AI 2027 is nothing more than that. (80,000 Hours’ video on AI 2027, which is 34 minutes long and was carefully written and produced at a cost of $160,000, doesn’t even mention this.)

If AI 2027 had been submitted to a reputable peer-reviewed journal, besides hopefully catching the modelling errors, the reviewers probably would have insisted that the authors make it clear from the outset what data the conclusions are based on (i.e. the authors’ judgmental forecasts) and where that data came from. They would probably also have insisted that the conclusions be appropriately moderated and caveated in light of that. But, overall, I think AI 2027 would probably just be unpublishable.
- TFD 29 Jan 2026 5:22 UTC
  1 point
  0 ∶ 0
  I don’t think my argument is even that anti-institutionalist. I have issues with how academic publishing works but I still think peer reviewed research is an extremely important and valuable source of information. I just think it has flaws and is much messier than discussions around the topic sometimes make it seem.
  My point isn’t to say that we should throw out traditional academic institutions; it is to say that I feel like the claim that the arguments for short timelines are “non-evidence-based” is critiquing the same messiness that is also present in peer reviewed research. If I read a study whose conclusions I disagree with, I think it would be wrong to say “field X has a replication crisis, therefore we can’t really consider this study to be evidence”. I feel like a similar thing is going on when people say the arguments for short timelines are “non-evidence-based”. To me things like METR’s work definitely are evidence, even if they aren’t necessarily strong or definitive evidence or if that evidence is open to contested interpretations. The point I was trying to make is essentially that I don’t think something needs to be peer reviewed to count as “evidence”.
  - titotal 29 Jan 2026 10:32 UTC
    14 points
    3 ∶ 1
    Generally, the scientific community is not going around arguing that drastic measures should be taken based on singular novel studies. Mainly, what a single novel study will produce is a wave of new studies on the same subject, to ensure that the results are valid and that the assumptions used hold up to scrutiny. Hence why that room-temperature superconductor was so quickly debunked.
    I do not see similar efforts in the AI safety community. The studies by METR are great first forays into difficult subjects, but then I see barely any scrutiny or follow-up by other researchers. And people accept much worse scholarship like AI 2027 at face value for seemingly no reason.
    I have experience in both academia and EA now, and I believe that the scholarship and skeptical standards in EA are substantially worse.
    - Kestrel🔸 29 Jan 2026 12:53 UTC
      10 points
      1 ∶ 2
      I agree. EA has a cost-effectiveness problem that conflicts with its truth-seeking attempts. EA’s main driving force is cost-effectiveness, above all else—even above truth itself.
      EA is highly incentivised to create and spread apocalyptic doom narratives. This is because apocalyptic doom narratives are good at recruiting people to EA’s “let’s work to decrease the probability of apocalyptic doom (because that has lots of expected value given future population projections)” cause area. And funding-wise, EA community funding (at least in the UK) is pretty much entirely about trying to make more people work in these areas.
      EA is also populated by the kinds of people who respond to apocalyptic doom narratives, for the basic reason that if they didn’t they wouldn’t have ended up in EA. So stuff that promotes these narratives does well in EA’s attention economy.
      EA just doesn’t have anywhere near as much £$€ to spend as academia does. It’s also very interested in doing stuff and willing to tolerate errors as long as the stuff gets done. Therefore, its academic standards are far lower.
      I really don’t know how you’d fix this. I don’t think research into catastrophic risks should be conducted on a shoestring budget and by a pseudoreligion/citizen science community. I think it should be government funded and probably sit within the wider defense and security portfolio.
      However, I’ll give EA some grace for essentially being a citizen science community, for the same reason I don’t waste effort grumping about the statistical errors made by participants in the Big Garden Birdwatch.
      - Yarrow Bouchard 🔸 13 Apr 2026 2:12 UTC
        2 points
        0 ∶ 1
        This is a beautifully written comment, and succinct, and funny, and true.
        
        I would give EA much more grace if its self-image was the same as what I presume the Big Garden Birdwatch’s self-image is. Part of what gets me tilted out of my mind about the EA community is when people express this almost messianic Chosen Ones self-image — which ties into the pseudo-religious aspect you mentioned.
        
        The high-impact, low-probability logic of existential risk is hypnotically alluring. If a 1 in 1 quintillion chance of reducing existential risk is equivalent to 100 human lives, what does that imply in terms of your moral responsibility when discussing existential risk? If you have things to say that could cast doubt on existential risk arguments, should you self-censor and hold your tongue? If you speak out and you’re wrong, it could be the moral equivalent of killing 100 people. Would it be okay to lie? To exaggerate? Why not? Wouldn’t you lie or exaggerate to save 100 lives? If the Nazis knocked at your door, wouldn’t you lie to save Anne Frank in the attic?
        
        I don’t think many people are actually outright lying when it comes to existential risk. But I do think people are self-censoring when it comes to criticism, and I do think people are willing to make excuses for really low-quality products like AI 2027 or 80,000 Hours’ video on it because anything that builds momentum for existential risk fear is plausibly extremely high in expected value.
    - TFD 29 Jan 2026 13:51 UTC
      3 points
      0 ∶ 0
      Generally, the scientific community is not going around arguing that drastic measures should be taken based on singular novel studies. Mainly, what a single novel study will produce is a wave of new studies on the same subject, to ensure that the results are valid and that the assumptions used hold up to scrutiny. Hence why that room-temperature superconductor was so quickly debunked.
      I agree that on average the scientific community does a great job of this, but I think the process is much, much messier in practice than a general description of the process makes it seem. For example, you have the Alzheimer’s research that got huge pick-up and massive funding from major scientific institutions, where the original research included doctored images. You have power-posing getting viral attention in science-adjacent media. You have priming, where Kahneman wrote in his book that even if it seems wild, you have to believe in it, largely (I think) for reasons similar to what is being suggested here: multiple rigorous scientific studies demonstrate the phenomenon. Yet when the replication crisis came around, priming looked a lot shakier than it had seemed when Kahneman wrote that.
      None of this means that we should throw out the existing scientific community or declare that most published research is false (although ironically there is a peer reviewed publication with this title!). Instead, my argument is that we should understand that this process is often messy and complicated. Imperfect research still has value and in my view is still “evidence” even if it is imperfect.
      The research and arguments around AI risk are not anywhere near as rigorous as a lot of scientific research (and I linked a comment above where I myself criticize AI risk advocates for overestimating the rigor of their arguments). At the same time, this doesn’t mean that these arguments do not contain any evidence or value. There is a huge amount of uncertainty about what will happen with AI. People worried about the risks from AI are trying to muddle through these issues, just like the scientific community has to muddle through figuring things out as well. I think it is completely valid to point out flaws in arguments, lack of rigor, or overconfidence (as I have also done). But evidence or argument doesn’t have to appear in a journal or conference to count as “evidence”.
      My view is that we have to live with the uncertainty and make decisions based on the information we have, while also trying to get better information. Doing nothing and going with the status quo is itself a decision that can have important consequences. We should use the best evidence we have to make the best decision given uncertainty, not just default to the status quo when we lack ideal, rigorous evidence.