Yarrow Bouchard 🔸 comments on The tables have turned on AI sceptics

Yarrow Bouchard 🔸 11 May 2026 0:38 UTC
4 points
1 ∶ 2
The blog post by the Australian AI safety organization says, “We apply METR’s time-horizon methodology…” How would this address the criticisms raised of METR’s methodology?
At a glance, the FutureTech pre-print makes some interesting choices, e.g., task quality is only scored up to above-average and above-average gets a perfect score, and acknowledges some of the limitations with their methodology, e.g., all tasks used for this experiment must contain all relevant information in the LLM prompt. (Is that realistic for most work tasks?) I wonder if this pre-print will be submitted for publication in a journal? FutureTech seems to be one of those weird MIT hybrids between an academic research group and a management consultancy. I’m not sure if they’ve ever published a peer-reviewed paper.

[Edit on 2026-05-14 at 18:56 UTC: After reading Peter Slattery’s comment below, I spent a few more minutes looking into it, and I’m still not sure what FutureTech is or what kind of stuff they publish. If someone knows and can explain it, that would be helpful. I could spend more time and get to the bottom of it, but I don’t want to spend more time on it right now.
Please also note the EA Forum team has limited my ability to reply to comments, so I can’t reply further. But if you want to continue the discussion, I’m reachable here.]
Someone could take the time to do a deep dive into the FutureTech pre-print and write a review, but I wonder if that’s a good use of anyone’s time? Is there a reason to think this group publishes high-quality research that is worth getting into?
If someone thinks it’s worthwhile, and they also think the pre-print is unlikely to be submitted for peer review, one option would be to ask the EA organization called The Unjournal to commission a review by an external expert.
- Peter Slattery 🔸 11 May 2026 19:28 UTC
  6 points
  1 ∶ 0
  
  Are you sure you are thinking of the correct organization when you say:
  “FutureTech seems to be one of those weird MIT hybrids between an academic research group and a management consultancy. I’m not sure if they’ve ever published a peer-reviewed paper.”
  I say that because the lab has many publications, including in top peer-reviewed journals like Science. For more context, here is the publications page and here is the bio for Neil Thompson, the head of the lab:
  
  Dr. Thompson’s work has over 3000 citations with an h-index of 21 across his publication portfolio, including such well known and renowned papers as Expertise, The Computational Limits of Deep Learning, and There’s plenty of room at the Top: What will drive computer performance after Moore’s law? Dr. Thompson has been invited to present his work and recommendations to Congressional Staffers (House and Senate), the US Federal Reserve, the Pentagon, National Security Staff, the Department of Commerce, the Department of Energy, Brookings Institute, and most recently presented at a World Summit on the same program as the Prime Minister of India and Former Prime Ministers of England and Australia. With experience in 80+ countries, Dr. Thompson’s research and impact is on a global scale.
  - Peter Slattery 🔸 11 May 2026 19:29 UTC
    2 points
    0 ∶ 0
    Oh, and the preprint will almost certainly be submitted for peer review, but it might take 1-2 years before it is published.
    - Yarrow Bouchard 🔸 21 May 2026 1:35 UTC
      2 points
      0 ∶ 0
      Okay, if we suspect peer review will eventually happen but the process will be very slow, then it might still be worthwhile to commission an external review, whether through The Unjournal or some other route. I once actually did this with my own money just because I was really, desperately curious about a pre-print published by a company that would never be submitted for peer review. I think it ended up costing me $400-500, something like that.
      
      Whether it’s worth the time, effort, and money depends on how much people actually care about this pre-print and think it’s important. Does anyone actually, sincerely think whether we’re on the cusp of apocalypse/utopia hangs on whether this pre-print is correct or not? How much is this particular pre-print actually a crux for anyone?
      
      If it is actually a crux on which people’s expectations around AGI within the next decade hang, then it’s probably worth paying the $500 or $1,000 or whatever it costs to do a review. But if it isn’t on anyone’s top 10 list or even top 20 list of most important pieces of evidence for near-term AGI, then I guess… it probably doesn’t matter whether the pre-print’s findings are true or false.
      
      The argument from an AI safety perspective about why it would be a cost-effective use of funds is straightforward. First, knowing whether the pre-print’s findings stand up under scrutiny is important insofar as the informational content of the pre-print is important for understanding AI. Second, there is currently very little high-quality evidence, and especially very little academic-calibre evidence, to present to skeptics who want to be convinced that an existentially consequential AGI is on the horizon. What could convince them? Well, potentially scientific evidence of this sort. And if your hopes or plans for AI safety depend on, or would be greatly helped by, the ability to bring skeptics on board, well, then it’s worth a relatively small investment to marshal evidence to convince skeptics.
      
      Another potential candidate for external review is the Remote Labor Index pre-print. But the same caveat applies.
- Matrice Jacobine🔸🏳️‍⚧️ 11 May 2026 7:06 UTC
  4 points
  3 ∶ 0
  “How would this address the criticisms raised of METR’s methodology?”
  How would this not? It uses neither the same tasks nor the same human baseliner panel as the HCAST dataset.