Matrice Jacobine🔸🏳️‍⚧️ comments on The tables have turned on AI sceptics

Matrice Jacobine🔸🏳️‍⚧️ 10 May 2026 14:02 UTC
4 points
1 ∶ 1
Exponential growth in time horizon with a ~4mo doubling time has been confirmed by other organizations on very different distributions (1, 2). Furthermore, it correlates very well with the Epoch Capabilities Index.
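For a sense of scale, here is a minimal sketch of what a fixed doubling time implies. It takes the ~4-month figure above at face value and assumes it stays constant. The function names are illustrative and do not come from METR, Epoch, or the linked posts.

```python
import math


def growth_factor(months_elapsed: float, doubling_months: float = 4.0) -> float:
    """Multiplicative growth in time horizon after `months_elapsed`,
    assuming a constant doubling time (an assumption, not a measured law)."""
    return 2.0 ** (months_elapsed / doubling_months)


def months_to_reach(multiple: float, doubling_months: float = 4.0) -> float:
    """Months needed for the time horizon to grow by `multiple`,
    under the same constant-doubling assumption."""
    return doubling_months * math.log2(multiple)


if __name__ == "__main__":
    # Three doublings fit in one year at a 4-month doubling time.
    print(growth_factor(12))
    # Inverse question: how long until the horizon is 8 times longer?
    print(months_to_reach(8))
```

Under these assumptions, a 4-month doubling time works out to three doublings, or an 8-fold increase, per year.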
- Yarrow Bouchard 🔸 11 May 2026 0:38 UTC
  4 points
  1 ∶ 2
  The blog post by the Australian AI safety organization says, “We apply METR’s time-horizon methodology…” How would this address the criticisms raised of METR’s methodology?
  At a glance, the FutureTech pre-print makes some interesting choices (e.g., task quality is scored only up to above-average, and above-average gets a perfect score) and acknowledges some limitations of its methodology (e.g., every task used for the experiment must contain all relevant information in the LLM prompt). Is that realistic for most work tasks? I wonder whether this pre-print will be submitted for publication in a journal. FutureTech seems to be one of those weird MIT hybrids between an academic research group and a management consultancy. I’m not sure if they’ve ever published a peer-reviewed paper.
  
  [Edit on 2026-05-14 at 18:56 UTC: After reading Peter Slattery’s comment below, I spent a few more minutes looking into it, and I’m still not sure what FutureTech is or what kind of stuff they publish. If someone knows and can explain it, that would be helpful. I could spend more time and get to the bottom of it, but I don’t want to spend more time on it right now.
  Please also note the EA Forum team has limited my ability to reply to comments, so I can’t reply further. But if you want to continue the discussion, I’m reachable here.]
  Someone could take the time to do a deep dive into the FutureTech pre-print and write a review, but I wonder whether that’s a good use of anyone’s time. Is there a reason to think this group publishes high-quality research that is worth getting into?
  If someone thinks it’s worthwhile, and they also think the pre-print is unlikely to be submitted for peer review, one option would be to ask the EA organization called The Unjournal to commission a review by an external expert.
  - Peter Slattery 🔸 11 May 2026 19:28 UTC
    6 points
    1 ∶ 0
    Are you sure you are thinking of the correct organization when you say:
    “FutureTech seems to be one of those weird MIT hybrids between an academic research group and a management consultancy. I’m not sure if they’ve ever published a peer-reviewed paper.”
    I say that because the lab has many publications, including in top peer-reviewed journals like Science. For more context, here is the publications page and here is the bio for Neil Thompson, the head of the lab:
    
    Dr. Thompson’s work has over 3000 citations with an h-index of 21 across his publication portfolio, including such well known and renowned papers as Expertise, The Computational Limits of Deep Learning, and There’s plenty of room at the Top: What will drive computer performance after Moore’s law? Dr. Thompson has been invited to present his work and recommendations to Congressional Staffers (House and Senate), the US Federal Reserve, the Pentagon, National Security Staff, the Department of Commerce, the Department of Energy, Brookings Institute, and most recently presented at a World Summit on the same program as the Prime Minister of India and Former Prime Ministers of England and Australia. With experience in 80+ countries, Dr. Thompson’s research and impact is on a global scale.
    - Peter Slattery 🔸 11 May 2026 19:29 UTC
      2 points
      0 ∶ 0
      Oh, and the preprint will almost certainly be submitted for peer review, but it might take 1-2 years before it is published.
  - Matrice Jacobine🔸🏳️‍⚧️ 11 May 2026 7:06 UTC
    4 points
    3 ∶ 0
    “How would this address the criticisms raised of METR’s methodology?”
    How would this not? It doesn’t use the same tasks, nor does it use the same human baseliner panel as the HCAST dataset.