Jan_Kulveit comments on Unfalsifiable stories of doom

Jan_Kulveit 2 Feb 2026 10:58 UTC
4 points
0 ∶ 0
The operationalisation you propose does not make sense: Yudkowsky and Soares do not claim that ChatGPT 5.2 will kill everyone, or anything like that.

What about this:

MIRI approaches [a lab] with this offer: we have made a breakthrough in our ability to verify whether the way you are training AIs leads to misalignment of the kind we are worried about. Unfortunately, the verification requires a lot of computation (i.e. something like ARC), so it is expensive. We expect your whole training setup will pass, but we will need $3B from you to run it; if the test works, we will declare that your lab has solved the technical part of AI alignment we were most worried about, and provide some arguments which we expect to convince many people who listen to our views.

Or this: MIRI discusses things with xAI or Meta and becomes convinced that the lab's (secret) plan is by far the best chance humanity has, and that everyone ML/AI smart and conscious should stop whatever they are doing and join them.

(Obviously these are also unrealistic: they assume something like a lab coming up with a plan which could even hypothetically work.)
- Vasco Grilo🔸 2 Feb 2026 12:13 UTC
  4 points
  0 ∶ 0
  Thanks, Jan. I think it is very unlikely that AI companies with frontier models will seek the technical assistance of MIRI in the way you described in your 1st operationalisation. So I believe a bet which would only resolve in this case has very little value. I am open to bets against short AI timelines, or what they supposedly imply, up to 10 k$. Do you see any we could make that would be good for both of us under our own views, considering that we could invest our money and that you could take out loans?
  - Jan_Kulveit 2 Feb 2026 20:41 UTC
    4 points
    0 ∶ 0
    I was considering hypothetical scenarios of the type “imagine this offer from MIRI arrived, would a lab accept”; clearly MIRI is not making the offer, because the labs don’t have good alignment plans and MIRI is obviously high-integrity enough not to be corrupted by relatively tiny incentives like $3B.
    
    I would guess there are ways to operationalise the hypotheticals and, for example, have Dan Hendrycks guess what xAI would do, given that he is an advisor.
    
    On your bets about timelines: I made an 8:1 bet with Daniel Kokotajlo against AI 2027 being as accurate as his previous forecast, so I am not sure which side of “confident about short timelines” you expect me to take. I’m happy to bet on some operationalization of your overall thinking and posting about the topic of AGI being bad, e.g. something like “the 3 smartest available AIs in 2035 compare everything we wrote in 2026 on EAF, LW and Twitter about AI and judge who was more confused, overconfident and miscalibrated”.
    - Vasco Grilo🔸 2 Feb 2026 21:41 UTC
      2 points
      0 ∶ 0
      I was considering hypothetical scenarios of the type “imagine this offer from MIRI arrived, would a lab accept”
      When would the offer from MIRI arrive in the hypothetical scenario? I am sceptical of an honest endorsement from MIRI today being worth 3 billion $, but I do not have a good sense of what MIRI will look like in the future. I would also agree that a foolproof AI safety certification is or will be worth more than 3 billion $, depending on how it is defined.
      With your bets about timelines—I did 8:1 bet with Daniel Kokotajlo against AI 2027 being as accurate as his previous forecast, so not sure which side of the “confident about short timelines” do you expect I should take.
      I was guessing that I would have longer timelines than you. What is your median date for superintelligent AI as defined by Metaculus?