For the companies racing to AGI, an endorsement from Y&S that some effort is good would likely be worth somewhere between billions and tens of billions of dollars.
Are you open to bets about this? I would be happy to bet 10 k$ that Anthropic would not pay e.g. 3 billion $ for Yudkowsky and Soares to endorse their last model as good. We could ask the marketing team at Anthropic or marketing experts elsewhere. I am not officially proposing a bet just yet. We would have to agree on a concrete operationalisation.
This doesn't seem to be a reasonable way to operationalize it. The endorsement would create much less value for the company if it was clear they were paying for it. And I highly doubt Amodei would be in a position to admit that they'd want such an endorsement even if it indeed benefited them.
Thanks for the good point, Nick. I still suspect Anthropic would not pay e.g. 3 billion $ for Yudkowsky and Soares to endorse their last model as good if they were hypothetically being honest. I understand this is difficult to operationalise, but we could still ask people outside Anthropic.
The operationalisation you propose does not make sense; Yudkowsky and Soares do not claim ChatGPT 5.2 will kill everyone or anything like that.
What about this:
MIRI approaches [a lab] with this offer: we have made a breakthrough in our ability to verify whether the way you are training AIs leads to misalignment of the kind we are worried about. Unfortunately, the verification requires a lot of computation (i.e. something like ARC), so it is expensive. We expect your whole training setup will pass, but we will need $3B from you to run it; if it passes, we will declare that your lab solved the technical part of AI alignment we were most worried about, and give some arguments which we expect will convince many people who listen to our views.
Or this: MIRI discusses things with xAI or Meta and convinces themselves their 'secret' plan is by far the best chance humanity has, and that everyone ML/AI-smart and conscientious should stop whatever they are doing and join them.
(Obviously these are also unrealistic / assume something like a lab coming up with a plan which could even hypothetically work.)
Thanks, Jan. I think it is very unlikely that AI companies with frontier models will seek MIRI's technical assistance in the way you described in your 1st operationalisation. So I believe a bet which would only resolve in that case has very little value. I am open to bets against short AI timelines, or what they supposedly imply, up to 10 k$. Do you see any bet that we could make that is good for both of us under our own views, considering we could invest our money, and that you could take loans?
I was considering hypothetical scenarios of the type 'imagine this offer from MIRI arrived, would a lab accept?'; clearly MIRI is not making the offer, because the labs don't have good alignment plans, and MIRI is obviously high-integrity enough not to be corrupted by relatively tiny incentives like $3B.
I would guess there are ways to operationalise the hypotheticals and, for example, to have Dan Hendrycks guess what xAI would do, him being an advisor.
With your bets about timelines: I did an 8:1 bet with Daniel Kokotajlo against AI 2027 being as accurate as his previous forecast, so I am not sure which side of 'confident about short timelines' you expect me to take. I'm happy to bet on some operationalization of your overall thinking and posting about the topic of AGI being bad, e.g. something like 'the 3 smartest available AIs in 2035 compare everything we wrote in 2026 on the EA Forum, LessWrong and Twitter about AI, and judge who was more confused, overconfident and miscalibrated'.
I was considering hypothetical scenarios of the type 'imagine this offer from MIRI arrived, would a lab accept?'
When would the offer from MIRI arrive in the hypothetical scenario? I am sceptical of an honest endorsement from MIRI today being worth 3 billion $, but I do not have a good sense of what MIRI will look like in the future. I would also agree a foolproof AI safety certification is, or will be, worth more than 3 billion $, depending on how it is defined.
With your bets about timelines: I did an 8:1 bet with Daniel Kokotajlo against AI 2027 being as accurate as his previous forecast, so I am not sure which side of 'confident about short timelines' you expect me to take.
I was guessing I would have longer timelines. What is your median date of superintelligent AI as defined by Metaculus?
It's not about endorsing a specific model for marketing reasons; it's about endorsing the overall effort.
Given that Meta is willing to pay billions of dollars for people to join them, and that many people don't work on AI capabilities (or work, e.g., at Anthropic, as a lesser evil) because they share E&S's concerns, an endorsement from E&S would be worth billions to tens of billions of dollars simply because of the talent you could get as a result.
Meta is paying billions of dollars to recruit people with proven experience at developing relevant AI models.
Does the set of 'people with proven experience in building AI models' overlap with 'people who defer to Eliezer on whether AI is safe' at all? I doubt it.
Indeed, given that Yudkowsky's arguments on AI are not universally admired, and that people who have chosen as their career to build the thing he says will make everybody die are particularly likely to be sceptical of his convictions on that issue, an endorsement might even be net negative.
Thanks for the comment, Mikhail. Gemini 3 estimates the total annualised compensation of the people working at Meta Superintelligence Labs (MSL) at 4.4 billion $. If an endorsement from Yudkowsky and Soares was as beneficial (including via bringing in new people) as making 10 % of the people there 10 % more impactful over 10 years, it would be worth 440 M$ (= 0.10*0.10*10*4.4*10^9).
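The Fermi estimate above can be sketched as a short calculation; the 10 % / 10 % / 10-year figures are the illustrative assumptions from the comment, not data:

```python
# Rough Fermi estimate of the value of a Y&S endorsement to Meta,
# using the illustrative assumptions stated in the comment above.

annual_compensation = 4.4e9  # estimated total annualised MSL compensation, $
fraction_affected = 0.10     # share of staff made more impactful (assumption)
impact_uplift = 0.10         # relative productivity gain for them (assumption)
years = 10                   # duration of the effect (assumption)

endorsement_value = fraction_affected * impact_uplift * years * annual_compensation
print(f"Estimated endorsement value: {endorsement_value / 1e6:.0f} M$")
```

Any of the three assumed multipliers could easily be off by an order of magnitude in either direction, so this only bounds the plausible range rather than pinning down a number.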
You could imagine a Yudkowsky endorsement (say, with the narrative that Zuck talked to him, admits he went about it all wrong, and is finally taking the issue seriously, just to entertain the counterfactual...) raising Meta AI from 'nobody serious wants to work there and they can only get talent by paying exorbitant prices' to 'they finally have access to serious talent and can get a critical mass of people to do serious work'. This'd arguably be more valuable than whatever they're doing now.
I think your answer to the question of how much an endorsement would be worth mostly depends on some specific intuitions that I imagine Kulveit has for good reasons but most people don't, so it's a bit hard to argue about it. It also doesn't help that in every case other than Anthropic and maybe DeepMind it'd also require some weird hypotheticals to even entertain the possibility.