The financial basis for motivated reasoning is arguably even stronger in MIRI’s case than in Mechanize’s case. The kind of work MIRI is doing and the kind of experience Yudkowsky and Soares have isn’t really transferable to anything else.
It is somewhat difficult to react to this level of absolutely incredible nonsense politely, but I’ll try.
I disagree with both Yudkowsky and Soares about many things, but very obviously their direct experience with thinking and working with existing AIs would be worth > $1M pa if evaluated anonymously based on understanding SOTA AIs, and likely >$10s M pa if they worked on capabilities.
For the companies racing to AGI, Y&S endorsing some effort as good would likely have something between billions $ to tens of billions $ value.
“very obviously their direct experience with thinking and working with existing AIs would be worth > $1M pa if evaluated anonymously based on understanding SOTA AIs, and likely >$10s M pa if they worked on capabilities.”
“Y&S endorsing some effort as good would likely have something between billions $ to tens of billions $ value.”
fwiw both of these claims strike me as close to nonsense, so I don’t think this is a helpful reaction.
If you ask the AIs they get numbers in the tens of millions to tens of billions range, with around 1 billion being the central estimate. (I haven’t extensively controlled for the effect and some calculations appear driven by narrative)
Personally I find it hard to judge and tend to lean no when trying to think it through, but it’s not obviously nonsense.
I agree with Ben Stewart’s response that this is not a helpful thing to say. You are making some very strange and unintuitive claims. I can’t imagine how you would persuade a reasonable, skeptical, well-informed person outside the EA/LessWrong (or adjacent) bubble that these are credible claims, let alone that they are true. (Even within the EA Forum bubble, it seems like significantly more people disagree with you than agree.)
To pick on just one aspect of this claim: it is my understanding that Yudkowsky has no meaningful technical proficiency with deep learning-based or deep reinforcement learning-based AI systems. In my understanding, Yudkowsky lacks the necessary skills and knowledge to perform the role of an entry-level AI capabilities researcher or engineer at any AI company capable of paying multi-million-dollar salaries. If there is evidence that shows my understanding is mistaken, I would like to see that evidence. Otherwise, I can only conclude that you are mistaken.
I think the claim that an endorsement is worth billions or tens of billions is also wrong, but it’s hard to disprove a claim about what would happen in the event of a strange and unlikely hypothetical. Yudkowsky, Soares, and MIRI have an outsized intellectual influence in the EA community (and obviously on LessWrong). There is some meaningful level of influence on the community of people working in the AI industry in the Bay Area, but it’s much less. Among the sort of people who could make decisions that would realize billions or tens of billions in value, namely the top-level executives at AI companies and investors, the influence seems pretty marginal. I would guess the overwhelming majority of investors either don’t know who Yudkowsky and Soares are or do but don’t care what their views are. Top-level executives do know who Yudkowsky is, but in every instance I’ve seen, they tend to be politely disdainful or dismissive toward his views on AGI and AI safety.
Anyway, this seems like a regrettably unproductive and unimportant tangent.
I think it could be a helpful response for people who are able to respond to signals of the type “someone who has demonstrably good forecasting skills, is an expert in the field, and has worked on this for a long time claims X” by at least re-evaluating whether their models make sense and are not missing some important considerations.
If someone is at least able to do that, they can, for example, ask a friendly AI, which will tell them, based on conservative estimates and reference classes, that the original claim is likely wrong. It will still miss important considerations (as a typical forecaster also would), so the results are underestimates.
I think when people are at the level of [some combination of inability to think and motivated reasoning] where they are uninterested in, e.g., sanity-checking their thinking with AIs, it is not worth the time to correct them. People are wrong on the internet all the time.
(I think the debate was moderately useful: I made an update from this debate and the voting patterns, broadly in the direction of the EA Forum descending to the level of a random place on the internet where confused people talk about AI, which is broadly not worth reading or engaging with. I’m no longer very active on the EAF, but I’ve made some update.)
This thread seems to have gone in an unhelpful direction.
Questioning motivations is a hard point to make well. I’m unwilling to endorse that they are never relevant, but it immediately becomes personal. Keeping the focus primarily on the level of the arguments themselves is an approach more likely to enlighten and less likely to lead to flamewars.
I’m not here to issue a moderation warning to anyone for the conversation ending up on the point of motivations. I do want to take my moderation hat off and suggest that people spend more time on the object level.
I will then put my moderation hat back on and say that this and Jan’s previous comment break norms. You can disagree with someone without being this insulting.
I agree the thread direction may be unhelpful, and flame wars are bad.
I disagree, though, about the merits of questioning motivations; I think it’s super important.
In the AI sphere, there are great theoretical arguments on all sides: good arguments for acceleration, caution, pausing, etc. We can discuss these ad nauseam, and I do think that’s useful. But I think motivations likely shape the history and current state of AI development more than unmotivated reasoning and rational thought. Money and power are strong motivators; EAs have sidelined them at their peril before. Although we cannot know people’s hearts, we can see and analyse what they have done and said in the past and what motivational pressures might affect them right now.
I also think it’s possible to have a somewhat object-level discussion about motivations.
I think this article on the history of modern AI outlines some of this well: https://substack.com/home/post/p-185759007
I might write more about this later...
For the companies racing to AGI, Y&S endorsing some effort as good would likely have something between billions $ to tens of billions $ value.
Are you open to bets about this? I would be happy to bet 10 k$ that Anthropic would not pay e.g. 3 billion $ for Yudkowsky and Soares to endorse their latest model as good. We could ask the marketing team at Anthropic or marketing experts elsewhere. I am not officially proposing a bet just yet. We would have to agree on a concrete operationalisation.
This doesn’t seem to be a reasonable way to operationalize. It would create much less value for the company if it was clear that they were being paid for endorsing them. And I highly doubt Amodei would be in a position to admit that they’d want such an endorsement even if it indeed benefitted them.
Thanks for the good point, Nick. I still suspect Anthropic would not pay e.g. 3 billion $ for Yudkowsky and Soares to endorse their latest model as good, even if they were hypothetically being honest. I understand this is difficult to operationalise, but the question could still be put to people outside Anthropic.
The operationalisation you propose does not make any sense, Yudkowsky and Soares do not claim ChatGPT 5.2 will kill everyone or anything like that.
What about this:
MIRI approaches [a lab] with this offer: we have made a breakthrough in the ability to verify whether the way you are training AIs leads to misalignment of the kind we are worried about. Unfortunately, the verification requires a lot of computation (i.e. something like ARC), so it is expensive. We expect your whole training setup will pass, but we will need $3B from you to run it; if the test passes, we will declare that your lab has solved the technical part of AI alignment we were most worried about, and make some arguments which we expect to convince many people who listen to our views.
Or this: MIRI discusses things with xAI or Meta and convinces themselves that their (secret) plan is by far the best chance humanity has, and that everyone smart and conscientious in ML/AI should stop whatever they are doing and join them.
(Obviously these are also unrealistic / assume something like some lab coming up with a plan which could even hypothetically work.)
Thanks, Jan. I think it is very unlikely that AI companies with frontier models will seek MIRI’s technical assistance in the way you described in your 1st operationalisation, so I believe a bet which would only resolve in that case has very little value. I am open to bets against short AI timelines, or what they supposedly imply, up to 10 k$. Do you see any bet we could make that is good for both of us under our own views, considering that we could invest our money, and that you could take out loans?
I was considering hypothetical scenarios of the type “imagine this offer from MIRI arrived, would a lab accept” ; clearly MIRI is not making the offer because the labs don’t have good alignment plans and they are obviously high integrity enough to not be corrupted by relatively tiny incentives like $3b
I would guess there are ways to operationalise the hypotheticals, and to try to have, for example, Dan Hendrycks guess what xAI would do, given that he is an advisor there.
With your bets about timelines: I made an 8:1 bet with Daniel Kokotajlo against AI 2027 being as accurate as his previous forecast, so I’m not sure which side of “confident about short timelines” you expect I should take. I’m happy to bet on some operationalization of your overall thinking and posting about the topic of AGI being bad, e.g. something like “the 3 smartest available AIs in 2035 compare everything we wrote in 2026 on the EAF, LW, and Twitter about AI and judge who was more confused, overconfident, and miscalibrated”.
I was considering hypothetical scenarios of the type “imagine this offer from MIRI arrived, would a lab accept”
When would the offer from MIRI arrive in the hypothetical scenario? I am sceptical of an honest endorsement from MIRI today being worth 3 billion $, but I do not have a good sense of what MIRI will look like in the future. I would also agree a foolproof AI safety certification is, or will be, worth more than 3 billion $, depending on how it is defined.
With your bets about timelines—I did 8:1 bet with Daniel Kokotajlo against AI 2027 being as accurate as his previous forecast, so not sure which side of the “confident about short timelines” do you expect I should take.
I was guessing I would have longer timelines. What is your median date of superintelligent AI as defined by Metaculus?
It’s not endorsing a specific model for marketing reasons; it’s about endorsing the effort, overall.
Given that Meta is willing to pay billions of dollars for people to join them, and that many people don’t work on AI capabilities (or work, e.g., at Anthropic, as a lesser evil) because they share Y&S’s concerns, an endorsement from Y&S would have value in the billions to tens of billions simply because of the talent that you could get as a result.
Meta is paying billions of dollars to recruit people with proven experience at developing relevant AI models.
Does the set of “people with proven experience in building AI models” overlap with “people who defer to Eliezer on whether AI is safe” at all? I doubt it.
Indeed, given that Yudkowsky’s arguments on AI are not universally admired, and that people who have chosen building the thing he says will make everybody die as their career are particularly likely to be sceptical of his convictions on that issue, an endorsement might even be net negative.
Thanks for the comment, Mikhail. Gemini 3 estimates a total annualised compensation of the people working at Meta Superintelligence Labs (MSL) of 4.4 billion $. If an endorsement from Yudkowsky and Soares was as beneficial (including via bringing in new people) as making 10 % of people there 10 % more impactful over 10 years, it would be worth 440 M$ (= 0.10*0.10*10*4.4*10^9).
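The back-of-envelope arithmetic above can be sketched in a few lines. All inputs are the rough assumptions from the comment (the compensation figure is itself an AI-generated estimate), not real data:

```python
# Back-of-envelope estimate of the value of an endorsement, using the
# assumptions from the comment above. Every input is a rough guess.
total_comp = 4.4e9   # assumed total annualised MSL compensation ($/year)
frac_people = 0.10   # fraction of staff made more impactful by the endorsement
uplift = 0.10        # assumed impact uplift for those people
years = 10           # assumed duration of the effect

endorsement_value = frac_people * uplift * years * total_comp
print(f"${endorsement_value:,.0f}")  # prints $440,000,000
```

Multiplying point estimates like this compounds the uncertainty in each factor, so the result is best read as an order of magnitude rather than a figure.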
You could imagine a Yudkowsky endorsement (say, with the narrative that Zuck talked to him, admits he went about it all wrong, and is finally taking the issue seriously, just to entertain the counterfactual...) raising Meta AI from “nobody serious wants to work there and they can only get talent by paying exorbitant prices” to “they finally have access to serious talent and can get a critical mass of people to do serious work”. This’d arguably be more valuable than whatever they’re doing now.
I think your answer to the question of how much an endorsement would be worth mostly depends on some specific intuitions that I imagine Kulveit has for good reasons but most people don’t, so it’s a bit hard to argue about it. It also doesn’t help that in every case other than Anthropic and maybe DeepMind, it’d also require some weird hypotheticals to even entertain the possibility.