Thanks for winding back through the conversation so far, as you understood it; that helped me understand better where you’re coming from.
You summarized Nuno’s reply as: “He replied by pointing out that MIRI promotes AI risk as an organization and there’s no equivalent organization putting out arguments against AI risk.”
Nuno said: “Idk, I can’t help but notice that your title at MIRI is ‘Research Communications’, but there is nobody paid by the ‘Machine Intelligence Skepticism Institute’ to put forth claims that you are wrong.”
I interpreted that as Nuno saying: MIRI is giving arguments for stuff, but I cited an allegation that CFAR is being dishonest, manipulative, and one-sided in their evaluation of AI risk arguments, and I note that MIRI is a one-sided doomer org that gives arguments for your side, while there’s nobody paid to raise counter-points.
My response was a concrete example showing that MIRI isn’t a one-sided doomer org that only gives arguments for doom. That isn’t a proof that we’re correct about this stuff, but it’s a data point against “MIRI is a one-sided doomer org that only gives arguments for doom”. And it’s at least some evidence that we aren’t doing the specific dishonest thing Nuno accused CFAR of doing, which got a lot of focus in the OP.
You summarized your own reply as: “I said this doesn’t mean much because the vast majority of writing by you and MIRI emphasizes AI risks.”
The specific thing you said was: “I like the post you linked but I’m not sure this is much of a rebuttal to Nuno’s point. This is a single post, saying the situation is not maximally bad, against a much larger corpus of writings and communications by you and MIRI emphasizing risks from AGI. ”
My reply mostly wasn’t an objection to “I’m not sure this is much of a rebuttal to Nuno’s point” or “This is a single post”. My objection was to “against a much larger corpus of writings and communications by you and MIRI emphasizing risks from AGI”. As I said to Nuno upthread:
[… O]ne blog post is a small data point to weigh against lots of other data points, but the relevant data to weigh it against isn’t “MIRI wrote other things that emphasize risks from AGI” in isolation, as though “an organization or individual wrote a lot of arguments for X” on its own is strong reason to discount those arguments as filtered.
The thing doing the work has to be some background model of the arguers (or of some process upstream of the arguers), not a raw count of how often someone argues for a thing. Otherwise you run into the “damned if you argue a lot for X, damned if you don’t argue a lot for X” problem.
Regarding those models of MIRI and other orgs in the space, and of upstream processes that might influence us:
I think you and Nuno are just wrong to think of MIRI as “an org organized around trying to make people more pessimistic about AI outcomes”; MIRI is no more that than FHI is an org organized around trying to make people think anthropics, whole-brain emulation, and biosecurity are really important. Those are things that people at FHI tend to believe, but that’s because researchers there (rightly or wrongly) looked at the arguments and reached that conclusion, while at the same time looking at other topics and concluding they weren’t very important (e.g., brain-computer interfaces, nuclear fusion, and asteroid risk). If FHI researchers had evaluated the arguments differently, the organization would have continued existing, just with a different set of research interests.
Similarly, MIRI was originally an accelerationist org, founded with a goal of advancing humanity to AGI as quickly as possible. We had an incentive to think AGI is important, but not (AFAICT) to think AGI is scary. “Oh wait, AGI is scary” is a conclusion Eliezer came to in the first few years of MIRI’s existence, via applying more scrutiny to his assumption that AGI would go great by default.
I’m all in favor of asking questions like “What were the myopic financial incentives in this case?” and seeing how much behavior this predicts. But I think applying that lens in an honest way should sometimes result in “oh weird, that behavior is the opposite of the one I’d naively have predicted with this model”, as opposed to it being a lens that can explain every observation equally.
MIRI deciding that AGI is scary and risky, in an excited techno-optimist social environment and funding landscape, seems like a canonical example of something different from that going on.
(Which doesn’t mean MIRI was right, then or now. People can be wrong for reasons other than “someone was paying me to be wrong”.)
Our first big donor, Peter Thiel, got excited about us because he thought of us as techno-optimists, and stopped supporting us within a few years when he concluded we were too dour about humanity’s prospects. This does not strike me as a weird or surprising outcome, except insofar as it’s weird someone in Thiel’s reference class took an interest in MIRI even temporarily.
I don’t think MIRI has more money today than if we were optimists about AI. I also don’t think we crystal-ball-predicted that funders like Open Phil would exist 5 or 15 years in the future, or that they’d have any interest in “superintelligent AI destroys the world” risk if they did exist. Nor do I think we’ve made more money, expanded more, felt better about ourselves, or had more-fun social lives by opening up in 2020-2023 about the fact that we’ve become even more pessimistic and think things are going terribly, both at MIRI and in the alignment field at large.
Speaking to the larger question: is there a non-epistemic selection effect in the world at large, encouraging humanity to generate more arguments for AI risk than against it? This does not follow from the mere observation of a bunch of arguments for AI risk, because that observation is also predicted by those arguments being visibly correct, and accepted and shared because of their correctness.
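To make that concrete, here’s a toy Bayesian sketch (purely illustrative; every number in it is invented) of why “I observe lots of arguments for X” only moves you once you’ve fixed a model of the process generating those arguments: under a roughly truth-tracking generator the observation is strong evidence, while under a one-sided advocacy generator the same observation is nearly uninformative.

```python
# Toy illustration (all numbers invented): observing "lots of public arguments
# for X" supports X only relative to a model of where the arguments come from,
# not via the raw count of arguments alone.

def posterior(prior_x, p_obs_given_x, p_obs_given_not_x):
    """Bayes' rule: P(X | we observe many arguments for X)."""
    joint_x = prior_x * p_obs_given_x
    joint_not_x = (1 - prior_x) * p_obs_given_not_x
    return joint_x / (joint_x + joint_not_x)

prior = 0.5  # start agnostic about X

# Model A: arguers roughly track truth, so a pile of pro-X arguments is much
# likelier to exist if X is actually correct.
print(posterior(prior, p_obs_given_x=0.8, p_obs_given_not_x=0.2))   # ~0.80

# Model B: arguers are selected/paid to argue for X regardless, so the pile
# shows up either way and the observation is nearly uninformative.
print(posterior(prior, p_obs_given_x=0.9, p_obs_given_not_x=0.85))  # ~0.51
```

Same observation, very different updates; the work is being done by the model of the arguers, which is exactly the thing under dispute here.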
For different groups, I’d guess that...
Random academics probably have a myopic incentive to say things that sound pretty respectable and normal, as opposed to wild and sensational. Beyond that, I don’t think there’s a large academia-wide incentive to either be pro-tech or anti-tech, or to have net-optimistic or net-pessimistic beliefs about AI in particular. There is a strong incentive to just ignore the topic, since it’s hard to publish papers about it in top journals or conferences.
Journalists do have an incentive to say things that sound sensational, both positive (“AI could transform the world in amazing positive way X!”) and negative (“AI could transform the world in horrifying way Y”). I’d guess there’s more myopic incentive to go negative than positive, by default. That said, respected newspapers tend to want to agree with academics and sound respectable and normal, which will similarly encourage a focus on small harms and small benefits. I don’t know how these different forces are likely to balance out, though I can observe empirically that I see a wide range of views expressed, including a decent number of articles worrying about AI doom.
The social network MIRI grew out of (transhumanists, Marginal-Revolution-style libertarians, extropians, techno-utopians, etc.) has strong myopic social incentives to favor accelerationism, “tech isn’t scary”, “regulation and safety concerns cause way more harm than the tech itself”, etc. The more optimistic you are about the default outcome of rapid technological progress, the better.
Though I think these incentives have weakened over the last 20 years, in large part due to MIRI persuading a lot of transhumanists to worry about misaligned AGI in particular, as a carve-out from our general techno-optimism.
EA circa 2010 probably had myopic incentives to not worry much about AGI doom, because “AGI breaks free of our control and kills us all” sounds weird and crankish, and didn’t help EA end malaria or factory farming any faster. And indeed, the earliest write-ups on AI risk by Open Phil and others strike me as going out of their way to talk about milder risks and be pretty cautious, abstract, and minimal in how they addressed “superintelligence takes over and kills us”, much more so than recent material like Cold Takes and the 80K Podcast. (Even though it’s not apparent to me that there’s more evidence for “superintelligence takes over and kills us” now than there was in 2014.)
EA circa 2023 probably has myopic incentives to have “medium-sized” probabilities of AI doom—unlike in the early days, EA leadership and super-funders like Open Phil nowadays tend to be very worried about AGI risk, which creates both financial and social incentives to look similarly worried about AI. The sweet spot is probably to think AI is a big enough risk to take seriously, but not as big as the weirder orgs like MIRI think. Within EA, this is a respected and moderate-sounding position, whereas in ML or academia even assigning a 10% probability to AGI doom might make you sound pretty crazy.
(Obviously none of this is true across the board, and different social networks within EA will have totally different local social incentives—some EA friend groups will think you’re dumb if you think AI risk is worth thinking about at all, some will think you’re dumb if your p(doom) is below 90%, and so on for a variety of different probability ranges. There’s a rich tapestry of diverging intuitions about which views are crazy here.)
The myopic incentives for ML itself, both financial and social, probably skew heavily toward “argue against ML being scary or dangerous at all”, mitigated mainly by a desire to sound moderate and respectable.
The “moderate and respectable” goal pushes toward ML people acknowledging that there are some risks, but relatively minor ones — this feels like a safe and sober middle ground between “AI is totally risk-free and there will be no problems” and “there’s a serious risk of AI causing a major global catastrophe”.
“Moderate and respectable” also pushes against ML people arguing against AGI risk because it pushes for ML people to just not talk about the subject at all. (Though I’d guess this is a smaller factor than “ML people don’t feel like they have a strong argument, and don’t want to broach the topic if there isn’t an easy powerful rebuttal”. People tend to be pretty happy to dunk on views they think are crazy—e.g., on social media—if they have a way of pointing at something about the view that their peers will be able to see is clearly wrong.)
I would say that the most important selection effect is ML-specific (favoring lower p(doom)), because ML researchers are “the experts” whom smart people would most naturally want to defer to; ML is a lot larger than the AI x-risk ecosystem (and especially larger than the small part of the x-risk ecosystem with a way higher p(doom) than Nuno’s); and ML researchers can focus a large share of their attention on generating arguments for “ML is not scary or dangerous at all”, whereas journalists, academia-at-large, etc. have their attention split between AI and a thousand other topics.
But mostly my conclusion from all this, and from the history of object-level discussion so far, is that there just aren’t super strong myopic incentives favoring either “humanity only generates arguments for higher p(doom)” or “humanity only generates arguments for lower p(doom)”. There’s probably some non-epistemic tilt toward “humanity generates more arguments against AI risk than for AI risk”, at least within intellectual circles (journalism may be another matter entirely). But I don’t think the arguments are so impenetrably difficult to evaluate on their own terms, or so scarce (on anyone’s side), that it ends up mattering much.
From inside MIRI, it appears much more plausible to me that we’ve historically understated how worried we are about AI, than that we’ve historically overstated it. (Which seems like a mistake to me now.) And I think our arguments are good on their own terms, and the reasoning checks out. Selection effects strike me as a nontrivial but minor factor in all of this.
I don’t think everyone has access to the same evidence as me, so I don’t think everyone should have probabilities as doomy as mine. But the above hopefully explains why I disagree with “the selection effects argument packs a whole lot of punch behind it”, as well as “having 70 or 80%+ probabilities on AI catastrophe within our lifetimes is probably just incorrect, insofar as a probability can be incorrect”.
I take the latter to be asserting, not just that Nuno thinks he lacks enough evidence to have 70% p(doom in our lifetime), but that he places vanishingly small probability on anyone else having the evidence required to have an extreme belief about this question.
Showing that this is overconfident on Nuno’s part requires a lot less evidence than providing a full decomposition of all the factors going into my level of worry about AGI: it should be easier for us to reach agreement that the other point of view isn’t crazy than for us to reach agreement about all the object-level questions.
I’m sorry, I’m not sure I understood correctly. Are you saying you agree there are selection effects, but you object to how you think Nuno and I are modeling MIRI and the processes generating MIRI-style models on AGI?
I’m confused by your phrasing “there are selection effects”, because it sounds so trivial to me. Every widespread claim faces some nonzero amount of (non-epistemic) selection bias.
E.g., I’d assume that twelve-syllable sentences get asserted at least slightly less often than eleven-syllable sentences, because they’re a bit more cumbersome. This is a non-epistemic selection effect, but it doesn’t cause me to worry that I’ll be unable to evaluate the truth of eleven- or twelve-syllable sentences for myself.
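For what it’s worth, here’s a minimal simulation of that kind of selection effect (the numbers are made up): a bias in which sentences get asserted that is independent of their truth changes how many you see, not how reliable the ones you see are.

```python
# Toy simulation (made-up numbers): a non-epistemic selection effect that
# depends on sentence length but not on truth. You see fewer twelve-syllable
# assertions, but the ones you see are just as likely to be true.
import random

random.seed(0)
counts = {11: [0, 0], 12: [0, 0]}  # syllables -> [num asserted, num asserted & true]

for _ in range(200_000):
    syllables = random.choice([11, 12])
    is_true = random.random() < 0.6                # truth is independent of length
    p_assert = 0.50 if syllables == 11 else 0.45   # slight bias against longer sentences
    if random.random() < p_assert:
        counts[syllables][0] += 1
        counts[syllables][1] += is_true

for s, (asserted, true_count) in counts.items():
    print(f"{s}-syllable: asserted={asserted}, fraction true={true_count / asserted:.3f}")
# Both fractions land near 0.6; only the assertion counts differ.
```

The bias shows up in how often each kind of sentence gets asserted, not in the truth-rate of the sentences that do get asserted.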
There are plenty of selection effects in the world, but typically they don’t put us into a state of epistemic helplessness; they just imply that it takes a bit of extra effort to dig up all the relevant arguments (they’re out there; some just take a few extra minutes to find on Google).
When the world has already spent decades arguing about a question, and there are plenty of advocates for both sides of the question, selection effects usually mean “it takes you some more minutes to dig up all the key arguments on Google”, not “we must default to uncertainty no matter how strong the arguments look”. AI risk is pretty normal in that respect, on my view.