Richard Y Chappell (academic philosopher, co-editor of utilitarianism.net, writes goodthoughts.blog)
You keep describing possible scenarios in which the actual value of averting extinction would be low. Do you not understand the scenario conditions under which the actual value would be high? I'm suggesting that those kinds of scenario (in which astronomically valuable futures are reliably securable so long as we avoid extinction this century) seem reasonably credible to me, which is what grounds my background belief in x-risk mitigation having high expected value.
Edited to add: You may in a sense be talking about "expected value", but insofar as it is based on a single model of how you expect the future to go ("rapid diminution"), it seems problematically analogous to actual value, much like the fallacy I diagnose in "Rule High Stakes In, Not Out". We need to abstract from that specific model and ask how confident we should be in that one model compared to competing ones, and thus reach a kind of higher-order (or all-models-considered) expected value estimate.
The point of RHSINO is that the probability you assign to low-stakes models (like rapid diminution) makes surprisingly little difference: you could assign them 99% probability and that still wouldn't establish that our all-models-considered EV must be low. Our ultimate judgment instead depends more on what credibility we assign to the higher-stakes models/scenarios.
I really want to stress the difference between saying something will (definitely) happen versus saying there's a credible chance (>1%) that it will happen. They're very different claims!
Lovelace and Menabrea probably should have regarded their time as disproportionately likely (compared to arbitrary decades) to see continued rapid progress. That's compatible with thinking it overwhelmingly likely (~99%) that they'd soon hit a hurdle.
As a heuristic, ask: if one were, at the end of history, to plot the 100 (or even just the 50) greatest breakthrough periods in computer science prior to ASI (of course there could always be more breakthroughs after that), should we expect our current period to make the cut? I think it would be incredible to deny it.
Thanks for explaining your view!
On the first point: I think we should view ASI as disproportionately likely in decades that already feature (i) recent extraordinary progress in AI capabilities that surprises almost everyone, and (ii) a fair number of experts in the field who appear to take seriously the possibility that continued progress in this vein could soon result in ASI.
I'd then think we should view it as disproportionately unlikely that ASI will either be (a) achieved before any such initial signs of impressive progress, OR (b) achieved centuries after such initial progress. (If not achieved within a century, I'd think it more likely to be outright unachievable.)
I don't really know enough about AI safety research to comment on the latter disagreement. I'm curious to hear others' views.
I'm a bit confused by this response. Are you just saying that high expected value is not sufficient for actual value because we might get unlucky?
Some possible extinction-averting events merely extend life for 1 second, and so provide little value. Other possibilities offer far greater extensions. Obviously the latter possibilities are the ones that ground high expected value estimates.
I think Shulman's points give us reason to think there's a non-negligible chance of averting extinction (extending civilization) for a long time. Pointing out that other possibilities are also possible doesn't undermine this claim.
Distinguish pro tanto vs all-things-considered (or "net") high stakes. The statement is literally true of pro tanto high stakes: the 1% chance of extremely high stakes is by itself, as far as it goes, an expected high stake. But it's possible that this high stake might be outweighed or cancelled out by other sufficiently high stakes among the remaining probability space (hence the subsequent parenthetical about "unless one inverts the high stakes in a way that cancels out...").
The general lesson of my post is that saying "there's a 99% chance there's nothing to see here" has surprisingly little influence on the overall expected value. You can't show the expected stakes are low by showing that it's extremely likely that the actual stakes are low. You have to focus on the higher-stakes portions of probability space, even if small (but non-negligible).
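To make the arithmetic concrete, here is a minimal sketch with purely illustrative numbers (the credences and stakes below are assumptions chosen for the example, not figures argued for anywhere in my posts):

```python
# Illustrative only: two hypothetical models of the value of averting extinction.
# Each model pairs a credence (how likely the model is to be right) with the
# value that averting extinction would have if that model were right.
models = {
    "rapid diminution (low stakes)":        {"credence": 0.99, "value_if_true": 1e2},
    "time of perils (astronomical stakes)": {"credence": 0.01, "value_if_true": 1e10},
}

# All-models-considered expected value: weight each model's stakes by its credence.
expected_value = sum(m["credence"] * m["value_if_true"] for m in models.values())

for name, m in models.items():
    print(f"{name}: contributes {m['credence'] * m['value_if_true']:,.0f}")
print(f"All-models-considered EV: {expected_value:,.0f}")
# The 99%-credence low-stakes model contributes ~99; the 1%-credence high-stakes
# model contributes ~100,000,000 and dominates the total. Pushing the low-stakes
# credence even higher barely moves the result; what matters is how much
# credibility the high-stakes model deserves.
```

This is just the standard expected-value calculation; the point it illustrates is that the overall verdict turns on the credibility of the high-stakes model, not on how probable the low-stakes model is.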
The responses to my comment have provided a real object lesson to me in how a rough throwaway remark (in this case: my attempt to very briefly indicate what my other post was about) can badly distract readers from one's actual point! Perhaps I would have done better to leave out any positive attempt to describe the content of my other post here, and merely offer the negative claim that it wasn't about asserting specific probabilities.
My brief characterization was not especially well optimized for conveying the complex dialectic in the other post. Nor was it asserting that my conclusion was logically unassailable. I keep saying that if anyone wants to engage with my old post, I'd prefer that they did so in the comments to that post, ensuring that they engage with the real post rather than the inadequate summary I gave here. My ultra-brief summary is not an adequate substitute, and was never intended to be engaged with as such.
On the substantive point: Of course, ideally one would like to be able to "model the entire space of possibilities". But as finite creatures, we need heuristics. If you think my other post was offering a bad heuristic for approximating EV, I'm happy to discuss that more over there.
On (what I take to be) the key substantive claim of the post:
I think that nontrivial probability assignments to strong and antecedently implausible claims should be supported by extensive argument rather than manufactured probabilities.
There seems room for people to disagree on priors about which claims are "strong and antecedently implausible". For example, I think Carl Shulman offers a reasonably plausible case for existential stability if we survive the next few centuries. By contrast, I find a lot of David's apparent assumptions about which propositions warrant negligible credence to be extremely strong and antecedently implausible. As I wrote in x-risk agnosticism:
David Thorstad seems to assume that interstellar colonization could not possibly happen within the next two millennia. This strikes me as a massive failure to properly account for model uncertainty. I can't imagine being so confident about our technological limitations even a few centuries from now, let alone millennia. He also holds the suggestion that superintelligent AI might radically improve safety to be "gag-inducingly counterintuitive", which again just seems a failure of imagination. You don't have to find it the most likely possibility in order to appreciate the possibility as worth including in your range of models.
I think it's important to recognize that reasonable people can disagree about what they find antecedently plausible or implausible, and to what extent. (Also: some events, like your home burning down in a fire, may be "implausible" in the sense that you don't regard them as outright likely to happen, while still regarding them as sufficiently probable as to be worth insuring against.)
Such disagreements may be hard to resolve. One can't simply assume that one's own priors are objectively justified by default whereas one's interlocutor is necessarily unjustified by default until "supported by extensive argument". That's just stacking the deck.
I think a healthier dialectical approach involves stepping back to more neutral ground, and recognizing that if you want to persuade someone who disagrees with you, you will need to offer them some argument to change their mind. Of course, it's fine to just report one's difference in view. But insisting, "You must agree with my priors unless you can provide extensive argument to support a different view, otherwise I'll accuse you of bad epistemics!" is not really a reasonable dialectical stance.
If the suggestion is instead that one shouldn't attempt to assign probabilities at all, then I think this runs into the problems I explore in Good Judgment with Numbers and (especially) Refusing to Quantify is Refusing to Think: namely, that refusing to quantify effectively implies giving zero weight. But we can often be in a position to know that a non-zero (and indeed non-trivially positive) estimate is better than zero, even if we can't be highly confident of precisely what the ideal estimate would be.
Hi David, I'm afraid you might have gotten caught up in a tangent here! The main point of my comment was that your post criticizes me on the basis of a misrepresentation. You claim that my "primary argumentative move is to assign nontrivial probabilities without substantial new evidence," but actually that's false. That's just not what my blog post was about.
In retrospect, I think my attempt to briefly summarize what my post was about was too breezy, and misled many into thinking that its point was trivial. But it really isn't. (In fact, I'd say that my core point there about taking higher-order uncertainty into account is far more substantial and widely neglected than the "naming game" fallacy that you discuss in the present post!) I mention in another comment how it applied to Schwitzgebel's "negligibility argument" against longtermism, for example, where he very explicitly relies on a single constant-probability model in order to make his case. Failing to adequately take model uncertainty into account is a subtle and easily overlooked mistake!
A lot of your comment here seems to misunderstand my criticism of your earlier paper. I'm not objecting that you failed to share your personal probabilities. I'm objecting that your paper gives the impression that longtermism is undermined so long as the time of perils hypothesis is judged to be likely false. But actually the key question is whether its probability is negligible. Your paper fails to make clear what the key question to assess is, and the point of my "Rule High Stakes In" post is to explain why it's really the question of negligibility that matters.
To keep discussions clean and clear, I'd prefer to continue discussion of my other post over on that post rather than here. Again, my objection to this post is simply that it misrepresented me.
It's not a psychological question. I wrote a blog post offering a philosophical critique of some published academic papers that, it seemed to me, involved an interesting and important error of reasoning. Anyone who thinks my critique goes awry is welcome to comment on it there. But whether my philosophical critique is ultimately correct or not, I don't think that the attempt is aptly described as "personal insult", "ridiculous on [its] face", or "corrosive to productive, charitable discussion". It's literally just doing philosophy.
I'd like it if people read my linked post before passing judgment on it.
The meta-dispute here isn't the most important thing in the world, but for clarity's sake, I think it's worth distinguishing the following questions:
Q1. Does a specific text, Thorstad (2022), either actually or apparently commit a kind of "best model fallacy", arguing as though establishing the Time of Perils hypothesis as unlikely to be true thereby suffices to undermine longtermism?
Q2. Does another specific text, my "Rule High Stakes In, Not Out", either actually or apparently have as its "primary argumentative move... to assign nontrivial probabilities without substantial new evidence"?
My linked post suggests that the answer to Q1 is "Yes". I find it weird that others in the comments here are taking stands on this textual dispute a priori, rather than by engaging with the specifics of the text in question, the quotes I respond to, etc.
My primary complaint in this comment thread has simply been that the answer to Q2 is "No" (if you read my post, you'll see that it's instead warning against what I'm now calling the "best model fallacy", and explaining how I think various other writings, including Thorstad's, seem to go awry as a result of not attending to this subtle point about model uncertainty). The point of my post is not to try to assert or argue for any particular probability assignment. Hence Thorstad's current blog post misrepresents mine.
***
There's a more substantial issue in the background:
Q3. What is the most reasonable prior probability estimate to assign to the time of perils hypothesis? In case of disagreement, does one party bear a special "burden of proof" to convince the other, who should otherwise be regarded as better justified by default?
I have some general opinions about the probability being non-negligible (I think Carl Shulman makes a good case here), but it's not something I'm trying to argue about with those who regard it as negligible. I don't feel like I have anything distinctive to contribute on that question at this time, and prefer to focus my arguments on more tractable points (like the point I was making about the best model fallacy). I independently think Thorstad is wrong about how the burden of proof applies, but that's an argument for another day.
So I agree that there is some "talking past" happening here. Specifically, Thorstad seems to have read my post as addressing a different question (and advancing a different argument) than what it actually does, and made unwarranted epistemic charges on that basis. If anyone thinks my "Rule High Stakes In" post similarly misrepresents Thorstad (2022), they're welcome to make the case in the comments to that post.
As I see it, I responded entirely reasonably to the actual text of what you wrote. (Maybe what you wrote gave a misleading impression of what you meant or intended; again, I made no claims about the latter.)
Is there a way to mute comment threads? Pursuing this disagreement further seems unlikely to do anyone any good. For what it's worth, I wish you well, and I'm sorry that I wasn't able to provide you with the agreement that you're after.
Honestly, I still think my comment was a good one! I responded to what struck me as the most cruxy claim in your post, explaining why I found it puzzling and confused-seeming. I then offered what I regard as an important corrective to a bad style of thinking that your post might encourage, whatever your intentions. (I made no claims about your intentions.) You're free to view things differently, but I disagree that there is anything "discourteous" about any of this.
There's "understanding" in the weak sense of having the info tokened in a belief-box somewhere, and then there's understanding in the sense of never falling for tempting-but-fallacious inferences like those I discuss in my post.
Have you read the paper I was responding to? I really don't think it's at all "obvious" that all "highly trained moral philosophers" have internalized the point I make in my blog post (that was the whole point of my writing it!), and I offered textual support. For example, Thorstad wrote: "the time of perils hypothesis is probably false. I conclude that existential risk pessimism may tell against the overwhelming importance of existential risk mitigation." This is a strange thing to write if he recognized that merely being "probably false" doesn't suffice to threaten the longtermist argument!
(Edited to add: the obvious reading is that he's making precisely the sort of "best model fallacy" that I critique in my post: assessing which empirical model we should regard as true, and then determining expected value on the basis of that one model. Even very senior philosophers, like Eric Schwitzgebel, have made the same mistake.)
Going back to the OP's claims about what is or isn't "a good way to argue," I think it's important to pay attention to the actual text of what someone wrote. That's what my blog post did, and it's annoying to be subject to criticism (and now downvoting) from people who aren't willing to extend the same basic courtesy to me.
This sort of "many gods"-style response is precisely what I was referring to with my parenthetical: "unless one inverts the high stakes in a way that cancels out the other high-stakes possibility."
I don't think that dystopian "time of carols" scenarios are remotely as credible as the time of perils hypothesis. If someone disagrees, then certainly resolving that substantive disagreement would be important for making dialectical progress on the question of whether x-risk mitigation is worthwhile or not.
What makes both arguments instances of the nontrivial probability gambit is that they do not provide significant new evidence for the challenged claims. Their primary argumentative move is to assign nontrivial probabilities without substantial new evidence.
I don't think this is a good way to argue. I think that nontrivial probability assignments to strong and antecedently implausible claims should be supported by extensive argument rather than manufactured probabilities.
I'd encourage Thorstad to read my post more carefully and pay attention to what I am arguing there. I was making an in-principle point about how expected value works, highlighting a logical fallacy in Thorstad's published work on this topic. (Nothing in the paper I responded to seemed to acknowledge that a 1% chance of the time of perils would suffice to support longtermism. He wrote about the hypothesis being "inconclusive" as if that sufficed to rule it out, and I think it's important to recognize that this is bad reasoning on his part.)
Saying that my "primary argumentative move is to assign nontrivial probabilities without substantial new evidence" is poor reading comprehension on Thorstad's part. Actually, my primary argumentative move was explaining how expected value works. The numbers are illustrative, and suffice for anyone who happens to share my priors (or something close enough). Obviously, I'm not in that post trying to persuade someone who instead thinks the correct probability to assign is negligible. Thorstad is just radically misreading what my post is arguing.
(What makes this especially strange is that, iirc, the published paper of Thorstad's to which I was replying did not itself argue that the correct probability to assign to the ToP hypothesis is negligible, but just that the case for the hypothesis is "inconclusive". So it sounds like he's now accusing me of poor epistemics because I failed to respond to a different paper than the one he actually wrote? Geez.)
The AI bubble popping would be a strong signal that this [capabilities] optimism has been misplaced.
Are you presupposing that good practical reasoning involves (i) trying to picture the most-likely future, and then (ii) doing what would be best in that event (while ignoring other credible possibilities, no matter their higher stakes)?
It would be interesting to read a post where someone tries to explicitly argue for a general principle of ignoring credible risks in order to slightly improve most-probable outcomes. Seems like such a principle would be pretty disastrous if applied universally (e.g. to aviation safety, nuclear safety, and all kinds of insurance), but maybe there's more to be said? But it's a bit frustrating to read takes where people just seem to presuppose some such anti-precautionary principle in the background.
To be clear: I take the decision-relevant background question here to not be the binary question Is AGI imminent? but rather something more degreed, like Is there a sufficient chance of imminent AGI to warrant precautionary measures? And I don't see how the AI bubble popping would imply that answering "Yes" to the latter was in any way unreasonable. (A bit like how you can't say an election forecaster did a bad job just because their 40% candidate won rather than the one they gave a 60% chance to. Sometimes seeing the actual outcome seems to make people worse at evaluating others' forecasts.)
Some supporters of AI Safety may overestimate the imminence of AGI. It's not clear to me how much of a problem that is? (Many people overestimate risks from climate change. That seems important to correct if it leads them to, e.g., anti-natalism, or to misallocate their resources. But if it just leads them to pollute less, then it doesn't seem so bad, and I'd be inclined to worry more about climate change denialism. Similarly, I think, for AI risk.) There are a lot more people who persist in dismissing AI risk in a way that strikes me as outrageously reckless and unreasonable, and so that seems by far the more important epistemic error to guard against?
That said, I'd like to see more people with conflicting views about AGI imminence arrange public bets on the topic. (Better calibration efforts are welcome. I'm just very dubious of the OP's apparent assumption that losing such a bet ought to trigger deep "soul-searching". It's just not that easy to resolve deep disagreements about what priors / epistemic practices are reasonable.)
Quick clarification: My target here is not so much people with radically different empirical beliefs (such that they regard vaccines as net-negative), but rather the particular form of status quo bias that I discuss in the original post.
My guess is that for relatively elite audiences (like those who read philosophy blogs), they're unlikely to feel attached to this status quo bias as part of their identity, but their default patterns of thought may lead them to (accidentally, as it were) give it more weight than it deserves. So a bit of heated rhetoric and stigmatization of the thought-pattern in question may help to better inoculate them against it.
(Just a guess, though; I could be wrong!)
I think if some people are importantly right about something big, and others (esp. with more power) are importantly wrong, it's worth cheerleading getting things right even if it happens to correlate with your in-group!
Interesting post! Re: "how spotlight sizes should be chosen", I think a natural approach is to think about the relative priorities of representatives in a moral parliament. Take the meat eater problem, for example. Suppose you have some mental representatives of human interests, and some representatives of factory-farmed animal interests. Then we can ask each representative: "How high a priority is it for you to get your way on whether or not to prevent this child from dying of malaria?" The human representatives will naturally see this as a very high priority: we don't have many better options for saving human lives. But the animal representatives, even if they aren't thrilled by retaining another omnivore, have more pressing priorities than trying to help animals by eliminating meat-eaters one by one. Given how incredibly cost-effective animal-focused charities can be, it will make sense for them to make the moral trade: "OK, save this life, but then let's donate more to the Animal Welfare Fund."
Of course, for spotlighting to work out well for all representatives, it's going to be important to actually follow through on supporting the (otherwise unopposed) top priorities of neglected representatives (like those for wild animal welfare). But I think the basic approach here does a decent job of capturing why it isn't intuitively appropriate to take animal interests into account when deciding whether to save a person's life. In short: insofar as we want to take animal interests into account, there are better ways to do it that don't require creating conflict with another representative's top priorities. Avoiding such suboptimal conflict, and instead being open to moral trade, seems an important part of being a "good moral colleague".
Funnily enough, the main example that springs to mind is the excessive self-flagellation post-FTX. Many distanced themselves from the community and its optimizing norms/mindset, for understandable reasons, but ones more closely tied to "expressing" (and personal reputation management) than to actually "helping", IMO.
I'd be curious to hear if others think of further candidate examples.
Ok, thanks for expanding upon your view! It sounds broadly akin to how I'm inclined to address Pascal's Mugging cases (treat the astronomical stakes as implying proportionately negligible probability). Astronomical stakes from x-risk mitigation seem much more substantively credible to me, but I don't have much to add at this point if you don't share that substantive judgment!