Some thoughts on deference and inside-view models


  • It’s sometimes reasonable to believe things based on heuristic arguments, but it’s useful to be clear with yourself about when you believe things for heuristic reasons as opposed to having strong arguments that take you all the way to your conclusion.

  • A lot of the time, when you hear a heuristic argument for something, I think you should try to convert it into an argument that would take you all the way to the conclusion, except that you haven’t done a bunch of the steps. I think it’s healthy to have a map of all the argumentative steps you haven’t done, or that you’re taking on faith.

  • I think that all the above can be combined to form a set of attitudes that are healthy on both an individual and a community level. For example, one way our community could be unhealthy would be if people felt inhibited from saying when they don’t feel persuaded by arguments. But another unhealthy culture would be one where we acted like you’re a chump if you believe things just because people you trust and respect believe them. We should have a culture where it’s okay to act on arguments without having verified every step for yourself, and where you can express confusion about individual steps without that being an act of rebellion against the conclusion of those arguments.

I wrote this post to describe the philosophy behind the schedule of a workshop that I ran in February. The workshop is kind of like AIRCS, but aimed at people who are more hardcore EAs, less focused on CS people, and with a culture that is a bit less like MIRI’s and more like the culture of other longtermist EAs.

Thanks to the dozens of people I’ve talked to about these concepts for their useful comments; thanks also to the various people who read this doc for their criticism. Many of these ideas came from conversations with a variety of EAs, in particular Claire Zabel, Anna Salamon, other staff of AIRCS workshops, and the staff of the February workshop.

I think this post isn’t really insightful enough or well-argued enough to justify how expansive it is. I posted it anyway because it seemed better than not doing so, and because I thought it would be useful to articulate these claims even if I don’t do a very good job of arguing for them.

I tried to write the following without caveating every sentence with “I think” or “It seems”, even though I wanted to. I am pretty confident that the ideas I describe here are a healthy way for me to relate to thinking about EA stuff; I think that these ideas are fairly likely to be a useful lens for other people to take; and I am less confident, but think it’s plausible, that I’m describing ways the EA community could change that would be very helpful.

Part 1: Ways of thinking

Proofs vs proof sketches

When I first heard about AI safety, I was convinced that technical AI safety research was useful by an argument that was something like “superintelligence would be a big deal; it’s not clear how to pick a good goal for a superintelligence to maximize, so maybe it’s valuable to try to figure that out.” In hindsight this argument was making a bunch of hidden assumptions. For example, here are three objections:

  • It’s less clear that superintelligence can lead to extinction if you think that AI systems will increase in power gradually, so that before we have AI systems as capable as the whole of humanity, we have AI systems as capable as dozens of humans.

  • Maybe some other crazy thing (whole brain emulation, nanotech, technology-enabled totalitarianism) is likely to happen before superintelligence, which would make working on AI safety seem worse in a bunch of ways.

  • Maybe it’s really hard to work on technical AI safety before we know more than we currently do about the technology that will be used to build AGI.

I think that all these objections are pretty reasonable, and I also think that there are in fact pretty good answers to all of them.

In hindsight, it seems to have worked out well that I was instantly credulous of the AI safety argument, given that ten years later I’m still convinced by it; I don’t want to criticize myself for epistemic moves that empirically worked fine. But I think it was a mistake for me not to realize that I didn’t have an end-to-end story for why AI safety was important. I just had a sketch of an argument that was heuristically persuasive.

I’m reminded of the distinction between proofs and proof sketches in math: in a proof, you’re supposed to take care of all the niggling details, while in a proof sketch you can just generally gesture at the kind of reason why something might be true.

I think it’s correct to believe things when you can’t spell out the whole argument for them. But I think it’s good to be clear with yourself about when you’re doing that as opposed to when you actually know the whole argument, because if you aren’t clear about that, you have problems like the following:

  • You will be worse at reasoning with that argument and about that argument. By analogy, when I’m studying an intellectual subject like economics or math or biology, I’m constantly trying to prevent myself from having an illusion of understanding of what I’m reading, because if I only have a fake understanding I won’t be able to apply it correctly.

  • If you are in a conversation where that argument comes up, you might repeat the argument without understanding whether it’s relevant.

  • If you hear a counterargument which should persuade you that the original argument is wrong, you might not realize that you should change your mind.

  • If you talk to people about the argument and then turn out not to understand it, you’ll look like an arrogant and careless fool; this reflects badly on EA when it happens, and it happens often. (I am particularly guilty of this one.)

I think it’s particularly healthy to sometimes try to think about the world in terms of end-to-end arguments for why what you’re doing is good. By this I mean trying to backchain all the way from your work to good outcomes in the world. Sometimes I talk to people who are doing work that IMO won’t be very helpful, and I think they’re often making the mistake of not thinking about the end-to-end picture of how their work could be helpful. (Eg once I asked an AI safety researcher “Suppose your research project went as well as it could possibly go; how would it make it easier to align powerful AI systems?”, and they said that they hadn’t really thought about that. I think that not having thought this through makes your work less useful.)

A key move here is the “noticing your confusion” move, where you realize that an argument you believed actually has a hole in it.

Knowing where the “sorrys” are

Here’s an obnoxious computer science metaphor.

I’ve spent a bit of time playing around with proof assistants, which are programs that let you write down mathematical proofs in a form that can be checked automatically. Often when you’re using them, you break your proof down into multiple steps. Eg perhaps you prove A, and that A implies B, and that B implies C, and then you join this all together into a proof of C. Or maybe you show that A is true if both B and C are true, and then you prove B and C, and now you have a proof of A.
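To make the two decomposition patterns concrete, here is a minimal Lean 4 sketch. The propositions `A`, `B`, and `C` are hypothetical placeholders rather than anything from a real proof; each `example` just glues already-proven steps together.

```lean
-- Hypothetical placeholder propositions standing in for steps of an argument.
variable (A B C : Prop)

-- Pattern 1: prove A, prove A → B, prove B → C, then join them into a proof of C.
example (hA : A) (hAB : A → B) (hBC : B → C) : C :=
  hBC (hAB hA)

-- Pattern 2: show A follows from B and C together, then supply proofs of B and C.
example (hBC_A : B ∧ C → A) (hB : B) (hC : C) : A :=
  hBC_A ⟨hB, hC⟩
```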

While you’re in the middle of proving something, you often want to know whether the overall structure of your proof works before you have filled in all the details. To enable this, theorem provers give you a special keyword that tells the theorem prover “Please just pretend that I have successfully proven this little thing and move on to checking the other steps”. In Lean, this keyword is called “sorry”. To prove a really complicated thing, you might start out with the whole proof being a sorry. Then you break the problem down into, say, three steps, and you write sorry for each. Slowly you expand the structure of your proof, using sorrys as necessary, and eventually you turn all of them into valid proofs.
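Here is a sketch of that workflow in Lean 4, again with hypothetical placeholder propositions `A`, `B`, and `C`. First the whole proof is a `sorry`. Then it is broken into three steps that are each still a `sorry`, while the step gluing them together is genuinely checked. Lean accepts both versions but warns that each declaration uses `sorry`.

```lean
-- Stage 1: the whole proof is a single `sorry`.
-- Lean accepts this, but warns "declaration uses 'sorry'".
example (A B C : Prop) : C := by
  sorry

-- Stage 2: the proof is broken into three steps, each still a `sorry`.
example (A B C : Prop) : C := by
  have hA : A := sorry         -- "pretend I've proven A"
  have hAB : A → B := sorry    -- "pretend I've proven A → B"
  have hBC : B → C := sorry    -- "pretend I've proven B → C"
  exact hBC (hAB hA)           -- this final step is genuinely checked
```

Replacing each remaining `sorry` with a real proof, until Lean stops warning, is the formal analogue of filling in every step of the argument.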

I think that something like this might be a good metaphor for how you should relate to doing good in the world, or to questions like “is it good to work on AI safety?”. You try to write down the structure of an argument, and then fill out its steps, breaking them into more and more fine-grained assumptions. I am enthusiastic about people knowing where the sorrys are: that is, knowing what assumptions about the world they’re making. Once you’ve written down in your argument “I believe this because Nick Bostrom says so”, you’re perfectly free to continue believing the same things as before, but at least now you’ll know more precisely what kinds of external information could change your mind.
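One way to make “knowing where the sorrys are” concrete is to write an argument down as a tree and list its unverified leaves. This is a hypothetical illustration, not a tool the post describes. `Claim`, `find_sorrys`, and the example claims are invented here, loosely paraphrasing the AI safety argument from earlier.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """One step of an argument (hypothetical structure, for illustration only)."""
    text: str
    # Sub-claims that together are supposed to establish this claim.
    premises: list["Claim"] = field(default_factory=list)
    # Only meaningful for leaves: True if you checked it yourself,
    # False if you are taking it on faith (a "sorry").
    verified: bool = False
    # Why you believe an unverified leaf, e.g. deference to someone you trust.
    basis: str = ""


def find_sorrys(claim: Claim) -> list[Claim]:
    """Return every leaf claim that is taken on faith rather than verified."""
    if not claim.premises:
        return [] if claim.verified else [claim]
    sorrys: list[Claim] = []
    for premise in claim.premises:
        sorrys.extend(find_sorrys(premise))
    return sorrys


# Hypothetical example loosely based on the AI safety argument above.
argument = Claim(
    "Technical AI safety research is valuable",
    premises=[
        Claim("Superintelligence would be a big deal", verified=True),
        Claim("It's unclear how to pick a good goal for it to maximize", verified=True),
        Claim(
            "Useful technical work can be done before we know how AGI will be built",
            basis="trusting people who have thought about it more",
        ),
    ],
)

for sorry in find_sorrys(argument):
    print(f"sorry: {sorry.text} (basis: {sorry.basis or 'unstated'})")
```

Recording the basis doesn't change what you believe. It only marks which leaves new external information could overturn.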

The key event that I think does good here is when you realize that you were making an assumption you hadn’t noticed, or when you realize that you thought you understood the argument for X but actually don’t know how to persuade yourself of X given only the arguments you already have.

Small clarification: Many small arguments

Unlike when you’re doing mathematical proofs, when you’re thinking about real life I often think it’s better to come to conclusions by weighing a large number of arguments, rather than by trying to make one complete calculation of your conclusion (see cluster thinking vs sequence thinking, or fox vs hedgehog mindsets).

I structure a lot of my beliefs this way: I try to learn lots of different arguments that feel like they’re evidence for various things, and I am interested in the validity of each argument independent of whether it’s decision-relevant. So I often change my mind about whether a particular argument is good, while my larger-scale beliefs shift more gradually.
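A toy model shows why changing your mind about one argument need not move the overall belief much: treat each argument as a small, independent nudge in log-odds and sum the nudges. This is a swapped-in simplification. Real arguments are rarely independent, and the post doesn't propose any formula. `overall_belief` and all the numbers are hypothetical.

```python
import math


def overall_belief(prior: float, nudges: dict[str, float]) -> float:
    """Combine a prior probability with many arguments.

    Each argument is a log-likelihood ratio: positive values are evidence for
    the conclusion, negative values evidence against. Summing them assumes the
    arguments are independent, which is a simplifying assumption.
    """
    log_odds = math.log(prior / (1 - prior)) + sum(nudges.values())
    return 1 / (1 + math.exp(-log_odds))


# Ten hypothetical arguments, each weak evidence in favor.
nudges = {f"argument {i}": 0.3 for i in range(1, 11)}
before = overall_belief(0.5, nudges)

# Later I decide argument 3 is worthless: my view of that one argument
# changes completely, but the overall belief only shifts slightly.
nudges["argument 3"] = 0.0
after = overall_belief(0.5, nudges)

print(f"before: {before:.2f}, after: {after:.2f}")  # prints "before: 0.95, after: 0.94"
```

By contrast, a conclusion resting on a single chain of reasoning can collapse if one step turns out to be wrong. This is roughly the contrast the cluster-vs-sequence framing points at.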

Bonus miscellaneous points

Learning someone’s beliefs vs scrapping them for parts

Two ways you can relate to a talk you’re listening to:

  • Learning their beliefs. You try to become able to answer questions like “what would this person say about how useful it is to have EA-aligned people in various parts of the government?”

  • Scrapping them for parts. You try to take little parts of the things they’re saying and see whether you want to incorporate them into your personal worldview, based on the individual merits of those little parts.

You shouldn’t always do the latter, but (due to time constraints) you also shouldn’t always do the former, and IMO it’s healthy to have a phrase for this distinction.

This is related to the CFAR-style looking-for-cruxes method of conversation. One really nice feature of looking-for-cruxes conversations is that they fail gracefully when it turns out you’re talking to someone smarter, more knowledgeable, or better informed than you. This means that if our culture defaults to having conversations in a looking-for-cruxes style, smart people are less likely to be turned off EA by unpleasant conversations with overconfident EAs. (Thanks to Anna Salamon for this last point.)

Part 2: Outside views, deference, EA culture

I think we can use the above ideas to describe a healthy set of attitudes for the EA community to have toward EA arguments.

Here are some tensions I am worried about:

  • Some EAs know more, and have thought more and better about various important questions, than others. For example, EAs generally have better opinions when they’ve been around EA longer, or when they have jobs that cause them to think about EA topics a lot or that expose them to private discussions about EA topics with people who work on them full time. It’s often healthy to defer to the opinions of such people. But if you only defer, you don’t practice thinking on your own, which is terrible, because thinking on your own is the skill EA needs its full-timers to have in order for them to have good opinions! It also means that people are overly credulous of what full-time EAs think (or of what people, potentially inaccurately, believe they think).

    • When I was involved with Stanford EA in 2015, we spent a lot of time discussing core EA questions like the relative value of different cause areas, philosophical foundations, and what kinds of strategies might be most valuable for EA to pursue for various goals. Most of us had a default attitude of skepticism and uncertainty towards what EA orgs thought about things. When I talk to EA student group members now, I don’t get the sense that people are as skeptical or independent-minded.

      • A lot of this is probably because EA presents itself more consistently now. In particular, longtermism is more clearly the dominant worldview. I think this makes things feel really different. In 2015, my friends and I were very uncertain about cause prioritization, and this meant we were constantly reminded that not everyone could be right about what to do, because people disagreed so much.

      • Another factor is that EA now feels to me more disapproving of people arguing publicly about cause prioritization. I have the sense that people would now view it as bad behavior to tell someone you think they’re making a terrible choice by donating to AMF. I feel much more restricted saying this nowadays, but this is at least partially just because I am personally more averse to the risk of people thinking I’m obnoxious.

      • I think that it’s potentially very bad that young EAs don’t practice skeptical independent thinking as much (if this is indeed true).

      • On the other hand, one way that things have gotten much better is that I think it’s much more approachable to learn about AI safety than it used to be, because of things like the increasing size of the field, the Alignment Newsletter, the 80K podcast, and the increasing quality of explanations available.

    • Also, if people are too inclined to defer rather than think through arguments themselves, they won’t just fail to assess the arguments; they probably won’t even learn the arguments that the experts find persuasive.

  • I want a culture where researchers try to think about whether the research they’re doing is valuable. To encourage this, I want a culture where people are interested in trying to understand the whole end-to-end picture of what’s important. But simultaneously I want it to be okay for someone to just do ops or whatever and not feel insecure about the fact that their models of the world aren’t as good as the models of people whose full-time job is to make good models.

  • Similarly, I think that it’s very valuable for EAs to get status from doing actually useful stuff, rather than from being really good at arguing about what EA should be doing.

  • I think it’s kind of tricky to have the right relationship to skepticism of established EA beliefs.

    • One bad culture is one where people are embarrassed to ask questions and to say that they don’t get the arguments for pieces of the conventional wisdom. In that culture, we have a bunch of emperor’s-new-clothes-style dumb consensus beliefs, and we don’t spot holes in them. We don’t get to practice noticing our confusion and improving our arguments.

      • And when people who are new to EA talk to us, they notice that we don’t really understand the arguments for our beliefs, and so we turn off the people who care most about careful examination of claims. I think this is a pretty serious problem.

    • But there’s another bad culture, where we can’t update based on what other people think, or where we aren’t supposed to believe things based on trusting other people, or where it’s considered low-status to work on things that don’t give you a mandate to think about the complete story.

Now that I have the above concepts, I think I can describe some features of the culture I want.

  • I think it’s much healthier if we have the attitude that in EA, people try to incrementally improve their understanding of things, and in particular are interested in knowing which parts of their arguments are robust vs fragile.

    • In this world, the default understanding is that when you change your mind about an argument on a subquestion, you aren’t expected to immediately have an opinion about how this changes your mind about the main question.

  • EAs are encouraged to try to build models of whatever parts of EA they’re interested in, and it’s considered a normal and good thing to try to think through arguments that you’ve heard and try to figure out if they make sense to you. But it’s clear that you’re not obligated to have models of everything.

  • When asking questions of a prestigious, smart EA, people are interested in trying to understand what exactly the person thinks and how their beliefs are connected to each other, as opposed to just trying to learn their overall judgements or argue with them.


I wish I had better ideas for how to do EA movement building in ways that lead to a healthy EA culture around all these questions.