Three Biases That Made Me Believe in AI Risk

Below I describe three faulty patterns of thought that made me too confident that AI-related x-risk is real. After I identified these patterns, I grew a lot more sceptical of AI Safety as an effective cause area. This post consists of edited parts of some blog posts on my own website.


Misleading language

All sentences are wrong, but some are useful. I think that a certain emotional salience makes me talk about AI in a way that is more wrong than necessary. For example, a self-driving car and a pre-driven car are the exact same thing, but I can feel myself thinking about the two in completely different ways.

A self-driving car is easy to imagine: it is smart and autonomous, and you can trust it like you trust a cab driver. It can make mistakes but probably has good intent. When it encounters an unfamiliar situation, it can think about the correct way to proceed. It behaves in accordance with the goal its creator set, and it tends to make smart decisions. If anything goes wrong, the car is at fault.

A pre-driven car is hard to imagine: it has to have a bunch of rules coded into it by the manufacturer, and you can trust the car like you trust a bridge; it does exactly what it was built to do, but unforeseen circumstances will lead to inelegant failure. The functioning of these systems depends deeply on how the programmers modelled the task of driving, and the code's functionality is very brittle. When something goes wrong, the company and its engineers are at fault.

This is to say, the language I use to talk about autonomous systems influences how I think about them, and what outcomes I consider more or less likely. If you want to understand present-day algorithms, the "pre-driven car" model of thinking works a lot better than the "self-driving car" model of thinking. The present and past are the only tools we have to think about the future, so I expect the "pre-driven car" model to make more accurate predictions. I try to move my own language away from words like "artificial intelligence" and "self-driving cars" towards words like "classification algorithms" and "pre-driven cars".

One can make these substitutions in any sentence in which a computer is ascribed agency. In the best case, "The neural network learned to recognize objects in images" becomes "The fitted model classifies images in close correspondence with the human-given labels". (In reality, even that description might be too generous.)
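To make the substitution concrete, here is a minimal sketch of what "the model learned to recognize" cashes out to: a function fitted to reproduce human-given labels. The data, labels, and nearest-centroid method are all hypothetical toy choices, not a claim about any real system:

```python
# Toy 2-feature "images" with hypothetical human-given labels.
# The "model" is just the per-label mean feature vector.

def fit_centroids(samples, labels):
    """Compute the mean feature vector for each human-given label."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        sums.setdefault(y, [0.0] * len(x))
        counts[y] = counts.get(y, 0) + 1
        sums[y] = [a + b for a, b in zip(sums[y], x)]
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def classify(centroids, x):
    """Return the label whose centroid is nearest to x: no 'recognition',
    just correspondence with the labels the fitting step was given."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(centroids, key=lambda y: dist2(centroids[y]))

train_x = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
train_y = ["cat", "cat", "dog", "dog"]
model = fit_centroids(train_x, train_y)
print(classify(model, [0.15, 0.15]))  # "cat" — matches the nearby labels
print(classify(model, [0.85, 0.85]))  # "dog"
```

Nothing in this sketch "understands" cats or dogs; the output is fully determined by the labels and the modelling choices, which is the point of the wordier phrasing.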

It helps to keep in mind the human component. "The YouTube autoplay algorithm shows you exactly those videos that make you spend more time on the platform" is accurate in some sense, but it completely glosses over the ways in which the algorithm does not do that. When you listen to music using YouTube's autoplay, it isn't hard to notice that suggestions tend to point backwards in time compared to the upload date of the video you're watching right now, and that, apart from preventing repeats, autoplay is pretty Markovian (that is mathspeak for the algorithm not doing anything clever based on your viewing history, just "this video is best followed by that video"). Both of those properties clearly result from the way YouTube's engineers modelled the problem they were trying to solve. I would describe YouTube's suggestions as "The YouTube autoplay algorithm shows you videos that most people watched and liked after watching the current video".
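That wordier description can be sketched directly. The following toy code (hypothetical watch logs; `autoplay_next` is an invented name, not YouTube's actual system) shows a Markovian, no-repeat suggester of the kind described above:

```python
from collections import Counter, defaultdict

# Hypothetical logs of (video watched, video watched next) pairs.
watch_pairs = [
    ("song_a", "song_b"), ("song_a", "song_b"), ("song_a", "song_c"),
    ("song_b", "song_a"), ("song_b", "song_c"), ("song_c", "song_a"),
]

# Count, for each video, what people watched next.
followers = defaultdict(Counter)
for current, nxt in watch_pairs:
    followers[current][nxt] += 1

def autoplay_next(current, already_played):
    """Suggest the most common follower of `current`, skipping repeats.
    Note what is *not* here: no taste model, no long-term history —
    the suggestion depends only on the current video (Markovian)."""
    for candidate, _ in followers[current].most_common():
        if candidate not in already_played:
            return candidate
    return None

print(autoplay_next("song_a", {"song_a"}))            # song_b
print(autoplay_next("song_a", {"song_a", "song_b"}))  # song_c
```

The two properties noticed above fall straight out of this modelling choice: suggestions come from past co-watches (so they point backwards in time), and the only non-Markovian ingredient is the repeat filter.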

When you rewrite AI-related statements, they tend to become more wordy. That is exactly what you would expect, since the common words were selected for ease of use, but it does make accurate conversations unwieldy.

I have not yet found any argument in favour of AI Risk being real that remained convincing after the above translation.

A sense of meaning

My background makes me prone to overrate how important AI Safety is.

My fields of expertise and enjoyment are mathematics and computer science. These skills are useful for the economy and in high demand. The general public is in awe of mathematics and thinks highly of anyone who can do it well. Computer science is the closest thing we have to literal magic.

Wealth, fun, respect, power. The only thing left for me to desire is cosmic significance, which is exactly the sales pitch of existential risk cause areas. It would be nice if AI-related existential risk were real; for my labour to potentially make the difference between a meaningless, lifeless universe and a universe filled with happiness. It would give objective significance to my life.

This is fertile ground for motivated reasoning.


Anchoring

When EAs describe how much utility could fit in the universe, the reference class for numbers is "how many X fit in the universe", where X ranges over things like atoms, people, planets, stars. These numbers are huge, typically expressed as quantities of the form 10^n for large n.

When we describe how likely certain events are, the tempting reference class is "statements of probability", which are typically expressed with only a few digits, like 10% or 0.1%. It seems absurd to assign AI risk less than 0.0000000000000000000000000000001% probability, because that would be a lot of zeros.

The combination of these vastly different expressions of scale, together with anchoring, means that we should expect people to over-estimate the probability of unlikely risks, and hence to over-estimate the expected utility of x-risk prevention measures.
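A toy calculation (all numbers illustrative, not estimates from this post) shows how sensitive such expected-utility arguments are to the order of magnitude chosen for the probability:

```python
# A payoff on the "how many X fit in the universe" scale.
future_value = 10 ** 50

# The kind of small probability people comfortably write in everyday
# language ("one in a thousand")...
anchored_probability = 0.001

# ...versus a probability expressed on the same log scale as the payoff.
unanchored_probability = 10 ** -50

print(f"{anchored_probability * future_value:.1e}")    # 1.0e+47: looks decisive
print(f"{unanchored_probability * future_value:.1e}")  # 1.0e+00: looks negligible
```

The expected value swings by dozens of orders of magnitude depending only on which reference class the probability was anchored to, which is why the anchoring matters so much here.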

I think there is a good argument to be made that the probability of AI Safety work being effective is vanishingly small. I wrote a very rough draft defending that claim in this blog post, but it's very much a proof-of-concept written for myself, so the writing isn't very polished.


Conclusion

I used to think that working in AI Safety would be a good fit for me, but I stopped thinking that after I noticed that most of my belief in AI risk was caused by biased thinking: self-aggrandizing motivated reasoning, misleading language, and anchoring on unjustified probability estimates.

If people here would appreciate it, I would be happy to write one or more posts on object-level arguments for why I am now sceptical of AI risk. Let me know in the comments.