Why I think the Foundational Research Institute should rethink its approach

The fol­low­ing is my con­sid­ered eval­u­a­tion of the Foun­da­tional Re­search In­sti­tute, circa July 2017. I dis­cuss its goal, where I fore­see things go­ing wrong with how it defines suffer­ing, and what it could do to avoid these prob­lems.

TL;DR ver­sion: func­tion­al­ism (“con­scious­ness is the sum-to­tal of the func­tional prop­er­ties of our brains”) sounds a lot bet­ter than it ac­tu­ally turns out to be in prac­tice. In par­tic­u­lar, func­tion­al­ism makes it im­pos­si­ble to define ethics & suffer­ing in a way that can me­di­ate dis­agree­ments.

I. What is the Foun­da­tional Re­search In­sti­tute?

The Foun­da­tional Re­search In­sti­tute (FRI) is a Ber­lin-based group that “con­ducts re­search on how to best re­duce the suffer­ing of sen­tient be­ings in the near and far fu­ture.” Ex­ec­u­tive Direc­tor Max Daniel in­tro­duced them at EA Global Bos­ton as “the only EA or­ga­ni­za­tion which at an or­ga­ni­za­tional level has the mis­sion of fo­cus­ing on re­duc­ing s-risk.” S-risks are, ac­cord­ing to Daniel, “risks where an ad­verse out­come would bring about suffer­ing on an as­tro­nom­i­cal scale, vastly ex­ceed­ing all suffer­ing that has ex­isted on Earth so far.”

Essen­tially, FRI wants to be­come the re­search arm of suffer­ing-fo­cused ethics, and help pre­vent ar­tifi­cial gen­eral in­tel­li­gence (AGI) failure-modes which might pro­duce suffer­ing on a cos­mic scale.

What I like about FRI:

While I have serious qualms about FRI’s research framework, I think the people behind FRI deserve a lot of credit: they seem to be serious people, working hard to build something good. In particular, I want to give them a shoutout for three things:

  • First, FRI takes suffer­ing se­ri­ously, and I think that’s im­por­tant. When times are good, we tend to for­get how tongue-chew­ingly hor­rific suffer­ing can be. S-risks seem par­tic­u­larly hor­rify­ing.

  • Se­cond, FRI isn’t afraid of be­ing weird. FRI has been work­ing on s-risk re­search for a few years now, and if peo­ple are start­ing to come around to the idea that s-risks are worth think­ing about, much of the credit goes to FRI.

  • Third, I have great per­sonal re­spect for Brian To­masik, one of FRI’s co-founders. I’ve found him highly thought­ful, gen­er­ous in de­bates, and un­failingly prin­ci­pled. In par­tic­u­lar, he’s always will­ing to bite the bul­let and work ideas out to their log­i­cal end, even if it in­volves re­pug­nant con­clu­sions.

What is FRI’s re­search frame­work?

FRI believes in analytic functionalism, or what David Chalmers calls “Type-A materialism”. Essentially, what this means is there’s no ‘theoretical essence’ to consciousness; rather, consciousness is the sum-total of the functional properties of our brains. Since ‘functional properties’ are rather vague, this means consciousness itself is rather vague, in the same way words like “life,” “justice,” and “virtue” are messy and vague.

Brian sug­gests that this vague­ness means there’s an in­her­ently sub­jec­tive, per­haps ar­bi­trary el­e­ment to how we define con­scious­ness:

An­a­lytic func­tion­al­ism looks for func­tional pro­cesses in the brain that roughly cap­ture what we mean by words like “aware­ness”, “happy”, etc., in a similar way as a biol­o­gist may look for pre­cise prop­er­ties of repli­ca­tors that roughly cap­ture what we mean by “life”. Just as there can be room for fuzzi­ness about where ex­actly to draw the bound­aries around “life”, differ­ent an­a­lytic func­tion­al­ists may have differ­ent opinions about where to define the bound­aries of “con­scious­ness” and other men­tal states. This is why con­scious­ness is “up to us to define”. There’s no hard prob­lem of con­scious­ness for the same rea­son there’s no hard prob­lem of life: con­scious­ness is just a high-level word that we use to re­fer to lots of de­tailed pro­cesses, and it doesn’t mean any­thing in ad­di­tion to those pro­cesses.

Fi­nally, Brian ar­gues that the phe­nomenol­ogy of con­scious­ness is iden­ti­cal with the phe­nomenol­ogy of com­pu­ta­tion:

I know that I’m con­scious. I also know, from neu­ro­science com­bined with Oc­cam’s ra­zor, that my con­scious­ness con­sists only of ma­te­rial op­er­a­tions in my brain—prob­a­bly mostly pat­terns of neu­ronal firing that help pro­cess in­puts, com­pute in­ter­me­di­ate ideas, and pro­duce be­hav­ioral out­puts. Thus, I can see that con­scious­ness is just the first-per­son view of cer­tain kinds of com­pu­ta­tions—as Eliezer Yud­kowsky puts it, “How An Al­gorithm Feels From In­side”. Con­scious­ness is not some­thing sep­a­rate from or epiphe­nom­e­nal to these com­pu­ta­tions. It is these com­pu­ta­tions, just from their own per­spec­tive of try­ing to think about them­selves.

In other words, con­scious­ness is what minds com­pute. Con­scious­ness is the col­lec­tion of in­put op­er­a­tions, in­ter­me­di­ate pro­cess­ing, and out­put be­hav­iors that an en­tity performs.

And if con­scious­ness is all these things, so too is suffer­ing. Which means suffer­ing is com­pu­ta­tional, yet also in­her­ently fuzzy, and at least a bit ar­bi­trary; a leaky high-level reifi­ca­tion im­pos­si­ble to speak about ac­cu­rately, since there’s no for­mal, ob­jec­tive “ground truth”.

II. Why do I worry about FRI’s re­search frame­work?

In short, I think FRI has a wor­thy goal and good peo­ple, but its meta­physics ac­tively pre­vent mak­ing progress to­ward that goal. The fol­low­ing de­scribes why I think that, draw­ing heav­ily on Brian’s writ­ings (of FRI’s re­searchers, Brian seems the most fo­cused on meta­physics):

Note: FRI is not the only EA organization which holds functionalist views on consciousness; much of the following critique would also apply to e.g. MIRI, FHI, and OpenPhil. I focus on FRI because (1) Brian’s writings on consciousness & functionalism have been hugely influential in the community, and are clear enough *to* criticize; (2) FRI is particularly clear about what it cares about (suffering), which allows a particularly clear critique of the problems it will run into with functionalism; (3) I believe FRI is at the forefront of an important cause area which has not crystallized yet, and I think it’s critically important to get these objections bouncing around this subcommunity.

Ob­jec­tion 1: Motte-and-bailey

Brian: “Con­scious­ness is not a thing which ex­ists ‘out there’ or even a sep­a­rate prop­erty of mat­ter; it’s a defi­ni­tional cat­e­gory into which we clas­sify minds. ‘Is this digi­tal mind re­ally con­scious?’ is analo­gous to ‘Is a rock that peo­ple use to eat on re­ally a table?’ [How­ever,] That con­scious­ness is a cluster in thingspace rather than a con­crete prop­erty of the world does not make re­duc­ing suffer­ing less im­por­tant.”

The FRI model seems to im­ply that suffer­ing is in­ef­fable enough such that we can’t have an ob­jec­tive defi­ni­tion, yet suffi­ciently ef­fable that we can co­her­ently talk and care about it. This at­tempt to have it both ways seems con­tra­dic­tory, or at least in deep ten­sion.

Indeed, I’d argue that the degree to which you can care about something is proportional to the degree to which you can define it objectively. E.g., suppose I say that “gnireffus” is literally the most terrible thing in the cosmos, that we should spread gnireffus-focused ethics, and that minimizing g-risks (far-future scenarios which involve large amounts of gnireffus) is a moral imperative. If I then add that what is and what isn’t gnireffus is rather subjective with no privileged definition, and that it’s impossible to objectively tell if a physical system exhibits gnireffus, you might raise any number of objections. This is not an exact metaphor for FRI’s position, but I worry that FRI’s work leans on the intuition that suffering is real and we can speak coherently about it, to a degree greater than its metaphysics formally allow.

Max Daniel (per­sonal com­mu­ni­ca­tion) sug­gests that we’re com­fortable with a de­gree of in­ef­fa­bil­ity in other con­texts; “Brian claims that the con­cept of suffer­ing shares the allegedly prob­le­matic prop­er­ties with the con­cept of a table. But it seems a stretch to say that the alleged ten­sion is prob­le­matic when talk­ing about ta­bles. So why would it be prob­le­matic when talk­ing about suffer­ing?” How­ever, if we take the anti-re­al­ist view that suffer­ing is ‘merely’ a node in the net­work of lan­guage, we have to live with the con­se­quences of this: that ‘suffer­ing’ will lose mean­ing as we take it away from the net­work in which it’s em­bed­ded (Wittgen­stein). But FRI wants to do ex­actly this, to speak about suffer­ing in the con­text of AGIs, simu­lated brains, even video game char­ac­ters.

We can be anti-re­al­ists about suffer­ing (suffer­ing-is-a-node-in-the-net­work-of-lan­guage), or we can ar­gue that we can talk co­her­ently about suffer­ing in novel con­texts (AGIs, mind crime, aliens, and so on), but it seems in­her­ently trou­ble­some to claim we can do both at the same time.

Ob­jec­tion 2: In­tu­ition duels

Two peo­ple can agree on FRI’s po­si­tion that there is no ob­jec­tive fact of the mat­ter about what suffer­ing is (no priv­ileged defi­ni­tion), but this also means they have no way of com­ing to any con­sen­sus on the ob­ject-level ques­tion of whether some­thing can suffer. This isn’t just an aca­demic point: Brian has writ­ten ex­ten­sively about how he be­lieves non-hu­man an­i­mals can and do suffer ex­ten­sively, whereas Yud­kowsky (who holds com­pu­ta­tion­al­ist views, like Brian) has writ­ten about how he’s con­fi­dent that an­i­mals are not con­scious and can­not suffer, due to their lack of higher-or­der rea­son­ing.

And if functionalism is having trouble adjudicating the easy cases of suffering (whether monkeys can suffer, or whether dogs can), it doesn’t have a sliver of a chance at dealing with the upcoming hard cases of suffering: whether a given AGI is suffering, or engaging in mind crime; whether a whole-brain emulation (WBE) or synthetic organism or emergent intelligence that doesn’t have the capacity to tell us how it feels (or that we don’t have the capacity to understand) is suffering; whether any aliens that we meet in the future can suffer; whether changing the internal architecture of our qualia reports means we’re also changing our qualia; and so on.

In short, FRI’s the­ory of con­scious­ness isn’t ac­tu­ally a the­ory of con­scious­ness at all, since it doesn’t do the thing we need a the­ory of con­scious­ness to do: ad­ju­di­cate dis­agree­ments in a prin­ci­pled way. In­stead, it gives up any claim on the sorts of ob­jec­tive facts which could in prin­ci­ple ad­ju­di­cate dis­agree­ments.

This is a source of friction in EA today, but it’s mitigated by the sense that:

(1) The EA pie is grow­ing, so it’s bet­ter to ig­nore dis­agree­ments than pick fights;

(2) Disagree­ments over the defi­ni­tion of suffer­ing don’t re­ally mat­ter yet, since we haven’t got­ten into the busi­ness of mak­ing morally-rele­vant syn­thetic be­ings (that we know of) that might be un­able to vo­cal­ize their suffer­ing.

If the perception of one or both of these conditions changes, the lack of some disagreement-adjudicating theory of suffering will matter quite a lot.

Ob­jec­tion 3: Con­ver­gence re­quires com­mon truth

Mike: “[W]hat makes one defi­ni­tion of con­scious­ness bet­ter than an­other? How should we eval­u­ate them?”

Brian: “Con­silience among our feel­ings of em­pa­thy, prin­ci­ples of non-dis­crim­i­na­tion, un­der­stand­ings of cog­ni­tive sci­ence, etc. It’s similar to the ques­tion of what makes one defi­ni­tion of jus­tice or virtue bet­ter than an­other.”

Brian is hoping that affective neuroscience will slowly converge to accurate views on suffering as more and better data about sentience and pain accumulates. But convergence to truth implies something (objective) driving the convergence. In this way, Brian’s framework still seems to require an objective truth of the matter, even though he disclaims most of the benefits of assuming this.

Ob­jec­tion 4: As­sum­ing that con­scious­ness is a reifi­ca­tion pro­duces more con­fu­sion, not less

Brian: “Con­scious­ness is not a reified thing; it’s not a phys­i­cal prop­erty of the uni­verse that just ex­ists in­trin­si­cally. Rather, in­stances of con­scious­ness are al­gorithms that are im­ple­mented in spe­cific steps. … Con­scious­ness in­volves spe­cific things that brains do.”

Brian argues that we treat consciousness/phenomenology as more ‘real’ than it is. Traditionally, whenever we’ve discovered something is a leaky reification and shouldn’t be treated as ‘too real’, we’ve been able to break it down into more coherent constituent pieces we can treat as real. Life, for instance, wasn’t due to élan vital but a bundle of self-organizing properties & dynamics which generally co-occur. But carrying out this “de-reification” process on consciousness (enumerating its coherent constituent pieces) has proven difficult, especially if we want to preserve some way to speak cogently about suffering.

Speaking for myself, the more I stared into the depths of functionalism, the less certain everything about moral value became, and arguably, I see the same trajectory in Brian’s work and Luke Muehlhauser’s report. Their model uncertainty has seemingly become larger, not smaller, as they’ve looked into techniques for how to “de-reify” consciousness while preserving some flavor of moral value. Brian and Luke seem to interpret this as evidence that moral value is intractably complicated, but this is also consistent with consciousness not being a reification, and instead being a real thing. Trying to “de-reify” something that’s not a reification will produce deep confusion, just as surely as trying to treat a reification as ‘more real’ than it actually is will.

Edsger W. Dijkstra famously noted that “The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise.” And so if our ways of talking about moral value fail to ‘carve reality at the joints’, then by all means let’s build better ones, rather than giving up on precision.

Ob­jec­tion 5: The Hard Prob­lem of Con­scious­ness is a red herring

Brian spends a lot of time dis­cussing Chalmers’ “Hard Prob­lem of Con­scious­ness”, i.e. the ques­tion of why we’re sub­jec­tively con­scious, and seems to base at least part of his con­clu­sion on not find­ing this ques­tion com­pel­ling— he sug­gests “There’s no hard prob­lem of con­scious­ness for the same rea­son there’s no hard prob­lem of life: con­scious­ness is just a high-level word that we use to re­fer to lots of de­tailed pro­cesses, and it doesn’t mean any­thing in ad­di­tion to those pro­cesses.” I.e., no ‘why’ is nec­es­sary; when we take con­scious­ness and sub­tract out the de­tails of the brain, we’re left with an empty set.

But I think the “Hard Prob­lem” isn’t helpful as a con­trastive cen­ter­piece, since it’s un­clear what the prob­lem is, and whether it’s an­a­lytic or em­piri­cal, a state­ment about cog­ni­tion or about physics. At the Qualia Re­search In­sti­tute (QRI), we don’t talk much about the Hard Prob­lem; in­stead, we talk about Qualia For­mal­ism, or the idea that any phe­nomenolog­i­cal state can be crisply and pre­cisely rep­re­sented by some math­e­mat­i­cal ob­ject. I sus­pect this would be a bet­ter foil for Brian’s work than the Hard Prob­lem.

Ob­jec­tion 6: Map­ping to reality

Brian ar­gues that con­scious­ness should be defined at the func­tional/​com­pu­ta­tional level: given a Tur­ing ma­chine, or neu­ral net­work, the right ‘code’ will pro­duce con­scious­ness. But the prob­lem is that this doesn’t lead to a the­ory which can ‘com­pile’ to physics. Con­sider the fol­low­ing:

Imagine you have a bag of popcorn. Now shake it. There will exist a certain ad-hoc interpretation of bag-of-popcorn-as-computational-system where you just simulated someone getting tortured, and other interpretations that don’t imply that. Did you torture anyone? If you’re a computationalist, no clear answer exists: you both did, and did not, torture someone. This sounds like a ridiculous edge-case that would never come up in real life, but in reality it comes up all the time, since there is no principled way to *objectively derive* what computation(s) any physical system is performing.

I don’t think this is an out­landish view of func­tion­al­ism; Brian sug­gests much the same in How to In­ter­pret a Phys­i­cal Sys­tem as a Mind: “Phys­i­cal­ist views that di­rectly map from physics to moral value are rel­a­tively sim­ple to un­der­stand. Func­tion­al­ism is more com­plex, be­cause it maps from physics to com­pu­ta­tions to moral value. More­over, while physics is real and ob­jec­tive, com­pu­ta­tions are fic­tional and ‘ob­server-rel­a­tive’ (to use John Searle’s ter­minol­ogy). There’s no ob­jec­tive mean­ing to ‘the com­pu­ta­tion that this phys­i­cal sys­tem is im­ple­ment­ing’ (un­less you’re refer­ring to the spe­cific equa­tions of physics that the sys­tem is play­ing out).”
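The popcorn point can be made concrete with a toy sketch. It assumes a deliberately simplified picture that is not in the original argument: a physical system is just a sequence of distinct microstates (the labels `p0` to `p3` are hypothetical), and an "interpretation" is any lookup table from microstates to computational states. Under those assumptions, the same physical trace can be read as two unrelated computations, and nothing in the trace itself picks one.

```python
def interpret(physical_trace, target_trace):
    """Build an ad-hoc map from physical states to computational states.

    Assumes every physical state in the trace is distinct, so any target
    trace of the same length can be assigned to it.
    """
    if len(set(physical_trace)) != len(physical_trace):
        raise ValueError("this sketch assumes every physical state is distinct")
    if len(physical_trace) != len(target_trace):
        raise ValueError("traces must have the same length")
    return dict(zip(physical_trace, target_trace))


# Hypothetical labels for four successive microstates of the shaken bag.
physical_trace = ["p0", "p1", "p2", "p3"]

# Two unrelated computations we might claim the bag is "running".
counter_trace = [0, 1, 2, 3]
toggle_trace = ["off", "on", "off", "on"]

as_counter = interpret(physical_trace, counter_trace)
as_toggle = interpret(physical_trace, toggle_trace)

# The same physical history, read through each interpretation.
print([as_counter[p] for p in physical_trace])
print([as_toggle[p] for p in physical_trace])
```

Both readings are equally consistent with the physical history; the choice between them lives in the lookup table, not in the popcorn.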

Gor­don McCabe (McCabe 2004) pro­vides a more for­mal ar­gu­ment to this effect— that pre­cisely map­ping be­tween phys­i­cal pro­cesses and (Tur­ing-level) com­pu­ta­tional pro­cesses is in­her­ently im­pos­si­ble— in the con­text of simu­la­tions. First, McCabe notes that:

[T]here is a one-[to-]many cor­re­spon­dence be­tween the log­i­cal states [of a com­puter] and the ex­act elec­tronic states of com­puter mem­ory. Although there are bi­jec­tive map­pings be­tween num­bers and the log­i­cal states of com­puter mem­ory, there are no bi­jec­tive map­pings be­tween num­bers and the ex­act elec­tronic states of mem­ory.

This lack of an ex­act bi­jec­tive map­ping means that sub­jec­tive in­ter­pre­ta­tion nec­es­sar­ily creeps in, and so a com­pu­ta­tional simu­la­tion of a phys­i­cal sys­tem can’t be ‘about’ that sys­tem in any rigor­ous way:

In a com­puter simu­la­tion, the val­ues of the phys­i­cal quan­tities pos­sessed by the simu­lated sys­tem are rep­re­sented by the com­bined states of mul­ti­ple bits in com­puter mem­ory. How­ever, the com­bined states of mul­ti­ple bits in com­puter mem­ory only rep­re­sent num­bers be­cause they are deemed to do so un­der a nu­meric in­ter­pre­ta­tion. There are many differ­ent in­ter­pre­ta­tions of the com­bined states of mul­ti­ple bits in com­puter mem­ory. If the num­bers rep­re­sented by a digi­tal com­puter are in­ter­pre­ta­tion-de­pen­dent, they can­not be ob­jec­tive phys­i­cal prop­er­ties. Hence, there can be no ob­jec­tive re­la­tion­ship be­tween the chang­ing pat­tern of mul­ti­ple bit-states in com­puter mem­ory, and the chang­ing pat­tern of quan­tity-val­ues of a simu­lated phys­i­cal sys­tem.

McCabe con­cludes that, meta­phys­i­cally speak­ing,

A digi­tal com­puter simu­la­tion of a phys­i­cal sys­tem can­not ex­ist as, (does not pos­sess the prop­er­ties and re­la­tion­ships of), any­thing else other than a phys­i­cal pro­cess oc­cur­ring upon the com­po­nents of a com­puter. In the con­tem­po­rary case of an elec­tronic digi­tal com­puter, a simu­la­tion can­not ex­ist as any­thing else other than an elec­tronic phys­i­cal pro­cess oc­cur­ring upon the com­po­nents and cir­cuitry of a com­puter.
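McCabe's claim that bit patterns "only represent numbers because they are deemed to do so under a numeric interpretation" is easy to see in practice. The following is a minimal sketch using Python's standard `struct` module; the four example bytes are an arbitrary choice made for illustration, not anything taken from McCabe's paper.

```python
import struct

# Four bytes standing in for "the combined states of multiple bits in
# computer memory". The values are arbitrary, chosen for illustration.
raw = bytes([0x42, 0x28, 0x00, 0x00])

# Each entry is one conventional numeric reading of the same 32 bits.
readings = {
    "big-endian unsigned int": struct.unpack(">I", raw)[0],
    "little-endian unsigned int": struct.unpack("<I", raw)[0],
    "big-endian IEEE 754 float": struct.unpack(">f", raw)[0],
    "two big-endian 16-bit ints": struct.unpack(">HH", raw),
}

for name, value in readings.items():
    print(f"{name}: {value}")
```

The same memory contents come out as 1109917696, 10306, 42.0, or the pair (16936, 0), depending on which convention the reader brings to them.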

Where does this leave ethics? In Fla­vors of Com­pu­ta­tion Are Fla­vors of Con­scious­ness, Brian notes that “In some sense all I’ve pro­posed here is to think of differ­ent fla­vors of com­pu­ta­tion as be­ing var­i­ous fla­vors of con­scious­ness. But this still leaves the ques­tion: Which fla­vors of com­pu­ta­tion mat­ter most? Clearly what­ever com­pu­ta­tions hap­pen when a per­son is in pain are vastly more im­por­tant than what’s hap­pen­ing in a brain on a lazy af­ter­noon. How can we cap­ture that differ­ence?”

But if Brian grants the former point, that “There’s no objective meaning to ‘the computation that this physical system is implementing’”, then this latter task of figuring out “which flavors of computation matter most” is provably impossible. There will always be multiple computational (and thus ethical) interpretations of a physical system, with no way to figure out what’s “really” happening. No way to figure out if something is suffering or not. No consilience; not now, not ever.

Note: de­spite ap­par­ently grant­ing the point above, Brian also re­marks that:

I should add a note on ter­minol­ogy: All com­pu­ta­tions oc­cur within physics, so any com­pu­ta­tion is a phys­i­cal pro­cess. Con­versely, any phys­i­cal pro­cess pro­ceeds from in­put con­di­tions to out­put con­di­tions in a reg­u­lar man­ner and so is a com­pu­ta­tion. Hence, the set of com­pu­ta­tions equals the set of phys­i­cal pro­cesses, and where I say “com­pu­ta­tions” in this piece, one could just as well sub­sti­tute “phys­i­cal pro­cesses” in­stead.

This seems to be (1) in­cor­rect, for the rea­sons I give above, or (2) tak­ing sub­stan­tial po­etic li­cense with these terms, or (3) refer­ring to hy­per­com­pu­ta­tion (which might be able to sal­vage the metaphor, but would in­val­i­date many of FRI’s con­clu­sions deal­ing with the com­putabil­ity of suffer­ing on con­ven­tional hard­ware).

This ob­jec­tion may seem es­o­teric or pedan­tic, but I think it’s im­por­tant, and that it rip­ples through FRI’s the­o­ret­i­cal frame­work with dis­as­trous effects.

Ob­jec­tion 7: FRI doesn’t fully bite the bul­let on computationalism

Brian sug­gests that “fla­vors of com­pu­ta­tion are fla­vors of con­scious­ness” and that some com­pu­ta­tions ‘code’ for suffer­ing. But if we do in fact bite the bul­let on this metaphor and place suffer­ing within the realm of com­pu­ta­tional the­ory, we need to think in “near mode” and ac­cept all the para­doxes that brings. Scott Aaron­son, a noted ex­pert on quan­tum com­put­ing, raises the fol­low­ing ob­jec­tions to func­tion­al­ism:

I’m guess­ing that many peo­ple in this room side with Den­nett, and (not co­in­ci­den­tally, I’d say) also with Everett. I cer­tainly have sym­pa­thies in that di­rec­tion too. In fact, I spent seven or eight years of my life as a Den­nett/​Everett hard­core be­liever. But, while I don’t want to talk any­one out of the Den­nett/​Everett view, I’d like to take you on a tour of what I see as some of the ex­tremely in­ter­est­ing ques­tions that that view leaves unan­swered. I’m not talk­ing about “deep ques­tions of mean­ing,” but about some­thing much more straight­for­ward: what ex­actly does a com­pu­ta­tional pro­cess have to do to qual­ify as “con­scious”?

There’s this old chest­nut, what if each per­son on earth simu­lated one neu­ron of your brain, by pass­ing pieces of pa­per around. It took them sev­eral years just to simu­late a sin­gle sec­ond of your thought pro­cesses. Would that bring your sub­jec­tivity into be­ing? Would you ac­cept it as a re­place­ment for your cur­rent body? If so, then what if your brain were simu­lated, not neu­ron-by-neu­ron, but by a gi­gan­tic lookup table? That is, what if there were a huge database, much larger than the ob­serv­able uni­verse (but let’s not worry about that), that hard­wired what your brain’s re­sponse was to ev­ery se­quence of stim­uli that your sense-or­gans could pos­si­bly re­ceive. Would that bring about your con­scious­ness? Let’s keep push­ing: if it would, would it make a differ­ence if any­one ac­tu­ally con­sulted the lookup table? Why can’t it bring about your con­scious­ness just by sit­ting there do­ing noth­ing?

To these stan­dard thought ex­per­i­ments, we can add more. Let’s sup­pose that, purely for er­ror-cor­rec­tion pur­poses, the com­puter that’s simu­lat­ing your brain runs the code three times, and takes the ma­jor­ity vote of the out­comes. Would that bring three “copies” of your con­scious­ness into be­ing? Does it make a differ­ence if the three copies are widely sep­a­rated in space or time—say, on differ­ent planets, or in differ­ent cen­turies? Is it pos­si­ble that the mas­sive re­dun­dancy tak­ing place in your brain right now is bring­ing mul­ti­ple copies of you into be­ing?


Maybe my fa­vorite thought ex­per­i­ment along these lines was in­vented by my former stu­dent Andy Drucker. In the past five years, there’s been a rev­olu­tion in the­o­ret­i­cal cryp­tog­ra­phy, around some­thing called Fully Ho­mo­mor­phic En­cryp­tion (FHE), which was first dis­cov­ered by Craig Gen­try. What FHE lets you do is to perform ar­bi­trary com­pu­ta­tions on en­crypted data, with­out ever de­crypt­ing the data at any point. So, to some­one with the de­cryp­tion key, you could be prov­ing the­o­rems, simu­lat­ing plane­tary mo­tions, etc. But to some­one with­out the key, it looks for all the world like you’re just shuffling ran­dom strings and pro­duc­ing other ran­dom strings as out­put.

You can prob­a­bly see where this is go­ing. What if we ho­mo­mor­phi­cally en­crypted a simu­la­tion of your brain? And what if we hid the only copy of the de­cryp­tion key, let’s say in an­other galaxy? Would this com­pu­ta­tion—which looks to any­one in our galaxy like a reshuffling of gob­bledy­gook—be silently pro­duc­ing your con­scious­ness?

When we consider the possibility of a conscious quantum computer, in some sense we inherit all the previous puzzles about conscious classical computers, but then also add a few new ones. So, let’s say I run a quantum subroutine that simulates your brain, by applying some unitary transformation U. But then, of course, I want to “uncompute” to get rid of garbage (and thereby enable interference between different branches), so I apply U⁻¹. Question: when I apply U⁻¹, does your simulated brain experience the same thoughts and feelings a second time? Is the second experience “the same as” the first, or does it differ somehow, by virtue of being reversed in time? Or, since U⁻¹U is just a convoluted implementation of the identity function, are there no experiences at all here?

Here’s a better one: many of you have heard of the Vaidman bomb. This is a famous thought experiment in quantum mechanics where there’s a package, and we’d like to “query” it to find out whether it contains a bomb, but if we query it and there is a bomb, it will explode, killing everyone in the room. What’s the solution? Well, suppose we could go into a superposition of querying the bomb and not querying it, with only ε amplitude on querying the bomb, and √(1-ε²) amplitude on not querying it. And suppose we repeat this over and over: each time, moving ε amplitude onto the “query the bomb” state if there’s no bomb there, but moving ε² probability onto the “query the bomb” state if there is a bomb (since the explosion decoheres the superposition). Then after 1/ε repetitions, we’ll have order 1 probability of being in the “query the bomb” state if there’s no bomb. By contrast, if there is a bomb, then the total probability we’ve ever entered that state is (1/ε)×ε² = ε. So, either way, we learn whether there’s a bomb, and the probability that we set the bomb off can be made arbitrarily small. (Incidentally, this is extremely closely related to how Grover’s algorithm works.)

OK, now how about the Vaid­man brain? We’ve got a quan­tum sub­rou­tine simu­lat­ing your brain, and we want to ask it a yes-or-no ques­tion. We do so by query­ing that sub­rou­tine with ε am­pli­tude 1/​ε times, in such a way that if your an­swer is “yes,” then we’ve only ever ac­ti­vated the sub­rou­tine with to­tal prob­a­bil­ity ε. Yet you still man­age to com­mu­ni­cate your “yes” an­swer to the out­side world. So, should we say that you were con­scious only in the ε frac­tion of the wave­func­tion where the simu­la­tion hap­pened, or that the en­tire sys­tem was con­scious? (The an­swer could mat­ter a lot for an­thropic pur­poses.)
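The arithmetic in the Vaidman bomb passage quoted above can be checked numerically. This is a rough sketch under a modelling assumption of my own, not something spelled out in the quote: each repetition is treated as a rotation by angle ε, so the amplitude moved onto the "query" state per step is sin(ε) ≈ ε. The function name `vaidman_probabilities` is hypothetical.

```python
import math


def vaidman_probabilities(eps):
    """Return (steps, p_query_if_no_bomb, p_explode_if_bomb) for a given eps."""
    steps = round(1 / eps)
    # No bomb: the small rotations add coherently, so the total angle
    # grows to steps * eps (about 1 radian).
    p_query_if_no_bomb = math.sin(steps * eps) ** 2
    # Bomb: each step risks sin(eps)**2 (about eps**2), after which the
    # state resets, so the per-step survival probabilities multiply.
    p_explode_if_bomb = 1 - math.cos(eps) ** (2 * steps)
    return steps, p_query_if_no_bomb, p_explode_if_bomb


for eps in (0.1, 0.01, 0.001):
    steps, p_no_bomb, p_explode = vaidman_probabilities(eps)
    print(f"eps={eps}: steps={steps}, no bomb -> {p_no_bomb:.3f}, bomb -> {p_explode:.5f}")
```

Under this model the no-bomb probability stays at sin²(1) ≈ 0.708 (the "order 1" in the quote), while the explosion probability stays below ε and shrinks as ε does.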

To sum up: Brian’s no­tion that con­scious­ness is the same as com­pu­ta­tion raises more is­sues than it solves; in par­tic­u­lar, the pos­si­bil­ity that if suffer­ing is com­putable, it may also be un­com­putable/​re­versible, would sug­gest s-risks aren’t as se­ri­ous as FRI treats them.

Ob­jec­tion 8: Danger­ous combination

Three themes which seem to per­me­ate FRI’s re­search are:

(1) Suffer­ing is the thing that is bad.

(2) It’s crit­i­cally im­por­tant to elimi­nate bad­ness from the uni­verse.

(3) Suffer­ing is im­pos­si­ble to define ob­jec­tively, and so we each must define what suffer­ing means for our­selves.

Taken in­di­vi­d­u­ally, each of these seems rea­son­able. Pick two, and you’re still okay. Pick all three, though, and you get A Fully Gen­eral Jus­tifi­ca­tion For Any­thing, based on what is ul­ti­mately a sub­jec­tive/​aes­thetic call.

Much can be said in FRI’s defense here, and it’s un­fair to sin­gle them out as risky: in my ex­pe­rience they’ve always brought a very thought­ful, mea­sured, co­op­er­a­tive ap­proach to the table. I would just note that ideas are pow­er­ful, and I think theme (3) is es­pe­cially per­ni­cious if in­cor­rect.

III. QRI’s alternative

Analytic functionalism is essentially a negative hypothesis about consciousness: it’s the argument that there’s no order to be found, no rigor to be had. It obscures this with talk of “function”, which is a red herring it not only doesn’t define, but admits is undefinable. It doesn’t make any positive assertion. Functionalism is skepticism, nothing more, nothing less.

But is it right?

Ul­ti­mately, I think these a pri­ori ar­gu­ments are much like peo­ple in the mid­dle ages ar­gu­ing whether one could ever for­mal­ize a Proper Sys­tem of Alchemy. Such ar­gu­ments may in many cases hold wa­ter, but it’s of­ten difficult to tell good ar­gu­ments apart from ar­gu­ments where we’re just clev­erly fool­ing our­selves. In ret­ro­spect, the best way to *prove* sys­tem­atized alchemy was pos­si­ble was to just go out and *do* it, and in­vent Chem­istry. That’s how I see what we’re do­ing at QRI with Qualia For­mal­ism: we’re as­sum­ing it’s pos­si­ble to build stuff, and we’re work­ing on build­ing the ob­ject-level stuff.

What we’ve built with QRI’s framework

Note: this is a brief, sur­face-level tour of our re­search; it will prob­a­bly be con­fus­ing for read­ers who haven’t dug into our stuff be­fore. Con­sider this a down-pay­ment on a more sub­stan­tial in­tro­duc­tion.

My most no­table work is Prin­cipia Qualia, in which I lay out my meta-frame­work for con­scious­ness (a fla­vor of dual-as­pect monism, with a fo­cus on Qualia For­mal­ism) and put forth the Sym­me­try The­ory of Valence (STV). Essen­tially, the STV is an ar­gu­ment that much of the ap­par­ent com­plex­ity of emo­tional valence is evolu­tion­ar­ily con­tin­gent, and if we con­sider a math­e­mat­i­cal ob­ject iso­mor­phic to a phe­nomenolog­i­cal ex­pe­rience, the math­e­mat­i­cal prop­erty which cor­re­sponds to how pleas­ant it is to be that ex­pe­rience is the ob­ject’s sym­me­try. This im­plies a bunch of testable pre­dic­tions and rein­ter­pre­ta­tions of things like what ‘plea­sure cen­ters’ do (Sec­tion XI; Sec­tion XII). Build­ing on this, I offer the Sym­me­try The­ory of Homeo­static Reg­u­la­tion, which sug­gests un­der­stand­ing the struc­ture of qualia will trans­late into knowl­edge about the struc­ture of hu­man in­tel­li­gence, and I briefly touch on the idea of Neu­roa­cous­tics.

Like­wise, my col­league An­drés Gomez Emils­son has writ­ten about the likely math­e­mat­ics of phe­nomenol­ogy, in­clud­ing The Hyper­bolic Geom­e­try of DMT Ex­pe­riences, Tyranny of the In­ten­tional Ob­ject, and Al­gorith­mic Re­duc­tion of Psychedelic States. If I had to sug­gest one thing to read in all of these links, though, it would be the tran­script of his re­cent talk on Quan­tify­ing Bliss, which lays out the world’s first method to ob­jec­tively mea­sure valence from first prin­ci­ples (via fMRI) us­ing Se­len Ata­soy’s Con­nec­tome Har­mon­ics frame­work, the Sym­me­try The­ory of Valence, and An­drés’s CDNS model of ex­pe­rience.

Th­ese are risky pre­dic­tions and we don’t yet know if they’re right, but we’re con­fi­dent that if there is some el­e­gant struc­ture in­trin­sic to con­scious­ness, as there is in many other parts of the nat­u­ral world, these are the right kind of risks to take.

I men­tion all this be­cause I think an­a­lytic func­tion­al­ism- which is to say rad­i­cal skep­ti­cism/​elimi­na­tivism, the meta­physics of last re­sort- only looks as good as it does be­cause no­body’s been build­ing out any al­ter­na­tives.

IV. Clos­ing thoughts

FRI is pur­su­ing a cer­tain re­search agenda, and QRI is pur­su­ing an­other, and there’s lots of value in in­de­pen­dent ex­plo­ra­tions of the na­ture of suffer­ing. I’m glad FRI ex­ists, ev­ery­body I’ve in­ter­acted with at FRI has been great, I’m happy they’re fo­cus­ing on s-risks, and I look for­ward to see­ing what they pro­duce in the fu­ture.

On the other hand, I worry that no­body’s push­ing back on FRI’s meta­physics, which seem to un­avoid­ably lead to the in­tractable prob­lems I de­scribe above. FRI seems to be­lieve these prob­lems are part of the ter­ri­tory, un­avoid­able messes that we just have to make philo­soph­i­cal peace with. But I think that func­tion­al­ism is a bad map, that the meta­phys­i­cal messes it leads to are much worse than most peo­ple re­al­ize (fatal to FRI’s mis­sion), and there are other op­tions that avoid these prob­lems (which, to be fair, is not to say they have no prob­lems).

Ul­ti­mately, FRI doesn’t owe me a defense of their po­si­tion. But if they’re open to sug­ges­tions on what it would take to con­vince a skep­tic like me that their brand of func­tion­al­ism is vi­able, or at least res­cuable, I’d offer the fol­low­ing:

Re: Ob­jec­tion 1 (motte-and-bailey), I sug­gest FRI should be as clear and com­plete as pos­si­ble in their ba­sic defi­ni­tion of suffer­ing. In which par­tic­u­lar ways is it in­ef­fable/​fuzzy, and in which par­tic­u­lar ways is it pre­cise? What can we definitely say about suffer­ing, and what can we definitely never de­ter­mine? Pr­ereg­is­ter­ing on­tolog­i­cal com­mit­ments and method­olog­i­cal pos­si­bil­ities would help guard against FRI’s defi­ni­tion of suffer­ing chang­ing based on con­text.

Re: Ob­jec­tion 2 (in­tu­ition du­els), FRI may want to in­ter­nally “war game” var­i­ous fu­ture sce­nar­ios in­volv­ing AGI, WBE, etc, with one side ar­gu­ing that a given syn­thetic (or even ex­trater­res­trial) or­ganism is suffer­ing, and the other side ar­gu­ing that it isn’t. I’d ex­pect this would help di­ag­nose what sorts of dis­agree­ments fu­ture the­o­ries of suffer­ing will need to ad­ju­di­cate, and per­haps illu­mi­nate im­plicit eth­i­cal in­tu­itions. Shar­ing the re­sults of these simu­lated dis­agree­ments would also be helpful in mak­ing FRI’s rea­son­ing less opaque to out­siders, al­though mak­ing ev­ery­thing trans­par­ent could lead to cer­tain strate­gic dis­ad­van­tages.

Re: Objection 3 (convergence requires common truth), I’d like FRI to explore exactly what might drive consilience/convergence in theories of suffering, and what precisely makes one theory of suffering better than another, and ideally to evaluate a range of example theories of suffering under these criteria.

Re: Ob­jec­tion 4 (as­sum­ing that con­scious­ness is a reifi­ca­tion pro­duces more con­fu­sion, not less), I would love to see a his­tor­i­cal treat­ment of reifi­ca­tion: lists of reifi­ca­tions which were later dis­solved (e.g., élan vi­tal), vs scat­tered phe­nom­ena that were later unified (e.g., elec­tro­mag­netism). What pat­terns do the former have, vs the lat­ter, and why might con­scious­ness fit one of these buck­ets bet­ter than the other?

Re: Ob­jec­tion 5 (the Hard Prob­lem of Con­scious­ness is a red her­ring), I’d like to see a more de­tailed treat­ment of what kinds of prob­lem peo­ple have in­ter­preted the Hard Prob­lem as, and also more anal­y­sis on the prospects of Qualia For­mal­ism (which I think is the max­i­mally-em­piri­cal, max­i­mally-char­i­ta­ble in­ter­pre­ta­tion of the Hard Prob­lem). It would be helpful for us, in par­tic­u­lar, if FRI pre­reg­istered their ex­pec­ta­tions about QRI’s pre­dic­tions, and their view of the rel­a­tive ev­i­dence strength of each of our pre­dic­tions.

Re: Objection 6 (mapping to reality), this is perhaps the heart of most of our disagreement. From Brian’s quotes, he seems split on this issue; I’d like clarification about whether he believes we can ever precisely/objectively map specific computations to specific physical systems, and vice-versa. And if so, how? If not, this seems to propagate through FRI’s ethical framework in a disastrous way, since anyone can argue that any physical system does, or does not, ‘code’ for massive suffering, and there’s no principled way to derive any ‘ground truth’ or even to pick between interpretations (e.g. my popcorn example). If this isn’t the case, why not?

Brian has sug­gested that “cer­tain high-level in­ter­pre­ta­tions of phys­i­cal sys­tems are more ‘nat­u­ral’ and use­ful than oth­ers” (per­sonal com­mu­ni­ca­tion); I agree, and would en­courage FRI to ex­plore sys­tem­atiz­ing this.

It would be non-triv­ial to port FRI’s the­o­ries and com­pu­ta­tional in­tu­itions to the frame­work of “hy­per­com­pu­ta­tion”—i.e., the un­der­stand­ing that there’s a for­mal hi­er­ar­chy of com­pu­ta­tional sys­tems, and that Tur­ing ma­chines are only one level of many—but it may have benefits too. Namely, it might be the only way they could avoid Ob­jec­tion 6 (which I think is a fatal ob­jec­tion) while still al­low­ing them to speak about com­pu­ta­tion & con­scious­ness in the same breath. I think FRI should look at this and see if it makes sense to them.

Re: Ob­jec­tion 7 (FRI doesn’t fully bite the bul­let on com­pu­ta­tion­al­ism), I’d like to see re­sponses to Aaron­son’s afore­men­tioned thought ex­per­i­ments.

Re: Ob­jec­tion 8 (dan­ger­ous com­bi­na­tion), I’d like to see a clar­ifi­ca­tion about why my in­ter­pre­ta­tion is un­rea­son­able (as it very well may be!).


In con­clu­sion- I think FRI has a crit­i­cally im­por­tant goal- re­duc­tion of suffer­ing & s-risk. How­ever, I also think FRI has painted it­self into a cor­ner by ex­plic­itly dis­al­low­ing a clear, dis­agree­ment-me­di­at­ing defi­ni­tion for what these things are. I look for­ward to fur­ther work in this field.


Mike Johnson

Qualia Re­search Institute

Ac­knowl­edge­ments: thanks to An­drés Gomez Emils­son, Brian To­masik, and Max Daniel for re­view­ing ear­lier drafts of this.


My sources for FRI’s views on con­scious­ness:

Fla­vors of Com­pu­ta­tion are Fla­vors of Con­scious­ness:


Is There a Hard Prob­lem of Con­scious­ness?


Con­scious­ness Is a Pro­cess, Not a Moment


How to In­ter­pret a Phys­i­cal Sys­tem as a Mind


Dis­solv­ing Con­fu­sion about Consciousness


De­bate be­tween Brian & Mike on con­scious­ness:


Max Daniel’s EA Global Bos­ton 2017 talk on s-risks:


Mul­tipo­lar de­bate be­tween Eliezer Yud­kowsky and var­i­ous ra­tio­nal­ists about an­i­mal suffer­ing:


The In­ter­net En­cy­clo­pe­dia of Philos­o­phy on func­tion­al­ism:


Gor­don McCabe on why com­pu­ta­tion doesn’t map to physics:


Toby Ord on hy­per­com­pu­ta­tion, and how it differs from Tur­ing’s work:


Luke Muehlhauser’s OpenPhil-funded re­port on con­scious­ness and moral pa­tient­hood:


Scott Aaron­son’s thought ex­per­i­ments on com­pu­ta­tion­al­ism:


Se­len Ata­soy on Con­nec­tome Har­mon­ics, a new way to un­der­stand brain ac­tivity:


My work on for­mal­iz­ing phe­nomenol­ogy:

My meta-frame­work for con­scious­ness, in­clud­ing the Sym­me­try The­ory of Valence:


My hy­poth­e­sis of home­o­static reg­u­la­tion, which touches on why we seek out plea­sure:


My ex­plo­ra­tion & parametriza­tion of the ‘neu­roa­cous­tics’ metaphor sug­gested by Ata­soy’s work:


My col­league An­drés’s work on for­mal­iz­ing phe­nomenol­ogy:

A model of DMT-trip-as-hy­per­bolic-ex­pe­rience:


June 2017 talk at Con­scious­ness Hack­ing, de­scribing a the­ory and ex­per­i­ment to pre­dict peo­ple’s valence from fMRI data:


A parametriza­tion of var­i­ous psychedelic states as op­er­a­tors in qualia space:


A brief post on valence and the fun­da­men­tal at­tri­bu­tion er­ror:


A sum­mary of some of Se­len Ata­soy’s cur­rent work on Con­nec­tome Har­mon­ics: