Exploring consciousness, AI alignment, moral psychology and how they interact in decision-making.
Always happy to chat!
Thought-provoking argument! However, I see some gaps:
1.
The reason why preferences matter is straightforward: if you prefer a moral theory according to which preferences do not matter, your preference for that moral theory cannot matter either.
I think this mixes up two senses of “mattering”.
I might not think my preference for hedonic utilitarianism is terminally valuable but still think that believing true things is instrumentally valuable to achieve utility.
2. I share your intuition that “preferences need to have a weight” but I don’t think that’s the same thing as “being represented within an attention schema”. I think plants can weigh their preferences (e.g. tolerate drier soil if the luminosity is sufficient) without having explicit preferences (i.e. language?) or a comprehensive world model, let alone a meta-model.
While I agree with the thesis from the title, I think it might be better anchored in ~Sharon H. Rawlette’s argument that conscious sensations of “(un)desirability” construct/define what we mean by morality.
Thanks for clarifying—sorry it might sound like I was twisting your words—I was trying to think through multiple versions of the experiment you propose.
The extent to which we attribute or misattribute consciousness to different entities depends on which theory is correct, so it is very uncertain at this point. But I would endorse this broader research program of systematically decoding which of our intuitions about consciousness are biases and which are valid measurements of brain data.
One reason why I thought about Trolley problems was that they show not only % of people who have an abstract belief about consciousness but also the degree / intensity of its perceived experiences. I’m surprised to see a significant fraction of people (1, 2) say current AI is conscious, although a poll about a personal sacrifice like this one (in a less narrow Twitter bubble) might be more relevant to assess how serious they are—and might better model the kind of moral error that we’re more likely to make during the AGI transformation.
Regarding “biochemical processes”—the phrasing matters a lot here. Searle, who came up with The Chinese Room, concludes this thought experiment by suggesting thinking requires the specific biochemistry that brains use just like lactation or photosynthesis are defined by specific molecules, rather than algorithms. This formulation is specifically chosen in contrast to functionalist/computationalist views which are mainstream nowadays.
Disintermediated by a computer to replace biases introduced by cuteness and non-verbal expressiveness with biases introduced by symbolic manipulation, humans would without exception rate a pocket calculator or Eliza- style toy script as more likely to be conscious than a dog or a two year old child. I don’t think anyone sincerely believes this to actually be the case.
I’m not sure what you’re imagining here. If you give people a trolley problem (only via text) and say on one track, there’s a dog and on the other one, there’s a computer program Eliza and they can chat to either, most would choose to save the dog, even if its only text output were “whoof whoof”.
If you’re imagining the thought experiment would somehow block them from making the inference that one entity is an actual dog and the other a program, then yes: I agree with the point that language increases empathy but I’d say the magnitude is much smaller than that of “non-verbal cues”. If you had a Trolley dilemma with one blank track and one track with either a dog or Eliza, I think 90% would pay $0 to save Eliza but often a lot of money to save the dog.
Unless we’re positing dualism, what we perceive at consciousness is an emergent property of complex chemical processes rooted in our biology (and the imperatives of our biology to survive and self replicate.
Most non-dualists would say consciousness is a feature of information processing (functionalists, illusionists, non-reductive materialists) or something as fundamental as physics (Russellian monism, pan(proto)psychism). The particular emergentist and biological theory that is rooted in the instinct to self-replicate and survive is something I’d expect 0.1-7% of philosophers of mind to endorse. But whatever the actual percentages, I definitely disagree that dualism and this theory are the only options. The phrase “rooted in [biochemical processes]” is the least controversial but it still connotes something most might not endorse, i.e. that biology and chemistry are the correct category or level of description (Axis 3 in this taxonomy).
I endorse the temperature approach. I’m not sure illusionists would accept the question “What’s the % probability that an entity is conscious?” as meaningful but maybe a similar question could indeed be universally accepted, like “Compared to your pain intensity 1 (being poked by a needle), what’s your central estimate for the intensity of suffering experienced in scenario X?”
Just to clarify, my argument didn’t concern classical p-zombies but what I call “honest p-zombies”—intelligent humanoid entities capable of metacognition but without any intuition similar to our phenomenal intuitions.
Asking whether a process is “close enough [to the brain] to produce the same effect” implicitly begs the question—i.e. assumes consciousness is biological.
P-zombies who wouldn’t describe their sensations in terms like “qualia” would likely have an evolutionary fitness equal to that of humans. I don’t know if they’re possible, but I think the thought experiment demonstrates evolution wasn’t optimizing for consciousness. Therefore, we shouldn’t ask “is such a system sufficiently close to the brain” but “is it sufficiently close to the processes that happen to make the brain (phenomenally) conscious”.
In general, there isn’t agreement about any correlate of consciousness within philosophy of mind—there are well regarded thinkers who claim it’s not real (Frankish) or that it’s the basic substance of the universe (Goff). I think it’s possible consciousness is similar to, say, intelligence or humor, which means you need a complex system to meaningfully implement it. However, I think it’s unlikely that “complexity itself” is what gives rise to consciousness, e.g. sunspots are very complex (~unpredictable interaction of many elements).
I’m not convinced by Anil Seth’s narrative about our biases in mind attribution.
I’ve been to his talk where he summarized these points. He talked about our inherent tendency to emotionally relate to entities that can use language. Later, he presented a picture of a transistor and a picture of a monkey and asked which seems more conscious on priors.
The prime mechanism by which humans decide whether an entity is valuable and conscious is empathy. We evolved to feel empathy (that is, modelling “what it is like to be them”) towards entities that have faces, limbs, fur and a squishy body. We feel a lot of empathy for pets and babies, entities that don’t control language. And we feel zero empathy for the Chinese room.
The argument relies a lot on trying to depict computers as something rigid, cold and dead and life as something interesting, warm and energetic. This works well for our empathy module but does not convince me as a philosophical argument.
I’m curious whether there’s any definition of the brain’s processes as “non-algorithmic” that doesn’t end up in Russellian monism (which I’m inclined to support but suspect Seth isn’t). Aren’t the laws of physics themselves an algorithm? I see autopoiesis as the most interesting connection between consciousness and life but precisely when you find a clear conceptualization like this, it becomes unclear:
1. why it couldn’t be implemented digitally, e.g. aren’t LLMs autopoietic systems, where each token determines the next one?
2. what predictions it makes about the variation in human consciousness (in terms of modalities, intensity and reportability). E.g. if consciousness is dependent on the degree of embodiment, does it predict Stephen Hawking had a low intensity of consciousness? Is the variance found in human consciousness better explained by the computational differences or differences in the mentioned random biological interactions?
Great job! Personally, I’d alter the landing page to include recommendations on how to take action outside of working in AI safety (e.g. donation recommendations, “meet people” link) - or some comment why learning more seems crucial.
We have seen an order-of-magnitude increase in the interest in AI alignment, according to Google Trends. Part of it (the July peak) can be attributed to Grok’s behavior (see my little analysis). The YouTube channel AI in Context correctly identified this opportunity and swiftly released a viral video explaining how the incident connects to alignment. The September peak might be attributed to the release of If Anyone Builds It.
Fortunately, the WWOTF link still works: https://whatweowethefuture.com/wp-content/uploads/2023/06/Climate-Change-Longtermism.pdf
Alternatively, it loads a little faster on Web Archive: https://web.archive.org/web/20250426191314/https://whatweowethefuture.com/wp-content/uploads/2023/06/Climate-Change-Longtermism.pdf
I disagree with your argumentation but agree there’s quite a significant (e.g. 6.5%) chance that you’re correct about the thesis that consciousness has causal efficacy through quantum indeterminacy and that this might be helpful for alignment.
However, my take is that if the effects were very significant and similarly straightforward, they would be scientifically detectable even with very simple fun experiments like the “global consciousness project”. It’s hard to imagine “selection” among possibly infinite universes and planets and billions of years—but if you manage to do so, the “coincidences” that brought about life can be easily explained with the anthropic principle.
I see this as a more general lesson: People are often overconfident about a theory because they can’t imagine an alternative. When it comes to consciousness, the whole debate comes down to whether something that seems impossible to imagine reflects a failure of imagination or a failure of a theory. Personally, I give most weight to Russellian monism but I definitely recommend leaving some room for reductionism, especially if you don’t see how anyone could possibly believe that, as was the case for me before I deeply engaged with the reductionist literature.
But I’m glad whenever people aren’t afraid to be public about weird ideas—someone should be trying this and I’m really curious whether e.g. Nirvanic AI finds anything.
The MechaHitler incident seems to have worked as something of a warning shot: Google interest in AI risk has reached an absolute all-time high. Trump’s AI plan came out on the same day but the comparison suggests Grok accounts for ~70% of the peak.
I can’t quite dismiss the possibility that the interest was driven by new Chinese AI norms, because Chinese people have to use VPNs, so the geography isn’t trustworthy. However, if this were true, I would expect that the number of searches for AI risk in Chinese on Google would be higher than roughly zero (link).
I think objective ordering does imply “one should” so I subscribe to moral realism. However, recently I’ve been highly appreciating the importance of your insistence that the “should” part is kind of fake—i.e. it means something like “action X is objectively the best way to create most value from the point of view of all moral patients” but it doesn’t imply that an ASI that figures out what is morally valuable will be motivated to act on it.
(Naively, it seems like if morality is objective, there’s basically a physical law formulated as “you should do actions with characteristics X”. Then, it seems like a superintelligence that figures out all the physical laws internalizes “I should do X”. I think this is wrong mainly because in human brains, that sentence deceptively seems to imply “I want to do x” (or perhaps “I want to want x”) whereas it actually means “Provided I want to create maximum value from an impartial perspective, I want to do x”. In my own case, the kind of argument for optimism around AI doom in the style that @Bentham’s Bulldog advocated in Doom Debates seemed a bit more attractive before I truly spelled this out in my head.)
My impression is that CEA’s goal is to fund the meta cause area and the main goal of local groups is to organize events. While funding is hard to democratize unless you convince some billionaire, democratizing the organizations that run events is trivial. [Edit: Also, while it makes sense to organize local events directly based on the local community’s preferences / demand, I think it makes sense to take a more top-down (principles-oriented) approach when it comes to distributing funding, because the “demand-side” here consists of every person on the planet who appreciates money.]
But now I do realize that in my head, I equated CEA with OpenPhil’s wing for the meta cause area, which might not be accurate. I also feel good about democratizing CEA if I imagine it implemented as an indirect democracy (i.e. with local organizations voting, instead of every EA member). This probably moves me towards the middle of the poll, i.e. I would be in favor of this kind of democracy. Indirect democracy would reduce the problem of uninformed voters, the problem of dealing with problems publicly and the problem of imbalance in the level of reflection between the average member and highly-engaged members.
Thank you! Democratizing local groups sounds clearly good to me and I assumed it was the norm but I didn’t find any data on that.
I can see two realistic models for the parallel organization, which I’m not a fan of:
1) A competitor to CEA. Just like CEA, this org would mainly fundraise and fund projects.
I think the problems with selecting members mentioned in this thread are overstated. Any political party faces the same problem. I suspect that in practice, strategically recruiting weakly engaged EAs just isn’t a big problem. But it could be mitigated either by requiring members to meet any of the conditions you mentioned (fees, EA org employment, course certificate), or by setting a number of votes per region, e.g. based on similar indicators of the # of engaged members.
Personally, I’m sufficiently satisfied with the general CEA agenda, that I suspect this would be a waste of effort. That’s in part because I think highly engaged EAs who dominate these orgs have more philosophically robust views and in part because I don’t think this competitor organization would be able to raise more than 10 % of CEA’s budget (~80 % of it comes from OpenPhil). So, given the main goal of funding projects, I don’t think this org would be sufficiently better to be worth all the costs—and not just costs inherent in the operations, but also the emotional costs of having these debates publicly and the costs of coordinating “who is willing to fund what” which I imagine might already be a nightmare.
2) A union. A soft counter-power to CEA.
If this org’s only power were the possibility to strike or produce resolutions, I’m concerned this would artificially inflate unproductive discord. My impression is that unions often produce irrational policies, perhaps because they only have quite extreme measures at their disposal, which creates an illusory “us vs. them” aesthetic for relationships that are overall very positive-sum.
However, I have some sympathy for the idea of
3) A community ambassador who would be democratically elected, e.g. by all EA Forum members, and whose job would be to facilitate the communication between CEA and the community in both directions. I imagine someone at CEA might already effectively hold this job, so perhaps they would be interested in having their choice ratified by the community. Ideally, this community ambassador would collect people’s concerns and visit CEA board meetings, in order to be able to integrate both perspectives.
However, I think the cost of this position is non-negligible. Given the power-law distribution of impact among people and given the many rounds of tests that employees at EA organizations allegedly undergo, a democratic vote would probably yield a much less discerning choice (as most people wouldn’t spend more than 30 minutes picking a candidate). I’m not sure to what extent the wisdom of the crowd might apply here.
Because of similar uncertainties and because I wouldn’t count this as a “leadership role”, I’m voting “moderately disagree”.
I think the comparison in energy consumption is misleading because phones use unintuitively little energy: one charge takes about as much as 10 Google searches (Andy Masley has good articles on AI emissions), and using a smartphone for one year costs less than a dollar. I think a good heuristic is “if it’s free, it uses so little energy that it’s not worth considering”.
If you’re not paying to generate it, you’re also not taking any income away from artists.
The argument that it’s bad vibes for artists is a good one.
Maybe I misunderstood it as an argument for seeing preferences as a terminal value. I think almost all theories would agree “this theory matters in some sense” but I can imagine many ethical theories that do not see “believing in this moral theory” as either good or bad, each for different reasons. Hedonic utilitarianism, as an example of a consequentialist theory, does not see “believing in hedonic utilitarianism” as inherently valuable—depending on the consequences of believing it, it might even recommend not believing it in some contexts—while still positing hedonism itself is true. For a nihilist, “nihilism itself” probably does not matter morally but it does “matter” in terms of its alleged explanatory power.
You could see the luminosity itself as a factor that moderates the intensity of the signal to grow. But there also seem to be many internal processes that moderate the intensity of signals processed by plants: AI points me to a barley study where stress-related calcium signals varied by stimulus, dose, and tissue (pmc.ncbi.nlm.nih.gov). It still doesn’t require an attention schema—i.e. weighted preferences don’t imply consciousness (at least under your conception of preferences and Graziano’s of consciousness, as I understand them).