Great post, and agreed about the dynamics involved. I worry the current EA synthesis has difficulty addressing this class of criticism (power corrupts; transactional donations; geeks/mops/sociopaths), but perhaps we haven’t seen EA’s final form.
As a small comment, I believe discussions of consciousness and moral value tend to downplay the possibility that most consciousness may arise outside of what we consider the biological ecosystem.
It feels a bit silly to ask “what does it feel like to be a black hole, or a quasar, or the Big Bang,” but I believe a proper theory of consciousness should have answers to these questions.
We don’t have that proper theory. But I think we can all agree that these megaphenomena involve a great deal of matter/negentropy and plausibly some interesting self-organized microstructure, though that’s purely conjecture. If we’re charting out expected value, let’s keep the truly big numbers in mind (even if we don’t know how to count them yet).
Thank you for this list.
#2: I left a comment on Matthew’s post that I feel is relevant: https://forum.effectivealtruism.org/posts/CRvFvCgujumygKeDB/my-current-thoughts-on-the-risks-from-seti?commentId=KRqhzrR3o3bSmhM7c
#16: I gave a talk for Mathematical Consciousness Science in 2020 that covers some relevant items; I’d especially point to items 7, 8, 9, and 10 in my list here: https://opentheory.net/2022/04/it-from-bit-revisited/
#18+#20: I feel these are ultimately questions for neuroscience, not psychology. We may need a new sort of neuroscience to address them. (What would that look like?)
I posted this as a comment to Robin Hanson’s “Seeing ANYTHING Other Than Huge-Civ Is Bad News” —
————
I feel these debates are too agnostic about the likely telos of aliens (whether grabby or not). Being able to make reasonable conjectures here will greatly improve our a priori expectations and our interpretation of available cosmological evidence.
Premise 1: Eventually, civilizations progress until they can engage in megascale engineering: Dyson spheres, etc.
Premise 2: Consciousness is the home of value: Disneyland with no children is valueless.
Premise 2.1: Over the long term we should expect at least some civilizations to fall into the attractor of treating consciousness as their intrinsic optimization target.
Premise 3: Civilizations will converge on the view that some qualia are intrinsically valuable, and on which sorts of qualia those are.
Conjecture: A key piece of evidence for discerning the presence of advanced alien civilizations will be megascale objects which optimize for the production of intrinsically valuable qualia.
Speculatively, I suspect black holes and pulsars might fit this description.
More:
https://opentheory.net/2019/09/whats-out-there/
https://opentheory.net/2019/02/simulation-argument/
————
Reasonable people can definitely disagree here, and these premises may not work for various reasons. But I’d circle back to the first line: I feel these debates are too agnostic about the likely telos of aliens (whether grabby or not). In this sense I think we’re leaving value on the table.
Great, thank you for the response.
On (3) — I feel AI safety as it’s pursued today is a bit disconnected from other fields such as neuroscience, embodiment, and phenomenology. I.e. the terms used in AI safety don’t try to connect to the semantic webs of affective neuroscience, embodied existence, or qualia. I tend to take this as a warning sign: all disciplines ultimately refer to different aspects of the same reality, and all conversations about reality should ultimately connect. If they aren’t connecting, we should look for a synthesis such that they do.
That’s a little abstract; a concrete example would be the paper “Dissecting components of reward: ‘liking’, ‘wanting’, and learning” (Berridge et al. 2009), which describes experimental methods and results showing that ‘liking’, ‘wanting’, and ‘learning’ can be partially isolated from each other and triggered separately. I.e. a set of fairly rigorous studies on mice demonstrating they can like without wanting, want without liking, etc. This and related results from affective neuroscience would seem to challenge some preference-based frames within AI alignment, but it feels like there’s no ‘place to put’ this knowledge within the field. Affective neuroscience can discover things, but there’s no mechanism by which these discoveries will update AI alignment ontologies.
It’s a little hard to find the words to describe why this is a problem; perhaps it’s that not being richly connected to other fields runs the risk of ‘ghettoizing’ results, as many social sciences have ‘ghettoized’ themselves.
One of the reasons I’ve been excited to see your trajectory is that I’ve gotten the feeling that your work would connect more easily to other fields than the median approach in AI safety.
What do you see as Aligned AI’s core output, and what is its success condition? What do you see the payoff curve being — i.e. if you solve 10% of the problem, do you get [0%|10%|20%] of the reward?
I think a fresh AI safety approach may (or should) lead to fresh reframes on what AI safety is. Would your work introduce a new definition for AI safety?
Value extrapolation may be intended as a technical term, but intuitively these words also seem inextricably tied to both neuroscience and phenomenology. How do you plan on interfacing with these fields? What key topics of confusion within neuroscience and phenomenology are preventing interfacing with these fields?
I was very impressed by the nuance in your “model fragments” frame, as discussed at some past EAG. As best as I can recall, the frame was: that observed preferences allow us to infer interesting things about the internal models that tacitly generate these preferences, that we have multiple overlapping (and sometimes conflicting) internal models, and that it is these models that AI safety should aim to align with, not preferences per se. Is this summary fair, and does this reflect a core part of Aligned AI’s approach?
Finally, thank you for taking this risk.
I consistently enjoy your posts, thank you for the time and energy you invest.
Robin Hanson is famous for critiques in the form of “X isn’t about X, it’s about Y.” I suspect many of your examples may fit this pattern. To wit, Kwame Appiah wrote that “in life, the challenge is not so much to figure out how best to play the game; the challenge is to figure out what game you’re playing.” Andrew Carnegie, for instance, may have been trying to maximize status, among his peers or his inner mental parliament. Elon Musk may be playing a complicated game with SpaceX and his other companies. To critique assumes we know the game, but I suspect we only have a dim understanding of “the great game” as it’s being played today.
When we see apparent dysfunction, I tend to believe there is dysfunction, but deeper in the organizational-civilizational stack than it may appear. I.e. I think both Carnegie and Musk were/are hyper-rational actors responding to a very complicated incentive landscape.
That said, I do think ideas get lodged in people’s heads, and people just don’t look. Fully agree with your general suggestion, “before you commit yourself to a lifetime’s toil toward this goal, spend a little time thinking about the goal.”
On the other hand, I’m also loath to critique doers too harshly, especially across illegible domains like human motivation. I could see more cold-eyed analysis leading to wiser aim in what to build; I could also see it leading to fewer great things being built. I can’t say I see the full tradeoffs at this point.
Most likely, infectious diseases also play a significant role in aging; I’ve seen some research suggesting that major health inflection points are often associated with an infection.
I like your post and strongly agree with the gist.
DM me if you’re interested in brainstorming alternatives to the vaccine paradigm (which seems to work much better for certain diseases than others).
Generally speaking, I agree with the aphorism “You catch more flies with honey than vinegar.”
For what it’s worth, I interpreted Gregory’s critique as an attempt to blow up the conversation and steer away from the object level, which felt odd. I’m happiest speaking about my research and fielding specific questions about its claims.
Gregory, I’ll invite you to join the object-level discussion between Abby and me.
Welcome, thanks for the good questions.
Asymmetries in stimuli seem crucial for getting patterns through the “predictive coding gauntlet.” I.e., that which can be predicted can be ignored. We demonstrably screen perfect harmony out fairly rapidly.
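As a toy illustration of “that which can be predicted can be ignored”: the hypothetical Python sketch below runs a simple online LMS predictor over a perfectly periodic signal versus an unpredictable one. The prediction error (the “surprise” that would be passed up a predictive-coding hierarchy) shrinks toward zero for the periodic signal and stays large for the noise. This is a cartoon of the gauntlet under my own toy assumptions, not a model of any actual neural circuit.

```python
import numpy as np

# Cartoon of "that which can be predicted can be ignored": an online LMS
# (least-mean-squares) linear predictor adapts to a perfectly periodic signal
# until its prediction error (the "surprise") shrinks toward zero, while an
# unpredictable signal keeps generating error. Illustrative only.
def lms_prediction_error(signal, order=16, lr=0.01):
    """Return the running prediction error of an online LMS predictor."""
    w = np.zeros(order)
    errors = []
    for t in range(order, len(signal)):
        x = signal[t - order:t]
        pred = w @ x
        err = signal[t] - pred
        w += lr * err * x          # online gradient-descent weight update
        errors.append(err)
    return np.array(errors)

t = np.arange(4000)
periodic = np.sin(2 * np.pi * t / 16)                          # perfectly predictable
aperiodic = np.random.default_rng(1).standard_normal(t.size)   # unpredictable

for name, sig in [("periodic", periodic), ("aperiodic", aperiodic)]:
    late_error = np.mean(np.abs(lms_prediction_error(sig)[-500:]))
    print(f"{name:9s} late prediction error: {late_error:.3f}")
```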
The crucial context for STV, on the other hand, isn’t symmetries/asymmetries in stimuli, but rather in brain activity. (More specifically, as we’re currently looking at things, in global eigenmodes.)
With a nod back to the predictive coding frame, it’s quite plausible that the stimuli that create the most internal symmetry/harmony are not themselves perfectly symmetrical, but rather have asymmetries crafted to avoid top-down predictive models. I’d expect this to vary quite a bit across different senses though, and depend heavily on internal state.
The brain may also have mechanisms which introduce asymmetries in global eigenmodes, in order to prevent getting ‘trapped’ by pleasure — I think of boredom as fairly sophisticated ‘anti-wireheading technology’ — but if we set aside dynamics, the assertion is that symmetry/harmony in the brain itself is intrinsically coupled with pleasure.
Edit: With respect to the Mosers, that’s a really cool example of this stuff. I can’t say I have answers here, but as a punt, I’d suspect the “orthogonal neural coding of similar but distinct memories” is going to revolve around some pretty complex frequency regimes, and we may not yet be able to say exact things about how ‘consonant’ or ‘dissonant’ these patterns are to each other. My intuition is that the result about the golden mean being the optimal ratio for non-interaction will end up intersecting with the Mosers’ work. That said, I wonder whether STV would assert that some sorts of memories are ‘hedonically incompatible’ due to their encodings being dissonant? Basically, as memories get encoded, the oscillatory patterns they’re encoded with could subtly form a network which determines what sorts of new memories can form and/or which sorts of stimuli we enjoy and which we don’t. But this is pretty hand-wavy speculation…
Hi Abby, I understand. We can just make the best of it.
1a. Yep, definitely. Empirically we know this is true from e.g. Kringelbach and Berridge’s work on hedonic centers of the brain; what we’d be interested in looking into would be whether these areas are special in terms of network control theory.
1c. I may be getting ahead of myself here: the basic approach we intend for testing STV is looking at dissonance in global activity. Dissonance between brain regions likely contributes to this ‘global dissonance’ metric. I’m also interested in measuring dissonance within smaller areas of the brain, as I think it could help improve the metric down the line, but we definitely wouldn’t need to at this point.
1d. As a quick aside, STV says that ‘symmetry in the mathematical representation of phenomenology corresponds to pleasure’. We can think of that as ‘core STV’. We’ve then built neuroscience metrics around consonance, dissonance, and noise that we think can be useful for proxying symmetry in this representation; we can think of that as a looser layer of theory around STV, something that doesn’t have the ‘exact truth’ expectation of core STV. When I speak of dissonance corresponding to suffering, it’s part of this looser second layer.
To your question — why would STV be true? — my background is in the philosophy of science, so I’m perhaps more ready to punt to that domain. I understand this may come across as somewhat frustrating or obfuscating from the perspective of a neuroscientist asking for a neuroscientific explanation. But this is a universal thread across philosophy of science: why is such and such true? Why does gravity exist; why is the speed of light what it is? Many things we’ve figured out about reality seem like brute facts. Usually there are some hints of elegance in the structures we’re uncovering, but we’re just not yet knowledgeable enough to see some universal grand plan. Physics deals with this a lot, and I think philosophy of mind is just starting to grapple with it in terms of NCCs (neural correlates of consciousness). Here’s something Frank Wilczek (who won the 2004 Nobel Prize in Physics for helping formalize the strong nuclear force) shared about physics:
>… the idea that there is symmetry at the root of Nature has come to dominate our understanding of physical reality. We are led to a small number of special structures from purely mathematical considerations—considerations of symmetry—and put them forward to Nature, as candidate elements for her design. … In modern physics we have taken this lesson to heart. We have learned to work from symmetry toward truth. Instead of using experiments to infer equations, and then finding (to our delight and astonishment) that the equations have a lot of symmetry, we propose equations with enormous symmetry and then check to see whether Nature uses them. It has been an amazingly successful strategy. (A Beautiful Question, 2015)
So — why would STV be the case? “Because it would be beautiful, and would reflect and extend the flavor of beauty we’ve found to be both true and useful in physics” is probably not the sort of answer you’re looking for, but it’s the answer I have at this point. I do think all the NCC literature is going to have to address this question of ‘why’ at some point.
4. We’re ultimately opportunistic about what exact format of neuroimaging we use to test our hypotheses, but fMRI checks a lot of the boxes (though not all). As you say, fMRI is not a great paradigm for neurotech; we’re looking at e.g. headsets by Kernel and others, and also digging into the TUS (transcranial ultrasound) literature for more options.
5. Cool! I’ve seen some big reported effect sizes and I’m generally pretty bullish on neurofeedback in the long term; Adam Gazzaley’s Neuroscape is doing some cool stuff in this area too.
Good catch; there’s plenty that our glossary does not cover yet. This post is at 70 comments now, and I can just say I’m typing as fast as I can!
I pinged our engineer (who has taken the lead on the neuroimaging pipeline work) about details, but as the collaboration hasn’t yet been announced I’ll err on the side of caution in sharing.
To Michael — here’s my attempt to clarify the terms you highlighted:
Neurophysiological models of suffering try to dig into the computational utility and underlying biology of suffering
-> existing theories talk about what emotions ‘do’ for an organism, and what neurochemicals and brain regions seem to be associated with suffering
symmetry
Frank Wilczek calls symmetry ‘change without change’. A limited definition is that it’s a measure of the number of ways you can rotate a picture and still get the same result. You can rotate a square 90 degrees, 180 degrees, or 270 degrees and get something identical; you can rotate a circle by any amount and get something identical. Thus we’d say circles have more rotational symmetries than squares (which have more than rectangles, etc.); there’s a short code sketch at the end of this comment that makes the counting concrete
harmony
Harmony has been in our vocabulary a long time, but it’s not a ‘crisp’ word. This is why I like to talk about symmetry, rather than harmony — although they more-or-less point in the same direction
dissonance
The combination of multiple frequencies that have a high amount of interaction, but few common patterns. Nails on a chalkboard create a highly dissonant sound; playing the C and C# keys at the same time also creates a relatively dissonant sound
resonance as a proxy for characteristic activity
I’m not sure I can give a fully satisfying definition here that doesn’t just reference CSHW; I’ll think about this one more.
Consonance Dissonance Noise Signature
A way of mathematically calculating how much consonance, dissonance, and noise there is when we add different frequencies together. This is an algorithm developed at QRI by my co-founder, Andrés
self-organizing systems
A system which isn’t designed by some intelligent person, but follows an organizing logic of its own. A beehive or anthill would be a self-organizing system; no one’s in charge, but there’s still something clever going on
Neural Annealing
In November 2019 I published a piece describing the brain as a self-organizing system. Basically, “when the brain is in an emotionally intense state, change is easier,” similar to how metal becomes easier to reshape when it’s heated
full neuroimaging stack
All the software we need to do an analysis (and specifically, the CSHW analysis), from start to finish
precise physical formalism for consciousness
A perfect theory of consciousness, which could be applied to anything. Basically a “consciousness meter”
STV gives us a rich set of threads to follow for clear neurofeedback targets, which should allow for much more effective closed-loop systems, and I am personally extraordinarily excited about the creation of technologies that allow people to “update toward wholesome”,
Ah yes this is a litttttle bit dense. Basically, one big thing holding back neurotech is we don’t have good biomarkers for well-being. If we design these biomarkers, we can design neurofeedback systems which work better (not sure how familiar you are with neurofeedback)
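Circling back to the ‘symmetry’ entry above, here’s a small, purely illustrative Python sketch of ‘change without change’: it counts how many whole-degree rotations map a shape onto itself. It’s a finite check of my own devising, so it captures the square-versus-rectangle comparison but not the circle’s continuous symmetry.

```python
import numpy as np

# Toy illustration of symmetry as "change without change": count the
# whole-degree rotations (1-360) that map a 2-D point set onto itself.
def rotational_symmetries(points, tol=1e-6):
    pts = np.asarray(points, dtype=float)
    count = 0
    for deg in range(1, 361):
        th = np.deg2rad(deg)
        rot = np.array([[np.cos(th), -np.sin(th)],
                        [np.sin(th),  np.cos(th)]])
        rotated = pts @ rot.T
        # the set is unchanged if every rotated point lands on some original point
        if all(np.min(np.linalg.norm(pts - p, axis=1)) < tol for p in rotated):
            count += 1
    return count

square = [(1, 1), (-1, 1), (-1, -1), (1, -1)]
rectangle = [(2, 1), (-2, 1), (-2, -1), (2, -1)]
print(rotational_symmetries(square))     # 4  (90, 180, 270, 360 degrees)
print(rotational_symmetries(rectangle))  # 2  (180, 360 degrees)
```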
Hi Abby, thanks for the questions. I have direct answers to 2,3,4, and indirect answers to 1 and 5.
1a. Speaking of the general case, we expect network control theory to be a useful frame for approaching questions of why certain sorts of activity in certain regions of the brain are particularly relevant for valence. (A simple story: hedonic centers of the brain act as ‘tuning knobs’ toward or away from global harmony. This would imply they don’t intrinsically create pleasure and suffering, merely facilitate these states.) This paper from the Bassett lab is the best intro I know of to this.
1b. Speaking again of the general case, asynchronous firing isn’t exactly identical to the sort of dissonance we’d identify as giving rise to suffering: asynchronous firing could be framed as uncorrelated firing, or as ‘non-interacting frequency regimes’. There’s a really cool paper asserting that the golden mean is the optimal frequency ratio for non-interaction, and some applications to EEG work, in case you’re curious. What we’re more interested in is frequency combinations that are highly interacting yet lack a common basis set. An example would be playing the C and C# keys on a piano. This lens borrows more from music theory and acoustics (e.g. Helmholtz, Sethares) than traditional neuroscience, although it lines up with some work by e.g. Buzsáki (Rhythms of the Brain); Friston has also done some cool work on frequencies, communication, and birdsong, although I’d have to find the reference.
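To make the acoustics lens concrete, here’s a minimal, hypothetical Python sketch of the Plomp-Levelt-style sensory dissonance (“roughness”) curve as parameterized by Sethares. It only illustrates why C and C# together register as rougher than C and G; it is not QRI’s CDNS algorithm, and the constants and amplitude weighting are just the commonly quoted textbook values.

```python
import numpy as np

# Illustrative sensory-dissonance ("roughness") curve in the spirit of
# Sethares (Tuning, Timbre, Spectrum, Scale), fit to Plomp & Levelt's data.
# Constants are the commonly quoted values; amplitude weighting conventions vary.
def pair_dissonance(f1, f2, a1=1.0, a2=1.0):
    """Roughness contributed by two pure tones (higher = more dissonant)."""
    f_lo, f_hi = min(f1, f2), max(f1, f2)
    s = 0.24 / (0.021 * f_lo + 19.0)            # critical-bandwidth scaling
    x = s * (f_hi - f_lo)
    return a1 * a2 * (np.exp(-3.5 * x) - np.exp(-5.75 * x))

def total_dissonance(freqs, amps):
    """Sum pairwise roughness over all partials in a spectrum."""
    return sum(pair_dissonance(freqs[i], freqs[j], amps[i], amps[j])
               for i in range(len(freqs)) for j in range(i + 1, len(freqs)))

C4, Cs4, G4 = 261.63, 277.18, 392.00
print("C + C# roughness:", round(pair_dissonance(C4, Cs4), 3))   # ~0.17, quite rough
print("C + G  roughness:", round(pair_dissonance(C4, G4), 3))    # ~0.01, much smoother
```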
1c. Speaking again of the general case, naively I’d expect dissonance somewhere in the brain to induce dissonance elsewhere in the brain. I’d have to think about what reference I could point to here, as I don’t know if you’ll share this intuition, but a simple analogy would be many people walking in a line: if someone trips, more people might trip; chaos begets chaos.
1d. Speaking, finally, of the specific case, I admit I have only a general sense of the structure of the brain networks in question and I’m hesitant to put my foot in my mouth by giving you an answer I have little confidence in. I’d probably punt to the general case, and say if there’s dissonance between these two regions, depending on the network control theory involved, it could be caused by dissonance elsewhere in the brain, and/or it could spread to elsewhere in the brain: i.e. it could be both cause and effect.
2&3. The harmonic analysis we’re most interested in depends on accurately modeling the active harmonics (eigenmodes) of the brain. EEG doesn’t directly model eigenmodes; to infer eigenmodes we’d need fairly accurate source localization. It could be that there are alternative ways to test STV that don’t require modeling brain eigenmodes and that EEG could support. I hope that’s the case, and I hope we find them, since EEG is certainly a lot easier to work with than fMRI.
I.e. we’re definitely not intrinsically tied to source localization, but currently we just don’t see a way to get clean enough abstractions upon which we could compute consonance/dissonance/noise without source localization.
4. Usually we can, and usually it’s much better than trying to measure it with some brain scanner! The rationale for pursuing this line of research is that existing biomarkers for mood and well-being are pretty coarse. If we can design a better biomarker, it’ll be useful for e.g. neurotech wearables. If your iPhone can directly measure how happy you are, you can chart it, correlate it with other variables, and so on. “What you can measure, you can manage.” It could also lead to novel therapies and other technologies, and that’s probably what I’m most viscerally excited about. There are also more ‘sci-fi’ applications such as using this to infer the experience of artificial sentience.
5. This question is definitely above my pay grade; I take my special edge here to be helping build a formal theory and more accurate biomarkers for suffering, rather than public policy (e.g. Michael D. Plant’s turf). I do suspect, however, that some of the knowledge gained from better biomarkers could help inform emotional wellness best practices, and these best practices could be used by everyone, not just people getting scanned. I also think some therapies that might arise out of having better biomarkers could heal some sorts of trauma more-or-less permanently, so the scanning would just need to be a one-time thing, not continuous. But this gets into the weeds of implementation pretty quickly.
Hi Samuel, I think it’s a good thought experiment. One prediction I’ve made is that one could make an agent such as that, but it would be deeply computationally suboptimal: it would be a system that maximizes disharmony/dissonance internally, but seeks out consonant patterns externally. Possible to make but definitely an AI-complete problem.
Just as an idle question, what do you suppose the natural kinds of phenomenology are? I think this can be a generative place to think about qualia in general.
Hi Abby,
I feel we’ve been in some sense talking past each other from the start. I think I bear some of the responsibility for that, based on how my post was written (originally for my blog, and more as a summary than an explanation).
I’m sorry for your frustration. I can only say that I’m not intentionally trying to frustrate you; we appear to have very different styles of thinking and writing, which may have caused some friction. I have been answering object-level questions from the community as best I can.
I really appreciate you putting it like this, and endorse everything you wrote.
I think sometimes researchers can get too close to their topics and collapse many premises and steps together; they sometimes sort of ‘throw away the ladder’ that got them where they are, to paraphrase Wittgenstein. This can make it difficult to communicate to some audiences. My experience on the forum this week suggests this may have happened to me on this topic. I’m grateful for the help the community is offering on filling in the gaps.
Hi Samuel,
I’d say there’s at least some diversity of views on these topics within QRI. When I introduced STV in PQ, I very intentionally did not frame it as a moral hypothesis. If we’re doing research, best to keep the descriptive and the normative as separate as possible. If STV is true it may make certain normative frames easier to formulate, but STV itself is not a theory of morality or ethics.
One way to put this is that when I wear my philosopher’s hat, I’m most concerned about understanding what the ‘natural kinds’ (in Plato’s terms) of qualia are. If valence is a natural kind (similar to how a photon or electromagnetism are natural kinds), that’s important knowledge about the structure of reality. My sense is that ‘understanding what reality’s natural kinds are’ is prior to ethics: first figure out what is real, and then everything else (such as ethics and metaethics) becomes easier.
In terms of specific ethical frames, we do count among QRI some deeply committed hedonistic utilitarians. I see deep value in that frame although I would categorize myself as closer to a virtue ethicist.
Hi all, I messaged some with Holly a bit about this, and what she shared was very helpful. I think a core part of what happened was a mismatch of expectations: I originally wrote this content for my blog and QRI’s website, and the tone and terminology were geared toward “home team content”, not “away team content”. Some people found both the confidence and the somewhat dense terminology off-putting, and I think it’s reasonable of them to raise questions. As a takeaway, I’ve updated that crossposting involves some pitfalls and intend to do things differently next time.
Speaking broadly, I think people underestimate the tractability of this class of work, since we’re already doing this sort of inquiry under different labels. E.g.,
Nick Bostrom coined, and Roman Yampolskiy has followed up on, the Simulation Hypothesis, which is ultimately a Deist frame;
I and others have published various inquiries into the neuroscience of Buddhist states (“neuroscience of enlightenment” type work);
Robin Hanson has coined and offered various arguments around the Great Filter.
In large part, I don’t think these have been supported as longtermist projects, but it seems likely to me that there’s value in pulling on these threads, and each is at least directly adjacent to theological inquiry.