We can say that a high-level phenomenon is strongly emergent with respect to a low-level domain when the high-level phenomenon arises from the low-level domain, but truths concerning that phenomenon are not deducible even in principle from truths in the low-level domain.
Suppose we have a Python program running on a computer. Truths about the Python program are, in some sense, reducible to physics; however, the reduction itself requires resolving philosophical questions. I don’t know if this means the Python program’s functioning (e.g. values of different variables at different times) is “strongly emergent”; it doesn’t seem like an important question to me.
Downward causation means that higher-level phenomena are not only irreducible but also exert a causal efficacy of some sort. … [This implies] low-level laws will be incomplete as a guide to both the low-level and the high-level evolution of processes in the world.
In the case of the Python program this seems clearly false (it’s consistent to view the system as a physical system without reference to the Python program). I expect this is also false in the case of consciousness. I think almost all computationalists would strongly reject downwards causation according to this definition. Do you know of any computationalists who actually advocate downwards causation (i.e. that you can’t predict future physical states by just looking at past physical states without thinking about the higher levels)?
IMO consciousness has power over physics the same way the Python program has power over physics; we can consider counterfactuals like “what if this variable in the Python program magically had a different value” and ask what would happen to physics if this happened (in this case, maybe the variable controls something displayed on a computer screen, so if the variable were changed then the computer screen would emit different light). Actually formalizing questions like “what would happen if this variable had a different value” requires a theory of logical counterfactuals (which MIRI is researching, see this paper).
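As a rough illustration of the kind of counterfactual described above, here is a toy sketch. It is not a theory of logical counterfactuals: it simply hard-codes what "magically changing a variable" means, which is exactly the part a real theory would have to justify. The names `render`, `run_program`, and `brightness` are hypothetical and chosen only for this example.

```python
from typing import Optional


def render(brightness: int) -> str:
    """Stand-in for 'physics': what the screen emits for a given variable value."""
    return "bright light" if brightness > 50 else "dim light"


def run_program(override: Optional[int] = None) -> str:
    """Run the toy program; `override` 'magically' replaces the variable's value."""
    brightness = 80  # the value the program actually computes
    if override is not None:
        brightness = override  # the counterfactual intervention
    return render(brightness)


actual = run_program()
counterfactual = run_program(override=10)

# The variable "has power over physics" in this toy sense if changing it
# changes what the screen emits.
variable_controls_screen = actual != counterfactual
```

In this toy setup the intervention is trivial because we wrote the program and know where the variable lives; the hard question in the comment above is what the analogous intervention means for an arbitrary physical system.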
Notably, Python programs usually don’t “make choices” such that “control” is all that meaningful, but humans do. Here I would say that humans implement a decision theory, while most Python programs do not (although some Python programs do implement a decision theory and can be meaningfully said to “make choices”). “Implementing a decision theory” means something like “evaluating multiple actions based on what their effects are expected to be, and choosing the one that scores best according to some metric”; some AI systems like reinforcement learners implement a decision theory.
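To make "evaluating multiple actions based on what their effects are expected to be, and choosing the one that scores best according to some metric" concrete, here is a minimal sketch of a program that does this. It assumes expected utility as the metric, which is only one possible choice, and the function name `choose_action`, the umbrella scenario, and all the numbers are invented for illustration.

```python
from typing import Callable, Dict, Iterable


def choose_action(
    actions: Iterable[str],
    outcome_model: Callable[[str], Dict[str, float]],
    utility: Callable[[str], float],
) -> str:
    """Return the action whose predicted outcomes score best in expectation."""

    def expected_utility(action: str) -> float:
        # outcome_model maps an action to {outcome: probability}
        return sum(p * utility(o) for o, p in outcome_model(action).items())

    return max(actions, key=expected_utility)


# Hypothetical example: decide whether to carry an umbrella.
OUTCOMES = {
    "umbrella": {"dry_but_encumbered": 1.0},
    "no_umbrella": {"dry_and_free": 0.7, "wet": 0.3},
}
UTILITIES = {"dry_but_encumbered": 0.8, "dry_and_free": 1.0, "wet": -1.0}

best = choose_action(
    actions=["umbrella", "no_umbrella"],
    outcome_model=lambda a: OUTCOMES[a],
    utility=lambda o: UTILITIES[o],
)
```

With these made-up numbers the umbrella scores 0.8 against roughly 0.4 for going without, so the program "chooses" the umbrella. A program that just prints a fixed string has no analogous step where alternatives are compared.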
(I’m writing this comment to express more “computationalism has a reasonable steelman that isn’t identified as a possible position in PQ” rather than “computationalism is clearly right”)
Thus, we would need to be open to the possibility that certain interventions could cause a change in a system’s physical substrate (which generates its qualia) without causing a change in its computational level (which generates its qualia reports)
It seems like this means that empirical tests (e.g. neuroscience stuff) aren’t going to help test aspects of the theory that are about divergence between computational pseudo-qualia (the things people report on) and actual qualia. If I squint a lot I could see “anthropic evidence” being used to distinguish between pseudo-qualia and qualia, but it seems like nothing else would work.
I’m also not sure why we would expect pseudo-qualia to have any correlation with actual qualia? I guess you could make an anthropic argument (we’re viewing the world from the perspective of actual qualia, and our sensations seem to match the pseudo-qualia). That would give someone the suspicion that there’s some causal story for why they would be synchronized, without directly providing such a causal story.
(For the record I think anthropic reasoning is usually confused and should be replaced with decision-theoretic reasoning (e.g. see this discussion), but this seems like a topic for another day)
Yes, the epistemological challenges with distinguishing between ground-truth qualia and qualia reports are worrying. However, I don’t think they’re completely intractable, because there is a causal chain (from Appendix C):
Our brain’s physical microstates (perfectly correlated with qualia) -->
The logical states of our brain’s self-model (systematically correlated with our brain’s physical microstates) -->
Our reports about our qualia (systematically correlated with our brain’s model of its internal state)
… but there could be substantial blindspots, especially in contexts where there was no adaptive benefit to having accurate systematic correlations.
Awesome, I do like your steelman. More thoughts later, but just wanted to share one notion before sleep:
With regard to computationalism, I think you’ve nailed it. Downward causation seems pretty obviously wrong (and I don’t know of any computationalists that personally endorse it).
IMO consciousness has power over physics the same way the Python program has power over physics
Totally agreed, and I like this example.
it’s consistent to view the system as a physical system without reference to the Python program
Right, but I would go even further. Namely, given any non-trivial physical system, there exist multiple equally valid interpretations of what’s going on at the computational level. The example I give in PQ is: let’s say I shake a bag of popcorn. With the right mapping, we could argue that we could treat that physical system as simulating the brain of a sleepy cat. However, given another mapping, we could treat that physical system as simulating the suffering of five holocausts. Very worryingly, we have no principled way to choose between these interpretive mappings. Am I causing suffering by shaking that bag of popcorn?
And I think all computation is like this, if we look closely: there exists no frame-invariant, principled way to map between computation and physical systems… just useful mappings, and non-useful mappings (and ‘useful’ is very frame-dependent).
This introduces an inconsistency into computationalism, and has some weird implications: I suspect that, given any computational definition of moral value, there would be a way to prove any arbitrary physical system morally superior to any other arbitrary physical system. I.e., you could prove both that A>B, and B>A.
… I may be getting something wrong here. But it seems like the lack of a clean quarks↔bits mapping ultimately turns out to be a big deal, and is a big reason why I advocate not trying to define moral value in terms of Turing machines & bitstrings.
Instead, I tend to think of ethics as “how should we arrange the [quarks|negentropy] in our light-cone?”—ultimately we live in a world of quarks, so ethics is a question of quarks (or strings, or whatnot).
However! Perhaps this is just a failure of my imagination. What is ethics if not how to arrange our physical world? Or can you help me steelman computationalism against this inconsistency objection?
Thanks again for the comments. They’re both great and helpful.
Thanks for your comments too, I’m finding them helpful for understanding other possible positions on ethics.
With the right mapping, we could argue that we could treat that physical system as simulating the brain of a sleepy cat. However, given another mapping, we could treat that physical system as simulating the suffering of five holocausts. Very worryingly, we have no principled way to choose between these interpretive mappings.
OK, how about a rule like this:
Physical system P embeds computation C if and only if P has different behavior counterfactually on C taking on different values
(formalizing this rule would require a theory of logical counterfactuals; I’m not sure if I expect a fully general theory to exist but it seems plausible that one does)
I’m not asserting that this rule is correct but it doesn’t seem inconsistent. In particular it doesn’t seem like you could use it to prove A > B and B > A. And clearly your popcorn embeds neither a cat nor the suffering of five holocausts under this rule.
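A toy rendering of the proposed rule, under heavy assumptions: each "physical system" is modeled as a function from a counterfactual value of C to an observable behavior, so the sketch presupposes that we already know how a change in C shows up in P. That presupposition is precisely what a theory of logical counterfactuals would have to supply, so this illustrates the shape of the rule rather than settling it. The names `embeds`, `display_driver`, and `shaken_popcorn` are hypothetical.

```python
from typing import Callable, Iterable


def embeds(physical_system: Callable[[int], str], candidate_values: Iterable[int]) -> bool:
    """True if the system's behavior differs across counterfactual values of C."""
    behaviors = {physical_system(value) for value in candidate_values}
    return len(behaviors) > 1


def display_driver(c_value: int) -> str:
    """A system whose behavior depends on the computation's value."""
    return f"screen shows {c_value}"


def shaken_popcorn(c_value: int) -> str:
    """A system that behaves the same whatever value C takes."""
    return "kernels rattle"


driver_embeds_c = embeds(display_driver, [0, 1])
popcorn_embeds_c = embeds(shaken_popcorn, [0, 1])
```

Under this toy model the display driver embeds C and the popcorn does not. The objection that one could simply reinterpret P corresponds, here, to disputing how `shaken_popcorn` was written down in the first place.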
If it turns out that no simple rule of this form works, I wouldn’t be too troubled, though; I’d be psychologically prepared to accept that there isn’t a clean quarks↔computations mapping. Similar to how I already accept that human value is complex, I could accept that human judgments of “does this physical system implement this computation” are complex (and thus can’t be captured in a simple rule). I don’t think this would make me inconsistent, I think it would just make me more tolerant of nebulosity in ethics. At the moment it seems like clean mappings might exist and so it makes sense to search for them, though.
Instead, I tend to think of ethics as “how should we arrange the [quarks|negentropy] in our light-cone?”—ultimately we live in a world of quarks, so ethics is a question of quarks (or strings, or whatnot).
On the object level, it seems like it’s possible to think of painting as “how should we arrange the brush strokes on the canvas?”. But it seems hard to paint well while only thinking at the level of brush strokes (and not thinking about the higher levels, like objects). I expect ethics to be similar; at the very least if human ethics has an “aesthetic” component then it seems like designing a good light cone is at least as hard as making a good painting. Maybe this is a strawman of your position?
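On the meta level, I would caution against this use of “ultimately”; see here and here (the articles are worded somewhat disagreeably but I mostly endorse the content). In some sense ethics is about quarks, but in other senses it’s about:
computations
aesthetics
the id, ego, and superego
deciding which side to take in a dispute
a conflict between what we want and what we want to appear to want
nurturing the part of us that cares about others
updateless decision theory
a mathematical fact about what we would want upon reflection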
I think these are all useful ways of viewing ethics, and I don’t feel the need to pick a single view (although I often find it appealing to look at what some views say about what other views are saying and resolving the contradictions between them). There are all kinds of reasons why it might be psychologically uncomfortable not to have a simple theory of ethics (e.g. it’s harder to know whether you’re being ethical, it’s harder to criticize others for being unethical, it’s harder for groups to coordinate around more complex and ambiguous ethical theories, you’ll never be able to “solve” ethics once and then never have to think about ethics again, it requires holding multiple contradictory views in your head at once, you won’t always have a satisfying verbal justification for why your actions are ethical). But none of this implies that it’s good (in any of the senses above!) to assume there’s a simple ethical theory.
(For the record I think it’s useful to search for simple ethical theories even if they don’t exist, since you might discover interesting new ways of viewing ethics, even if these views aren’t complete).
Physical system P embeds computation C if and only if P has different behavior counterfactually on C taking on different values
I suspect this still runs into the same problem—in the case of the computational-physical mapping, even if we assert that C has changed, we can merely choose a different interpretation of P which is consistent with the change, without actually changing P.
If it turns out that no simple rule of this form works, I wouldn’t be too troubled, though; I’d be psychologically prepared to accept that there isn’t a clean quarks↔computations mapping. Similar to how I already accept that human value is complex, I could accept that human judgments of “does this physical system implement this computation” are complex (and thus can’t be captured in a simple rule)
This is an important question: if there exists no clean quarks↔computations mapping, is it (a) a relatively trivial problem, or (b) a really enormous problem? I’d say the answer to this depends on how we talk about computations. I.e., if we say “the ethically-relevant stuff happens at the computational level” (e.g., we shouldn’t compute certain strings), then I think it grows to be a large problem. This grows particularly large if we’re discussing how to optimize the universe! :)
I think these are all useful ways of viewing ethics, and I don’t feel the need to pick a single view (although I often find it appealing to look at what some views say about what other views are saying and resolving the contradictions between them). There are all kinds of reasons why it might be psychologically uncomfortable not to have a simple theory of ethics (e.g. it’s harder to know whether you’re being ethical, it’s harder to criticize others for being unethical, it’s harder for groups to coordinate around more complex and ambiguous ethical theories, you’ll never be able to “solve” ethics once and then never have to think about ethics again, it requires holding multiple contradictory views in your head at once, you won’t always have a satisfying verbal justification for why your actions are ethical). But none of this implies that it’s good (in any of the senses above!) to assume there’s a simple ethical theory.
Let me push back a little here: imagine we live in the early 1800s, and Faraday is attempting to formalize electromagnetism. We had plenty of intuitive rules of thumb for how electromagnetism worked, but no consistent, overarching theory. I’m sure lots of people shook their heads and said things like, “these things are just God’s will, there’s no pattern to be found.” However, it turns out that there was something unifying to be found, and tolerance of inconsistencies & nebulosity would have been counter-productive.
Today, we have intuitive rules of thumb for how we think consciousness & ethics work, but similarly no consistent, overarching theory. Are consciousness & moral value like electromagnetism—things that we can discover knowledge about? Or are they like elan vital—reifications of clusters of phenomena that don’t always cluster cleanly?
I think the jury’s still out here, but the key with electromagnetism was that Faraday was able to generate novel, falsifiable predictions with his theory. I’m not claiming to be Faraday, but I think if we can generate novel, falsifiable predictions with work on consciousness & valence (I offer some in Section XI, and observations that could be adapted to make falsifiable predictions in Section XII), this should drive updates toward “there’s some undiscovered cache of predictive utility here, similar to what Faraday found with electromagnetism.”
I suspect this still runs into the same problem—in the case of the computational-physical mapping, even if we assert that C has changed, we can merely choose a different interpretation of P which is consistent with the change, without actually changing P.
It seems like you’re saying here that there won’t be clean rules for determining logical counterfactuals? I agree this might be the case but it doesn’t seem clear to me. Logical counterfactuals seem pretty confusing and there seems to be a lot of room for better theories about them.
This is an important question: if there exists no clean quarks↔computations mapping, is it (a) a relatively trivial problem, or (b) a really enormous problem? I’d say the answer to this depends on how we talk about computations. I.e., if we say “the ethically-relevant stuff happens at the computational level” (e.g., we shouldn’t compute certain strings), then I think it grows to be a large problem. This grows particularly large if we’re discussing how to optimize the universe! :)
I agree that it would be a large problem. The total amount of effort to “complete” the project of figuring out which computations we care about would be practically infinite, but with a lot of effort we’d get better and better approximations over time, and we would be able to capture a lot of moral value this way.
Let me push back a little here
I mostly agree with your push back; I think when we have different useful views of the same thing that’s a good indication that there’s more intellectual progress to be made in resolving the contradictions between the different views (e.g. by finding a unifying theory).
I think we have a lot more theoretical progress to make on understanding consciousness and ethics. On priors I’d expect the theoretical progress to produce more-satisfying things over time without ever producing a complete answer to ethics. Though of course I could be wrong here; it seems like intuitions vary a lot. It seems more likely to me that we find a simple unifying theory for consciousness than ethics.
It seems like you’re saying here that there won’t be clean rules for determining logical counterfactuals? I agree this might be the case but it doesn’t seem clear to me. Logical counterfactuals seem pretty confusing and there seems to be a lot of room for better theories about them.
Right, and I would argue that logical counterfactuals (in the way we’ve mentioned them in this thread) will necessarily be intractably confusing, because they’re impossible to do cleanly. I say this because in the “P & C” example above, we need a frame-invariant way to interpret a change in C in terms of P. However, we can only have such a frame-invariant way if there exists a clean mapping (injection, surjection, bijection, etc.) between P and C, which I think we can’t have, even theoretically.
(Unless we define both physics and computation through something like constructor theory… at which point we’re not really talking about Turing machines as we know them—we’d be talking about physics by another name.)
This is a big part of the reason why I’m a big advocate of trying to define moral value in physical terms: if we start with physics, then we know our conclusions will ‘compile’ to physics. If instead we start with the notion that ‘some computations have more moral value than others’, we’re stuck with the problem—intractable problem, I argue—that we don’t have a frame-invariant way to precisely identify what computations are happening in any physical system (and likewise, which aren’t happening). I.e., statements about computations will never cleanly compile to physical terms. And whenever we have multiple incompatible interpretations, we necessarily get inconsistencies, and we can prove anything is true (i.e., we can prove any arbitrary physical system is superior to any other).
Does that argument make sense?
… that said, it would seem very valuable to make a survey of possible levels of abstraction at which one could attempt to define moral value, and their positives & negatives.
I think we have a lot more theoretical progress to make on understanding consciousness and ethics. On priors I’d expect the theoretical progress to produce more-satisfying things over time without ever producing a complete answer to ethics. Though of course I could be wrong here; it seems like intuitions vary a lot. It seems more likely to me that we find a simple unifying theory for consciousness than ethics.
However, we can only have such a frame-invariant way if there exists a clean mapping (injection, surjection, bijection, etc.) between P and C, which I think we can’t have, even theoretically.
I’m still not sure why you strongly think there’s _no_ principled way; it seems hard to prove a negative. I mentioned that we could make progress on logical counterfactuals; there’s also the approach Chalmers talks about here. (I buy that there’s reason to suspect there’s no principled way if you’re not impressed by any proposal so far).
And whenever we have multiple incompatible interpretations, we necessarily get inconsistencies, and we can prove anything is true (i.e., we can prove any arbitrary physical system is superior to any other).
I don’t think this follows. The universal prior is not objective; you can “prove” that any bit probably follows from a given sequence, by changing your reference machine. But I don’t think this is too problematic. We just accept that some things don’t have a super clean objective answer. The reference machines that make odd predictions (e.g. that 000000000 is probably followed by 1) look weird, although it’s hard to precisely say what’s weird about them without making reference to another reference machine. I don’t think this kind of non-objectivity implies any kind of inconsistency.
Similarly, even if objective approaches to computational interpretations fail, we could get a state where computational interpretations are non-objective (e.g. defined relative to a “reference machine”) and the reference machines that make very weird predictions (like the popcorn implementing a cat) would look super weird to humans. This doesn’t seem like a fatal flaw to me, for the same reason it’s not a fatal flaw in the case of the universal prior.
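The reference-machine dependence described above can be shown with a toy calculation. This is not Solomonoff induction proper: it considers only two hand-picked hypotheses and assumes made-up description lengths for each under two hypothetical reference machines, then weights each hypothesis by 2^(-length), as the universal prior does for programs.

```python
from typing import Dict


def prob_next_bit_is_one(description_lengths: Dict[str, int]) -> float:
    """P(next bit = 1) after 000000000, weighting each hypothesis by 2**-length."""
    weights = {h: 2.0 ** -length for h, length in description_lengths.items()}
    return weights["nine_zeros_then_one"] / sum(weights.values())


# Assumed description lengths (in bits) on two hypothetical reference machines.
ordinary_machine = {"all_zeros": 3, "nine_zeros_then_one": 12}
odd_machine = {"all_zeros": 12, "nine_zeros_then_one": 3}

p_ordinary = prob_next_bit_is_one(ordinary_machine)
p_odd = prob_next_bit_is_one(odd_machine)
```

With these assumed lengths the ordinary machine gives the next bit about a 1-in-513 chance of being 1, while the odd machine gives it about 512 in 513. Both machines are internally consistent; they simply disagree, which is the sense of "non-objective but not inconsistent" in the comment above.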
What you’re saying seems very reasonable; I don’t think we differ on any facts, but we do have some divergent intuitions on implications.
I suspect this question—whether it’s possible to offer a computational description of moral value that could cleanly ‘compile’ to physics—would have non-trivial yet also fairly modest implications for most of MIRI’s current work.
I would expect the significance of this question to go up over time, both in terms of direct work MIRI expects to do, and in terms of MIRI’s ability to strategically collaborate with other organizations. I.e., when things shift from “let’s build alignable AGI” to “let’s align the AGI”, it would be very good to have some of this metaphysical fog cleared away so that people could get on the same ethical page, and see that they are in fact on the same page.
I think these are all useful ways of viewing ethics, and I don’t feel the need to pick a single view (although I often find it appealing to look at what some views say about what other views are saying and resolving the contradictions between them). There are all kinds of reasons why it might be psychologically uncomfortable not to have a simple theory of ethics (e.g. it’s harder to know whether you’re being ethical, it’s harder to criticize others for being unethical, it’s harder for groups to coordinate around more complex and ambiguous ethical theories, you’ll never be able to “solve” ethics once and then never have to think about ethics again, it requires holding multiple contradictory views in your head at once, you won’t always have a satisfying verbal justification for why your actions are ethical). But none of this implies that it’s good (in any of the senses above!) to assume there’s a simple ethical theory.
(For the record I think it’s useful to search for simple ethical theories even if they don’t exist, since you might discover interesting new ways of viewing ethics, even if these views aren’t complete).
I suspect this still runs into the same problem—in the case of the computational-physical mapping, even if we assert that C has changed, we can merely choose a different interpretation of P which is consistent with the change, without actually changing P.
This is an important question: if there exists no clean quarkscomputations mapping, is it (a) a relatively trivial problem, or (b) a really enormous problem? I’d say the answer to this depends on how we talk about computations. I.e., if we say “the ethically-relevant stuff happens at the computational level”—e.g., we shouldn’t compute certain strings—then I think it grows to be a large problem. This grows particularly large if we’re discussing how to optimize the universe! :)
Let me push back a little here- imagine we live in the early 1800s, and Faraday was attempting to formalize electromagnetism. We had plenty of intuitive rules of thumb for how electromagnetism worked, but no consistent, overarching theory. I’m sure lots of people shook their head and said things like, “these things are just God’s will, there’s no pattern to be found.” However, it turns out that there was something unifying to be found, and tolerance of inconsistencies & nebulosity would have been counter-productive.
Today, we have intuitive rules of thumb for how we think consciousness & ethics work, but similarly no consistent, overarching theory. Are consciousness & moral value like electromagnetism—things that we can discover knowledge about? Or are they like elan vital—reifications of clusters of phenomena that don’t always cluster cleanly?
I think the jury’s still out here, but the key with electromagnetism was that Faraday was able to generate novel, falsifiable predictions with his theory. I’m not claiming to be Faraday, but I think if we can generate novel, falsifiable predictions with work on consciousness & valence (I offer some in Section XI, and observations that could be adapted to make falsifiable predictions in Section XII), this should drive updates toward “there’s some undiscovered cache of predictive utility here, similar to what Faraday found with electromagnetism.”
It seems like you’re saying here that there won’t be clean rules for determining logical counterfactuals? I agree this might be the case but it doesn’t seem clear to me. Logical counterfactuals seem pretty confusing and there seems to be a lot of room for better theories about them.
I agree that it would be a large problem. The total amount of effort to “complete” the project of figuring out which computations we care about would be practically infinite, but with a lot of effort we’d get better and better approximations over time, and we would be able to capture a lot of moral value this way.
I mostly agree with your pushback; I think when we have different useful views of the same thing, that’s a good indication that there’s more intellectual progress to be made in resolving the contradictions between the different views (e.g. by finding a unifying theory).
I think we have a lot more theoretical progress to make on understanding consciousness and ethics. On priors I’d expect the theoretical progress to produce more-satisfying things over time without ever producing a complete answer to ethics. Though of course I could be wrong here; it seems like intuitions vary a lot. It seems more likely to me that we find a simple unifying theory for consciousness than ethics.
Right, and I would argue that logical counterfactuals (in the way we’ve mentioned them in this thread) will necessarily be intractably confusing, because they’re impossible to do cleanly. I say this because in the “P & C” example above, we need a frame-invariant way to interpret a change in C in terms of P. However, we can only have such a frame-invariant way if there exists a clean mapping (injection, surjection, bijection, etc.) between P and C, which I think we can’t have, even theoretically.
(Unless we define both physics and computation through something like constructor theory… at which point we’re not really talking about Turing machines as we know them—we’d be talking about physics by another name.)
This is a big part of the reason why I’m a big advocate of trying to define moral value in physical terms: if we start with physics, then we know our conclusions will ‘compile’ to physics. If instead we start with the notion that ‘some computations have more moral value than others’, we’re stuck with the problem (an intractable one, I argue) that we don’t have a frame-invariant way to precisely identify which computations are happening in any physical system (and likewise, which aren’t). I.e., statements about computations will never cleanly compile to physical terms. And whenever we have multiple incompatible interpretations, we necessarily get inconsistencies, and from an inconsistency we can prove anything (i.e., we can prove any arbitrary physical system is superior to any other).
Does that argument make sense?
… that said, it would seem very valuable to make a survey of possible levels of abstraction at which one could attempt to define moral value, and their positives & negatives.
Totally agreed!
I’m still not sure why you strongly think there’s _no_ principled way; it seems hard to prove a negative. I mentioned that we could make progress on logical counterfactuals; there’s also the approach Chalmers talks about here. (I buy that there’s reason to suspect there’s no principled way if you’re not impressed by any proposal so far).
I don’t think this follows. The universal prior is not objective; you can “prove” that any bit probably follows from a given sequence, by changing your reference machine. But I don’t think this is too problematic. We just accept that some things don’t have a super clean objective answer. The reference machines that make odd predictions (e.g. that 000000000 is probably followed by 1) look weird, although it’s hard to precisely say what’s weird about them without making reference to another reference machine. I don’t think this kind of non-objectivity implies any kind of inconsistency.
Similarly, even if objective approaches to computational interpretations fail, we could get a state where computational interpretations are non-objective (e.g. defined relative to a “reference machine”) and the reference machines that make very weird predictions (like the popcorn implementing a cat) would look super weird to humans. This doesn’t seem like a fatal flaw to me, for the same reason it’s not a fatal flaw in the case of the universal prior.
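The reference-machine point can be illustrated with a small Python sketch. This is not the actual universal prior (which is uncomputable); it is a toy stand-in, assuming only two hand-picked hypotheses and made-up description lengths for two hypothetical reference machines. Each hypothesis gets prior weight 2^(-length), and we ask how likely a 1 is after nine 0s.

```python
from fractions import Fraction
from typing import Callable, Dict, List

# Two hypothetical hypotheses about an infinite bit sequence (index -> bit).
HYPOTHESES: Dict[str, Callable[[int], int]] = {
    "all_zeros": lambda i: 0,
    "nine_zeros_then_ones": lambda i: 0 if i < 9 else 1,
}


def predict_next_one(observed: List[int], code_lengths: Dict[str, int]) -> Fraction:
    """Posterior probability that the next bit is 1, where each hypothesis has
    prior weight 2**-length and hypotheses contradicting the data are dropped."""
    n = len(observed)
    total = Fraction(0)
    weight_on_one = Fraction(0)
    for name, hypothesis in HYPOTHESES.items():
        if all(hypothesis(i) == bit for i, bit in enumerate(observed)):
            weight = Fraction(1, 2 ** code_lengths[name])
            total += weight
            if hypothesis(n) == 1:
                weight_on_one += weight
    if total == 0:
        raise ValueError("no hypothesis is consistent with the observations")
    return weight_on_one / total


observed = [0] * 9

# Assumed description lengths (in bits) on two made-up reference machines.
ordinary_machine = {"all_zeros": 2, "nine_zeros_then_ones": 12}
odd_machine = {"all_zeros": 12, "nine_zeros_then_ones": 2}

print(predict_next_one(observed, ordinary_machine))  # 1/1025
print(predict_next_one(observed, odd_machine))       # 1024/1025
```

Both machines give internally consistent predictions; they just disagree with each other. The odd machine looks weird only relative to some other standard of simplicity, which is the sense of non-objectivity described above.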
What you’re saying seems very reasonable; I don’t think we differ on any facts, but we do have some divergent intuitions on implications.
I suspect this question—whether it’s possible to offer a computational description of moral value that could cleanly ‘compile’ to physics—would have non-trivial yet also fairly modest implications for most of MIRI’s current work.
I would expect the significance of this question to go up over time, both in terms of direct work MIRI expects to do, and in terms of MIRI’s ability to strategically collaborate with other organizations. I.e., when things shift from “let’s build alignable AGI” to “let’s align the AGI”, it would be very good to have some of this metaphysical fog cleared away so that people could get on the same ethical page, and see that they are in fact on the same page.