Why did you take the mean $/QALY instead of mean QALY/$ (which expected value analysis would suggest)? When I do that I get $5000/QALY as the mean.
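To make the gap concrete, here is a toy sketch with made-up numbers (not the numbers from the actual analysis) showing how averaging $/QALY directly and averaging QALY/$ and then inverting can come apart:

```python
# Toy illustration (hypothetical numbers): the mean of cost/QALY and the reciprocal
# of the mean of QALY/cost generally differ, since E[1/X] != 1/E[X] (Jensen's inequality).
scenarios = [
    {"prob": 0.5, "cost_per_qaly": 1_000},   # optimistic scenario
    {"prob": 0.5, "cost_per_qaly": 20_000},  # pessimistic scenario
]

mean_cost_per_qaly = sum(s["prob"] * s["cost_per_qaly"] for s in scenarios)
mean_qaly_per_dollar = sum(s["prob"] / s["cost_per_qaly"] for s in scenarios)

print(mean_cost_per_qaly)        # 10500.0 ($/QALY, averaged directly)
print(1 / mean_qaly_per_dollar)  # ~1904.8 ($/QALY implied by averaging QALY/$ first)
```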
Jessica_Taylor
I agree that:
1. clarifying “what should people who gain a huge amount of power through AI do with Earth, existing social structures, and the universe?” seems like a good question to get agreement on for coordination reasons
2. we should be looking for tractable ways of answering this question
I think:
a) consciousness research will fail to clarify ethics enough to answer enough of (1) to achieve coordination (since I think human preferences on the relevant timescales are way more complicated than consciousness, conditioned on consciousness being simple).
b) it is tractable to answer (1) without reaching agreement on object-level values, by doing something like designing a temporary global government structure that most people agree is pretty good (in that it will allow society to reflect appropriately and determine the next global government structure), but that this question hasn’t been answered well yet and that a better answer would improve coordination. E.g. perhaps society is run as a global federalist democratic-ish structure with centralized control of potentially destructive technology (taking into account “how voters would judge something if they thought longer” rather than “how voters actually judge something”; this might be possible if the AI alignment problem is solved). It seems quite possible to create proposals of this form and critique them.
It seems like we disagree about (a), and this disagreement has been partially hashed out elsewhere; it’s not clear we have a strong disagreement about (b).
However, we can only have such a frame-invariant way if there exists a clean mapping (injection, surjection, bijection, etc.) between P and C, which I think we can’t have, even theoretically.
I’m still not sure why you strongly think there’s _no_ principled way; it seems hard to prove a negative. I mentioned that we could make progress on logical counterfactuals; there’s also the approach Chalmers talks about here. (I buy that there’s reason to suspect there’s no principled way if you’re not impressed by any proposal so far).
And whenever we have multiple incompatible interpretations, we necessarily get inconsistencies, and we can prove anything is true (i.e., we can prove any arbitrary physical system is superior to any other).
I don’t think this follows. The universal prior is not objective; you can “prove” that any bit probably follows from a given sequence, by changing your reference machine. But I don’t think this is too problematic. We just accept that some things don’t have a super clean objective answer. The reference machines that make odd predictions (e.g. that 000000000 is probably followed by 1) look weird, although it’s hard to precisely say what’s weird about them without making reference to another reference machine. I don’t think this kind of non-objectivity implies any kind of inconsistency.
Similarly, even if objective approaches to computational interpretations fail, we could get a state where computational interpretations are non-objective (e.g. defined relative to a “reference machine”) and the reference machines that make very weird predictions (like the popcorn implementing a cat) would look super weird to humans. This doesn’t seem like a fatal flaw to me, for the same reason it’s not a fatal flaw in the case of the universal prior.
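For concreteness, the reference-machine point uses only the standard definitions (nothing here is specific to this thread): the universal prior relative to a universal machine $U$ is

$$M_U(x) \;=\; \sum_{p \,:\, U(p) \text{ begins with } x} 2^{-|p|},$$

and the invariance theorem only guarantees that for universal machines $U, V$ there is a constant $c_{U,V} > 0$ (independent of $x$) with $M_U(x) \ge c_{U,V}\, M_V(x)$. Since $c_{U,V}$ can be astronomically small, two reference machines can make very different predictions about short sequences while both satisfying the theorem.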
I suspect this still runs into the same problem—in the case of the computational-physical mapping, even if we assert that C has changed, we can merely choose a different interpretation of P which is consistent with the change, without actually changing P.
It seems like you’re saying here that there won’t be clean rules for determining logical counterfactuals? I agree this might be the case but it doesn’t seem clear to me. Logical counterfactuals seem pretty confusing and there seems to be a lot of room for better theories about them.
This is an important question: if there exists no clean quarks→computations mapping, is it (a) a relatively trivial problem, or (b) a really enormous problem? I’d say the answer to this depends on how we talk about computations. I.e., if we say “the ethically-relevant stuff happens at the computational level”—e.g., we shouldn’t compute certain strings—then I think it grows to be a large problem. This grows particularly large if we’re discussing how to optimize the universe! :)
I agree that it would be a large problem. The total amount of effort to “complete” the project of figuring out which computations we care about would be practically infinite, but with a lot of effort we’d get better and better approximations over time, and we would be able to capture a lot of moral value this way.
Let me push back a little here
I mostly agree with your push back; I think when we have different useful views of the same thing that’s a good indication that there’s more intellectual progress to be made in resolving the contradictions between the different views (e.g. by finding a unifying theory).
I think we have a lot more theoretical progress to make on understanding consciousness and ethics. On priors I’d expect the theoretical progress to produce more-satisfying things over time without ever producing a complete answer to ethics. Though of course I could be wrong here; it seems like intuitions vary a lot. It seems more likely to me that we find a simple unifying theory for consciousness than ethics.
Thanks for your comments too, I’m finding them helpful for understanding other possible positions on ethics.
With the right mapping, we could argue that we could treat that physical system as simulating the brain of a sleepy cat. However, given another mapping, we could treat that physical system as simulating the suffering of five holocausts. Very worryingly, we have no principled way to choose between these interpretive mappings.
OK, how about a rule like this:
Physical system P embeds computation C if and only if P has different behavior counterfactually on C taking on different values
(formalizing this rule would require a theory of logical counterfactuals; I’m not sure if I expect a fully general theory to exist but it seems plausible that one does)
I’m not asserting that this rule is correct but it doesn’t seem inconsistent. In particular it doesn’t seem like you could use it to prove A > B and B > A. And clearly your popcorn embeds neither a cat nor the suffering of five holocausts under this rule.
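To illustrate the shape of this rule, here is a crude toy operationalization (all names are made up; it sidesteps the hard part, which is making the counterfactual well-defined for arbitrary physical systems):

```python
def computation_c(state):
    """The candidate computation C: here, just the parity of the state."""
    return state % 2

def physical_system(state, forced_c_output=None):
    """Toy 'physics' whose next state depends on C's output."""
    c_output = computation_c(state) if forced_c_output is None else forced_c_output
    return state + c_output

def embeds(system, n_trials=100):
    """Does forcing C's output to a different value ever change the system's behavior?"""
    for state in range(n_trials):
        actual = system(state)
        counterfactual = system(state, forced_c_output=1 - computation_c(state))
        if actual != counterfactual:
            return True   # counterfactually sensitive to C, so C is "embedded"
    return False

print(embeds(physical_system))  # True for this toy system
```

A bag of popcorn, modeled honestly, has no step whose forced change propagates into its dynamics this way, which is the intuition behind saying it embeds neither a cat nor holocausts under this rule.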
If it turns out that no simple rule of this form works, I wouldn’t be too troubled, though; I’d be psychologically prepared to accept that there isn’t a clean quarks→computations mapping. Similar to how I already accept that human value is complex, I could accept that human judgments of “does this physical system implement this computation” are complex (and thus can’t be captured in a simple rule). I don’t think this would make me inconsistent; I think it would just make me more tolerant of nebulosity in ethics. At the moment it seems like clean mappings might exist and so it makes sense to search for them, though.
Instead, I tend to think of ethics as “how should we arrange the [quarks|negentropy] in our light-cone?”—ultimately we live in a world of quarks, so ethics is a question of quarks (or strings, or whatnot).
On the object level, it seems like it’s possible to think of painting as “how should we arrange the brush strokes on the canvas?”. But it seems hard to paint well while only thinking at the level of brush strokes (and not thinking about the higher levels, like objects). I expect ethics to be similar; at the very least if human ethics has an “aesthetic” component then it seems like designing a good light cone is at least as hard as making a good painting. Maybe this is a strawman of your position?
On the meta level, I would caution against this use of “ultimately”; see here and here (the articles are worded somewhat disagreeably but I mostly endorse the content). In some sense ethics is about quarks, but in other senses it’s about:
computations
a conflict between what we want and what we want to appear to want
a mathematical fact about what we would want upon reflection
I think these are all useful ways of viewing ethics, and I don’t feel the need to pick a single view (although I often find it appealing to look at what some views say about what other views are saying and to resolve the contradictions between them). There are all kinds of reasons why it might be psychologically uncomfortable not to have a simple theory of ethics (e.g. it’s harder to know whether you’re being ethical, it’s harder to criticize others for being unethical, it’s harder for groups to coordinate around more complex and ambiguous ethical theories, you’ll never be able to “solve” ethics once and then never have to think about ethics again, it requires holding multiple contradictory views in your head at once, you won’t always have a satisfying verbal justification for why your actions are ethical). But none of this implies that it’s good (in any of the senses above!) to assume there’s a simple ethical theory.
(For the record I think it’s useful to search for simple ethical theories even if they don’t exist, since you might discover interesting new ways of viewing ethics, even if these views aren’t complete).
(more comments)
Thus, we would need to be open to the possibility that certain interventions could cause a change in a system’s physical substrate (which generates its qualia) without causing a change in its computational level (which generates its qualia reports)
It seems like this means that empirical tests (e.g. neuroscience stuff) aren’t going to help test aspects of the theory that are about divergence between computational pseudo-qualia (the things people report on) and actual qualia. If I squint a lot I could see “anthropic evidence” being used to distinguish between pseudo-qualia and qualia, but it seems like nothing else would work.
I’m also not sure why we would expect pseudo-qualia to have any correlation with actual qualia? I guess you could make an anthropic argument (we’re viewing the world from the perspective of actual qualia, and our sensations seem to match the pseudo-qualia). That would give someone the suspicion that there’s some causal story for why they would be synchronized, without directly providing such a causal story.
(For the record I think anthropic reasoning is usually confused and should be replaced with decision-theoretic reasoning (e.g. see this discussion), but this seems like a topic for another day)
some more object-level comments on PQ itself:
We can say that a high-level phenomenon is strongly emergent with respect to a low-level domain when the high-level phenomenon arises from the low-level domain, but truths concerning that phenomenon are not deducible even in principle from truths in the low-level domain.
Suppose we have a Python program running on a computer. Truths about the Python program are, in some sense, reducible to physics; however the reduction itself requires resolving philosophical questions. I don’t know if this means the Python program’s functioning (e.g. values of different variables at different times) are “strongly emergent”; it doesn’t seem like an important question to me.
Downward causation means that higher-level phenomena are not only irreducible but also exert a causal efficacy of some sort. … [This implies] low-level laws will be incomplete as a guide to both the low-level and the high-level evolution of processes in the world.
In the case of the Python program this seems clearly false (it’s consistent to view the system as a physical system without reference to the Python program). I expect this is also false in the case of consciousness. I think almost all computationalists would strongly reject downwards causation according to this definition. Do you know of any computationalists who actually advocate downwards causation (i.e. that you can’t predict future physical states by just looking at past physical states without thinking about the higher levels)?
IMO consciousness has power over physics the same way the Python program has power over physics; we can consider counterfactuals like “what if this variable in the Python program magically had a different value” and ask what would happen to physics if this happened (in this case, maybe the variable controls something displayed on a computer screen, so if the variable were changed then the computer screen would emit different light). Actually formalizing questions like “what would happen if this variable had a different value” requires a theory of logical counterfactuals (which MIRI is researching, see this paper).
Notably, most Python programs don’t “make choices” in a way that makes “control” particularly meaningful, but humans do. Here I would say that humans implement a decision theory, while most Python programs do not (although some Python programs do implement a decision theory and can be meaningfully said to “make choices”). “Implementing a decision theory” means something like “evaluating multiple actions based on what their effects are expected to be, and choosing the one that scores best according to some metric”; some AI systems, like reinforcement learners, implement a decision theory.
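As a minimal sketch of what “implementing a decision theory” means in this sense (an illustrative toy, not MIRI’s formalization; all names are made up):

```python
import random

def expected_utility(action, world_model, utility, n_samples=1000):
    """Estimate the expected utility of an action by sampling outcomes."""
    return sum(utility(world_model(action)) for _ in range(n_samples)) / n_samples

def choose(actions, world_model, utility):
    """Evaluate each available action and pick the one that scores best."""
    return max(actions, key=lambda a: expected_utility(a, world_model, utility))

# Toy example: a noisy world; the agent prefers outcomes near 3.
world_model = lambda a: a + random.gauss(0, 1)
utility = lambda outcome: -abs(outcome - 3)
print(choose([0, 1, 2, 3, 4, 5], world_model, utility))  # typically 3
```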
(I’m writing this comment to express more “computationalism has a reasonable steelman that isn’t identified as a possible position in PQ” rather than “computationalism is clearly right”)
Thanks for the response; I’ve found this discussion useful for clarifying and updating my views.
However, when we start talking about mind simulations and ‘thought crime’, WBE, selfish replicators, and other sorts of tradeoffs where there might be unknown unknowns with respect to moral value, it seems clear to me that these issues will rapidly become much more pressing. So, I absolutely believe work on these topics is important, and quite possibly a matter of survival. (And I think it’s tractable, based on work already done.)
Suppose we live under the wrong moral theory for 100 years. Then we figure out the right moral theory, and live according to that one for the rest of time. How much value is lost in that 100 years? It seems very high but not an x-risk. It seems like we only get x-risks if somehow we don’t put a trusted reflection process (e.g. human moral philosophers) in control of the far future.
It seems quite sensible for people who don’t put overwhelming importance on the far future to care about resolving moral uncertainty earlier. The part of my morality that isn’t exclusively concerned with the far future strongly approves of things like consciousness research that resolve moral uncertainty earlier.
Based on my understanding, I don’t think Act-based agents or Task AI would help resolve these questions by default, although as tools they could probably help.
Act-based agents and task AGI kick the problem of global governance to humans. Humans still need to decide questions like how to run governments; they’ll be able to use AGI to help them, but governing well is still a hard problem even with AI assistance. The goal would be that moral errors are temporary; with the right global government structure, moral philosophers will be able to make moral progress and have their moral updates reflected in how things play out.
It’s possible that you think that governing the world well enough that the future eventually reflects human values is very hard even with AGI assistance, and would be made easier with better moral theories made available early on.
One factor that bears mentioning is whether an AGI’s ontology & theory of ethics might be path-dependent upon its creators’ metaphysics in such a way that it would be difficult for it to update if it’s wrong. If this is a plausible concern, this would imply a time-sensitive factor in resolving the philosophical confusion around consciousness, valence, moral value, etc.
I agree with this but place low probability on the antecedent. It’s kind of hard to explain briefly; I’ll point to this comment thread for a good discussion (I mostly agree with Paul).
But now that I think about it more, I don’t put super low probability on the antecedent. It seems like it would be useful to have some way to compare different universes that we’ve failed to put in control of trusted reflection processes, to e.g. get ones that have less horrific suffering or more happiness. I place high probability on “distinguishing between such universes is as hard as solving the AI alignment problem in general”, but I’m not extremely confident of that and I don’t have a super precise argument for it. I guess I wouldn’t personally prioritize such research compared to generic AI safety research but it doesn’t seem totally implausible that resolving moral uncertainty earlier would reduce x-risk for this reason.
I expect:
We would lose a great deal of value by optimizing the universe according to current moral uncertainty, without the opportunity to reflect and become less uncertain over time.
There’s a great deal of reflection necessary to figure out what actions moral theory X recommends, e.g. to figure out which minds exist or what implicit promises people have made to each other. I don’t see this reflection as distinct from reflection about moral uncertainty; if we’re going to defer to a reflection process anyway for making decisions, we might as well let that reflection process decide on issues of moral theory.
Some thoughts:
IMO the most plausible non-CEV proposals are
Act-based agents, which defer to humans to a large extent. The goal is to keep humans in control of the future.
Task AI, which is used to accomplish concrete objectives in the world. The idea would be to use this to accomplish goals people would want accomplished using AI (including reducing existential risk), while leaving the future moral trajectory in the hands of humans.
Both proposals end up deferring to humans to decide the long-run trajectory of humanity. IMO, this isn’t a coincidence; I don’t think it’s likely that we get a good outcome without deferring to humans in the long run.
Some more specific comments:
If pleasure/happiness is an important core part of what humanity values, or should value, having the exact information-theoretic definition of it on-hand could directly and drastically simplify the problems of what to maximize, and how to load this value into an AGI
There’s one story where this makes a little bit of sense, where we basically give up on satisfying any human values other than hedonic values, and build an AI that maximizes pleasure without satisfying any other human values. I’m skeptical that this is any easier than solving the full value alignment problem, but even if it were, I think this would be undesirable to the vast majority of humans, and so we would collectively be better off coordinating around a higher target.
If we’re shooting for a higher target, then we have some story for why we get more values than just hedonic values. E.g. the AI defers to human moral philosophers on some issues. But this method should also succeed for loading hedonic values. So there isn’t a significant benefit to having hedonic values specified ahead of time.
Even if pleasure isn’t a core terminal value for humans, it could still be used as a useful indirect heuristic for detecting value destruction. I.e., if we’re considering having an AGI carry out some intervention, we could ask it what the expected effect is on whatever pattern precisely corresponds to pleasure/happiness.
This seems to be in the same reference class as asking questions like “how many humans exist” or “what’s the closing price of the Dow Jones”. I.e. you can use it to check if things are going as expected, though the metric can be manipulated. Personally I’m pessimistic about such sanity checks in general, and even if I were optimistic about them, I would think that the marginal value of one additional sanity check is low.
There’s going to be a lot of experimentation involving intelligent systems, and although many of these systems won’t be “sentient” in the way humans are, some system types will approach or even surpass human capacity for suffering.
See Eliezer’s thoughts on mindcrime. Also see the discussion in the comments. It does seem like consciousness research could help for defining a nonpersonhood predicate.
I don’t have comments on cognitive enhancement since it’s not my specialty.
Some of the points (6,7,8) seem most relevant if we expect AGI to be designed to use internal reinforcement substantially similar to humans’ internal reinforcement and substantially different from modern reinforcement learning. I don’t have precise enough models of such AGI systems that I feel optimistic about doing research related to such AGIs, but if you think questions like “how would we incentivize neuromorphic AI systems to do what we want” are tractable then maybe it makes sense for you to do research on this question. I’m pessimistic about things in the reference class of IIT making any progress on this question, but maybe you have different models here.
I agree that “Valence research could change the social and political landscape AGI research occurs in” and, like you, I think the sign is unclear.
(I am a MIRI research fellow but am currently speaking for myself not my employer).
I share Open Phil’s view on the probability of transformative AI in the next 20 years. The relevant signposts would be answers to questions like “how are current algorithms doing on tasks requiring various capabilities”, “how much did this performance depend on task-specific tweaking on the part of programmers”, “how much is performance projected to improve due to increasing hardware”, and “do many credible AI researchers think that we are close to transformative AI”.
In designing the new ML-focused agenda, we imagined a concrete hypothetical (which isn’t stated explicitly in the paper): what research would we do if we knew we’d have sufficient technology for AGI in about 20 years, and this technology would be qualitatively similar to modern ML technology such as deep learning? So we definitely intend for this research agenda to be relevant to the scenario you describe, and the agenda document goes into more details. Much of this research deals with task-directed AGI, which can be limited (e.g. not self-improving).
I’ll start by stating that, while I have some intuitions about how the paper will be received, I don’t have much experience making crisp forecasts, and so I might be miscalibrated. That said:
In my experience it’s pretty common for ML researchers who are more interested in theory and general intelligence to find Solomonoff induction and AIXI to be useful theories. I think “Logical Induction” will be generally well-received among such people. Let’s say 70% chance that at least 40% of ML researchers who think AIXI is a useful theory, and who spend a couple hours thinking about “Logical Induction” (reading the paper / talking to people about it), will think that “Logical Induction” is at least 1⁄3 as interesting/useful as AIXI. I think ML researchers who don’t find Solomonoff induction relevant to their interests probably won’t find “Logical Induction” compelling either. This forecast is based on my personal experience of really liking Solomonoff induction and AIXI (long before knowing about MIRI) but finding theoretical gaps in them, many of which “Logical Induction” resolves nicely, and from various conversations with ML researchers who like Solomonoff induction and AIXI.
I have less-strong intuitions about mathematicians but more empirical data. “Logical Induction” has been quite well-received by Scott Aaronson, and I think the discussion at n-Category Cafe indicates that mathematicians find this paper and the overall topic interesting. I am quite uncertain about the numbers, but I expect something like 50% of mathematicians who are interested in Bayesianism and Gödel’s incompleteness theorem to think it’s quite an interesting result after thinking about it for a couple hours.
(these predictions might seem timid; I am adjusting for the low base rates of people finding things really interesting)
I agree with Nate that there isn’t much public on this yet. The AAMLS agenda is predicated on a relatively pessimistic scenario: perhaps we won’t have much time before AGI (and therefore not much time for alignment research), and the technology AI systems are based on won’t be much more principled than modern-day deep learning systems. I’m somewhat optimistic that it’s possible to achieve good outcomes in some pessimistic scenarios like this one.
I think that the ML-related topics we spend the most effort on (such as those in the ML agenda) are currently quite neglected. See my other comment for more on how our research approach is different from that of most AI researchers.
It’s still plausible that some of the ML-related topics we research would be researched anyway (perhaps significantly later). This is a legitimate consideration that is, in my view, outweighed by other considerations (such as the fact that less total safety research will be done if AGI comes soon, making such timelines more neglected; the fact that ML systems are easy to think about due to their concreteness; and the fact that it can be beneficial to “seed” the field with high-quality research that others can build on in the future).
Additionally I think that AI alignment researchers should avoid ignoring huge theoretically-relevant parts of the problem. I would have quite a lot of difficulty thinking about AI alignment without thinking about how one might train learning systems to do good things using feedback. One of my goals with the ML agenda is to build theoretical tools that make it possible to think about the rest of the problem more clearly.
The ideal MIRI researcher is someone who’s able to think about thorny philosophical problems and break off parts of them to formalize mathematically. In the case of logical uncertainty, researchers started by thinking about the initially vague problem of reasoning well about uncertain mathematical statements, turned some of these thoughts into formal desiderata and algorithms (producing intermediate possibility and impossibility results), and eventually found a way to satisfy many of these desiderata at once. We’d like to do a lot more of this kind of work in the future.
Probably the main difference between MIRI research and typical AI research is that we focus on problems of the form “if we had capability X, how would we achieve outcome Y?” rather than “how can we build a practical system achieving outcome Y?”. We focus less on computational tractability and more on the philosophical question of how we would build a system to achieve Y in principle, given e.g. unlimited computing resources or access to extremely powerful machine learning systems. I don’t think we have much special knowledge that others don’t have (or vice versa), given that most relevant AI research is public; it’s more that we have a different research focus that will lead us to ask different questions. Of course, our different research focus is motivated by our philosophy about AI, and we have significant philosophical differences with most AI researchers (which isn’t actually saying much given how much philosophical diversity there is in the field of AI).
Work in the field of AI can inform us about what approaches are most promising (e.g., the theoretical questions in the “Alignment for Advanced Machine Learning Systems” agenda are of more interest if variants of deep learning are sufficient to achieve AGI), and can directly provide useful theoretical tools (e.g., in the field of statistical learning theory). Typically, we will want to get a high-level view of what the field is doing and otherwise focus mainly on the more theoretical work relevant to our research interests.
We definitely need some way of dealing with the fact that we don’t know which AI paradigm(s) will be the foundation of the first AGI systems. One strategy is to come up with abstractions that work across AI paradigms; we can ask the question “if we had access to extremely powerful reinforcement learning systems, how would we use them to safely achieve some concrete objective in the world?” without knowing how these reinforcement learning systems work internally. A second strategy is to prioritize work related to types of AI systems that seem more promising (deep learning seems more promising than symbolic GOFAI at the moment, for example). A third strategy is to do what people sometimes do when coming up with new AI paradigms: think about how good reasoning works, formalize some of these aspects, and design algorithms performing good reasoning according to these desiderata. In thinking about AI alignment, we apply all three of these strategies.
Here’s a third resolution. Consider a utility function that is a weighted sum of:
how close a region’s population level is to the “ideal” population level for that region (i.e. not underpopulated or overpopulated)
average utility of individuals in this region (not observer-moments in this region)
AMF is replacing lots of lives that are short (therefore low-utility) with fewer lives that are long (therefore higher utility), without affecting population level much. The effect of this could be summarized as “35 DALYs”, as in “we increased the average lifespan by 35 DALYs / total population”.
(warning: made-up numbers follow). Suppose we make someone live for 40 years instead of 5 years by curing malaria. This reduces the fertility rate; let’s say one fewer 35-year life happens as a result. This has no effect on the average population level (part 1). We replaced a 5-year life plus a 35-year life with a 40-year life. If average lives in the region are 35 years long (and we’re pretending that life utility = length of life), then most of the effect on part 2 of the utility function comes from preventing a worse-than-average life from happening.
Suppose instead that we extend someone’s life from 40 years to 75 years (a gain of 35 DALYs). This reduces the fertility rate; let’s pretend that this prevents a 35-year life from happening. So we’re replacing a 40-year life plus a 35-year life with a 75-year life. From the perspective of part 2 of the utility function, this is exactly as good as curing a case of malaria. So it seems like you can measure life-extending measures in DALYs pretty naively and things work out (both 35-DALY improvements are equally good under the utility function).
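A quick check of the made-up arithmetic (still pretending that life utility = years lived; a toy sketch, not a real model):

```python
# Curing malaria: a 5-year life becomes a 40-year life; a 35-year life is prevented.
malaria = {"before": [5, 35], "after": [40]}
# Life extension: a 40-year life becomes a 75-year life; a 35-year life is prevented.
extension = {"before": [40, 35], "after": [75]}

for name, case in [("malaria", malaria), ("extension", extension)]:
    delta_years = sum(case["after"]) - sum(case["before"])  # change in total life-years
    delta_lives = len(case["after"]) - len(case["before"])  # change in number of lives
    print(name, delta_years, delta_lives)                   # both print: 0 -1
```

In both cases total life-years are unchanged and there is one fewer life, which is why the two 35-DALY improvements look the same from the perspective of part 2.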
Another modeling issue is that each individual variable is log-normal rather than normal/uniform. This means that while the probability of success is “0.01 to 0.1”, suggesting the 5.5% midpoint as the “average”, the actual computed average is 4%. This doesn’t make a big difference on its own but it’s important when multiplying together lots of numbers. I’m not sure that converting log-normal to uniform would in general lead to better estimates but it’s important to flag.
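For concreteness, here is roughly where the 4% comes from, assuming the range “0.01 to 0.1” is treated as a 90% confidence interval on a log-normal (a common convention in this kind of model; this is an illustrative reconstruction, not necessarily the exact model used):

```python
import math

low, high = 0.01, 0.1
z90 = 1.645                                 # z-score for a 90% interval
mu = (math.log(low) + math.log(high)) / 2   # mean of log(X)
sigma = (math.log(high) - math.log(low)) / (2 * z90)

mean = math.exp(mu + sigma**2 / 2)          # mean of a log-normal
print(round(mean, 3))                       # 0.04, i.e. about 4%, not the 5.5% midpoint
```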