bgarfinkel comments on I’m Buck Shlegeris, I do research and outreach at MIRI, AMA

bgarfinkel Nov 23, 2019, 1:17 PM
6 points
I think there are basically three options:
- Decision theory isn’t normative.
- Decision theory is normative in the way that “murder is bad” or “improving aggregate welfare is good” is normative, i.e., it expresses an arbitrary terminal value of human beings.
- Decision theory is normative in the way that game theory, probability theory, Boolean logic, the scientific method, etc. are normative (at least for beings that want accurate beliefs); or in the way that the rules and strategies of chess are normative (at least for beings that want to win at chess); or in the way that medical recommendations are normative (at least for beings that want to stay healthy).
[[Disclaimer: I’m not sure this will be useful, since it seems like most discussions that verge on meta-ethics end up with neither side properly understanding the other.]]

I think the kind of decision theory that philosophers tend to work on is typically explicitly described as “normative.” (For example, the SEP article on decision theory is about “normative decision theory.”) So when I’m talking about “academic decision theories” or “proposed criteria of rightness” I’m talking about normative theories. When I use the word “rational” I’m also referring to a normative property.

I don’t think there’s any very standard definition of what it means for something to be normative, maybe because it’s often treated as something pretty close to a primitive concept, but a partial account is that a “normative theory” is a claim about what someone should do. At least this is what I have in mind. This is different from the second option you list (and I think the third one).

Some normative theories concern “ends.” These are basically claims about what people should do, if they can freely choose outcomes. For example: A subjectivist theory might say that people should maximize the fulfillment of their own personal preferences (whatever they are). Whereas a hedonistic utilitarian theory might say that people should maximize total happiness. I’m not sure what the best terminology is, and I think this choice is probably relatively non-standard, but let’s label these “moral theories.”

Some normative theories, including “decision theories,” concern “means.” These theories put aside the question of which ends people should pursue and instead focus on how people should respond to uncertainty about the results/implications of their actions. For example: Expected utility theory says that people should take whatever actions maximize expected fulfillment of the relevant ends. Risk-weighted expected utility theory (and other alternative theories) say different things. Typical versions of CDT and EDT flesh out expected utility theory in different ways to specify what the relevant measure of “expected fulfillment” is.

Moral theory and normative decision theory seem to me to have pretty much the same status. They are both bodies of theory that bear on what people should do. On some views, the division between them is more a matter of analytic convenience than anything else. For example, David Enoch, a prominent meta-ethicist, writes: “In fact, I think that for most purposes [the line between the moral and the non-moral] is not a line worth worrying about. The distinction within the normative between the moral and the non-moral seems to me to be shallow compared to the distinction between the normative and the non-normative” (Taking Morality Seriously, 86).

One way to think of moral theories and normative decision theories is as two components that fit together to form more fully specified theories about what people should do. Moral theories describe the ends people should pursue; given these ends, decision theories then describe what actions people should take when in states of uncertainty. To illustrate, two examples of more complete normative theories that combine moral and decision-theoretic components would be: “You should take whatever action would in expectation cause the largest increase in the fulfillment of your preferences” and “You should take whatever action would, if you took it, lead you to anticipate the largest expected amount of future happiness in the world.” The first is subjectivism combined with CDT, while the second is total view hedonistic utilitarianism combined with EDT.

(On this conception, a moral theory is not a description of “an arbitrary terminal value of human beings.” Decision theory here also is not “the study of which decision-making methods humans happen to terminally prefer to employ.” These are both theories about what people should do, rather than theories about what people’s preferences are.)

Normativity is obviously pretty often regarded as a spooky or insufficiently explained thing. So a plausible position is normative anti-realism: It might be the case that no normative claims are true, either because they’re all false or because they’re not even well-formed enough to take on truth values. If normative anti-realism is true, then one thing this means is that the philosophical decision theory community is mostly focused on a question that doesn’t really have an answer.

In the twin prisoner’s dilemma with son-of-CDT, both agents are following son-of-CDT and neither is following CDT (regardless of whether the fork happened before or after the switchover to son-of-CDT).

If I’m someone with a twin and I’m implementing P_CDT, I still don’t think I will choose to modify myself to cooperate in twin prisoner’s dilemmas. The reason is that modifying myself won’t cause my twin to cooperate; it will only cause me to cooperate, lowering the utility I receive.

(The fact that P_CDT agents won’t modify themselves to cooperate with their twins could of course be interpreted as a mark against R_CDT.)
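
As a rough illustration of the causal reasoning above, here is a minimal sketch that scores "cooperate" and "defect" in a twin prisoner's dilemma in two ways. The payoff numbers, the `correlation` parameter, and all function names are assumptions made up for the example; they are not taken from any standard formalization of CDT or EDT.

```python
# Illustrative prisoner's dilemma payoffs (assumed for this example).
# PAYOFF[(my_action, twin_action)] = utility I receive.
PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}
ACTIONS = ("C", "D")


def cdt_value(my_action, p_twin_cooperates):
    """Causal expected utility: the twin's action is held fixed,
    because my choice has no causal influence on it."""
    return (p_twin_cooperates * PAYOFF[(my_action, "C")]
            + (1 - p_twin_cooperates) * PAYOFF[(my_action, "D")])


def edt_value(my_action, correlation=1.0):
    """Evidential expected utility: my action is treated as evidence
    about the twin's action. `correlation` is the assumed probability
    that the twin picks the same action I do."""
    other = "D" if my_action == "C" else "C"
    return (correlation * PAYOFF[(my_action, my_action)]
            + (1 - correlation) * PAYOFF[(my_action, other)])


def best(value_fn, **kwargs):
    """Return the action with the highest value under value_fn."""
    return max(ACTIONS, key=lambda a: value_fn(a, **kwargs))


# Under the causal measure, defecting wins whatever I believe about my twin.
cdt_choices = {p: best(cdt_value, p_twin_cooperates=p) for p in (0.0, 0.5, 1.0)}
# Under the evidential measure, cooperating wins when the twin is an exact copy.
edt_choice = best(edt_value, correlation=1.0)
print(cdt_choices, edt_choice)
```

With these assumed payoffs, switching my own action from "D" to "C" lowers the causal value for every fixed belief about the twin, which is the sense in which modifying myself to cooperate only lowers the utility I receive.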
- RobBensinger Nov 26, 2019, 7:52 AM
  5 points
  I appreciate you taking the time to lay out these background points, and it does help me better understand your position, Ben; thanks!
  If normative anti-realism is true, then one thing this means is that the philosophical decision theory community is mostly focused on a question that doesn’t really have an answer.
  Some ancient Greeks thought that the planets were intelligent beings; yet many of the Greeks’ astronomical observations, and some of their theories and predictive tools, were still true and useful.
  I think that terms like “normative” and “rational” are underdefined, so the question of realism about them is underdefined (cf. Luke Muehlhauser’s pluralistic moral reductionism).
  I would say that (1) some philosophers use “rational” in a very human-centric way, which is fine as long as it’s done consistently; (2) others have a much more thin conception of “rational”, such as ‘tending to maximize utility’; and (3) still others want to have their cake and eat it too, building in a lot of human-value-specific content to their notion of “rationality”, but then treating this conception as though it had the same level of simplicity, naturalness, and objectivity as 2.
  I think that type-1, type-2, and type-3 decision theorists have all contributed valuable AI-relevant conceptual progress in the past (most obviously, by formulating Newcomb’s problem, EDT, and CDT), and I think all three could do more of the same in the future. I think the type-3 decision theorists are making a mistake, but often more in the fashion of an ancient astronomer who’s accumulating useful and real knowledge but happens to have some false side-beliefs about the object of study, not in the fashion of a theologian whose entire object of study is illusory. (And not in the fashion of a developmental psychologist or historian whose field of study is too human-centric to directly bear on game theory, AI, etc.)
  I’d expect type-2 decision theorists to tend to be interested in more AI-relevant things than type-1 decision theorists, but on the whole I think the flavor of decision theory as a field has ended up being more type-2/3 than type-1. (And in this case, even type-1 analyses of “rationality” can be helpful for bringing various widespread background assumptions to light.)
  If I’m someone with a twin and I’m implementing P_CDT, I still don’t think I will choose to modify myself to cooperate in twin prisoner’s dilemmas. The reason is that modifying myself won’t cause my twin to cooperate; it will only cause me to cooperate, lowering the utility I receive.
  This is true if your twin was copied from you in the past. If your twin will be copied from you in the future, however, then you can indeed cause your twin to cooperate, assuming you have the ability to modify your own future decision-making so as to follow son-of-CDT’s prescriptions from now on.
  Making the commitment to always follow son-of-CDT is an action you can take; the mechanistic causal consequence of this action is that your future brain and any physical systems that are made into copies of your brain in the future will behave in certain systematic ways. So from your present perspective (as a CDT agent), you can causally control future copies of yourself, as long as the act of copying hasn’t happened yet.
  (And yes, by the time you actually end up in the prisoner’s dilemma, your future self will no longer be able to causally affect your copy. But this is irrelevant from the perspective of present-you; to follow CDT’s prescriptions, present-you just needs to pick the action that you currently judge will have the best consequences, even if that means binding your future self to take actions contrary to CDT’s future prescriptions.)
  (If it helps, don’t think of the copy of you as “you”: just think of it as another environmental process you can influence. CDT prescribes taking actions that change the behavior of future copies of yourself in useful ways, for the same reason CDT prescribes actions that change the future course of other physical processes.)
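
  The timing point can be made concrete with a small sketch. Everything here is an illustrative assumption: the payoffs, the function name, and in particular the premise that a twin copied before the commitment keeps running the old, defecting policy.

```python
# Illustrative prisoner's dilemma payoffs (assumed for this example).
# PAYOFF[(my_action, twin_action)] = utility I receive.
PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}


def value_of_committing(new_policy, copy_made_after_commitment, old_policy="D"):
    """Utility that present-me causes by committing to `new_policy` now.

    If the copy is made after the commitment, the copy inherits the new
    policy (a causal consequence of my act). If the copy already exists,
    it keeps `old_policy` no matter what I commit to.
    """
    twin_policy = new_policy if copy_made_after_commitment else old_policy
    return PAYOFF[(new_policy, twin_policy)]


def best_commitment(copy_made_after_commitment):
    """Pick the commitment with the best causal consequences."""
    return max(("C", "D"),
               key=lambda policy: value_of_committing(policy, copy_made_after_commitment))


future_copy_choice = best_commitment(copy_made_after_commitment=True)
past_copy_choice = best_commitment(copy_made_after_commitment=False)
print(future_copy_choice, past_copy_choice)
```

  Under these assumptions the same causal calculation recommends committing to cooperate when the copy is made later, and recommends defecting when the copy already exists.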
  - bgarfinkel Nov 27, 2019, 2:07 AM
    5 points
    
    I appreciate you taking the time to lay out these background points, and it does help me better understand your position, Ben; thanks!
    
    Thank you for taking the time to respond as well! :)
    
    I think that terms like “normative” and “rational” are underdefined, so the question of realism about them is underdefined (cf. Luke Muehlhauser’s pluralistic moral reductionism).
    
    I would say that (1) some philosophers use “rational” in a very human-centric way, which is fine as long as it’s done consistently; (2) others have a much more thin conception of “rational”, such as ‘tending to maximize utility’; and (3) still others want to have their cake and eat it too, building in a lot of human-value-specific content to their notion of “rationality”, but then treating this conception as though it had the same level of simplicity, naturalness, and objectivity as 2.
    
    I’m not positive I understand what (1) and (3) are referring to here, but I would say that there’s also at least a fourth way that philosophers often use the word “rational” (which is also the main way I use the word “rational.”) This is to refer to an irreducibly normative concept.
    
    The basic thought here is that not every concept can be usefully described in terms of more primitive concepts (i.e. “reduced”). As a close analogy, a dictionary cannot give useful non-circular definitions of every possible word—it requires the reader to have a pre-existing understanding of some foundational set of words. As a wonkier analogy, if we think of the space of possible concepts as a sort of vector space, then we sort of require an initial “basis” of primitive concepts that we use to describe the rest of the concepts.
    
    Some examples of concepts that are arguably irreducible are “truth,” “set,” “property,” “physical,” “existence,” and “point.” Insofar as we can describe these concepts in terms of slightly more primitive ones, the descriptions will typically fail to be very useful or informative and we will typically struggle to break the slightly more primitive ones down any further.
    
    To focus on the example of “truth,” some people have tried to reduce the concept substantially. Some people have argued, for example, that when someone says that “X is true” what they really mean or should mean is “I personally believe X” or “believing X is good for you.” But I think these suggested reductions pretty obviously don’t entirely capture what people mean when they say “X is true.” The phrase “X is true” also has an important meaning that is not amenable to this sort of reduction.
    
    [[EDIT: “Truth” may be a bad example, since it’s relatively controversial and since I’m pretty much totally unfamiliar with work on the philosophy of truth. But insofar as any concepts seem irreducible to you in this sense, or buy the more general argument that some concepts will necessarily be irreducible, the particular choice of example used here isn’t essential to the overall point.]]
    
    Some philosophers also employ normative concepts that they say cannot be reduced in terms of non-normative (e.g. psychological) properties. These concepts are said to be irreducibly normative.
    
    For example, here is Parfit on the concept of a normative reason (OWM, p. 1):
    
    We can have reasons to believe something, to do something, to have some desire or aim, and to have many other attitudes and emotions, such as fear, regret, and hope. Reasons are given by facts, such as the fact that someone’s finger-prints are on some gun, or that calling an ambulance would save someone’s life.
    
    It is hard to explain the concept of a reason, or what the phrase ‘a reason’ means. Facts give us reasons, we might say, when they count in favour of our having some attitude, or our acting in some way. But ‘counts in favour of’ means roughly ‘gives a reason for’. Like some other fundamental concepts, such as those involved in our thoughts about time, consciousness, and possibility, the concept of a reason is indefinable in the sense that it cannot be helpfully explained merely by using words. We must explain such concepts in a different way, by getting people to think thoughts that use these concepts. One example is the thought that we always have a reason to want to avoid being in agony.
    
    When someone says that a concept they are using is irreducible, this is obviously some reason for suspicion. A natural suspicion is that the real explanation for why they can’t give a useful description is that the concept is seriously muddled or fails to grip onto anything in the real world. For example, whether this is fair or not, I have this sort of suspicion about the concept of “dao” in daoist philosophy.
    
    But, again, it will necessarily be the case that some useful and valid concepts are irreducible. So we should sometimes take invocations of irreducible concepts seriously. A concept that is mostly undefined is not always problematically “underdefined.”
    
    When I talk about “normative anti-realism,” I mostly have in mind the position that claims invoking irreducibly normative concepts are never true (either because these claims are all false or because they don’t even have truth values). For example: Insofar as the word “should” is being used in an irreducibly normative sense, there is nothing that anyone “should” do.
    
    [[Worth noting, though: The term “normative realism” is sometimes given a broader definition than the one I’ve sketched here. In particular, it often also includes a position known as “analytic naturalist realism” that denies the relevance of irreducibly normative concepts. I personally feel I understand this position less well and I think sometimes waffle between using the broader and narrower definition of “normative realism.” I also more generally want to stress that not everyone who makes claims about “criterion of rightness” or employs other seemingly normative language is actually a normative realist in the narrow or even broad sense; what I’m doing here is just sketching one common especially salient perspective.]]
    
    One motivation for invoking irreducibly normative concepts is the observation that, in the context of certain discussions, it’s not obvious that there’s any close-to-sensible way to reduce the seemingly normative concepts that are being used.
    
    For example, suppose we follow a suggestion once made by Eliezer to reduce the concept of “a rational choice” to the concept of “a winning choice” (or, in line with the type-2 conception you mention, a “utility-maximizing choice”). It seems difficult to make sense of a lot of basic claims about rationality if we use this reduction, and other obvious alternative reductions don’t seem to fare much better. To mostly quote from a comment I made elsewhere:
    
    Suppose we want to claim that it is rational to try to maximize the expected winning (i.e. the expected fulfillment of your preferences). Due to randomness/uncertainty, though, an agent that tries to maximize expected “winning” won’t necessarily win compared to an agent that does something else. If I spend a dollar on a lottery ticket with a one-in-a-billion chance of netting me a billion-and-one “win points,” then I’m taking the choice that maximizes expected winning but I’m also almost certain to lose. So we can’t treat “the rational action” as synonymous with “the action taken by an agent that wins.”
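
    The arithmetic behind the lottery example can be checked with a short sketch. It assumes that "netting" means a net gain of a billion-and-one win points on a win, and that the dollar spent counts as one lost win point otherwise; both readings are assumptions, as are the variable names.

```python
from fractions import Fraction

# Exact rational arithmetic avoids floating-point noise at one-in-a-billion odds.
p_win = Fraction(1, 1_000_000_000)
net_gain_if_win = 1_000_000_001   # assumed: net win points on a winning ticket
net_loss_if_lose = -1             # assumed: the dollar spent, as one win point

# Expected winning from buying the ticket versus not buying it.
ev_buy = p_win * net_gain_if_win + (1 - p_win) * net_loss_if_lose
ev_skip = Fraction(0)

# Probability that the expected-winning maximizer ends up behind.
p_lose = 1 - p_win

print(ev_buy, ev_buy > ev_skip, float(p_lose))
```

    On these assumptions buying the ticket has a slightly positive expected value, so it is the choice that maximizes expected winning, even though the buyer loses in all but one case in a billion.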
    
    We can try to patch up the issue here by reducing “the rational action” to “the action that is consistent with the VNM axioms,” but in fact either action in this case is consistent with the VNM axioms. The VNM axioms don’t imply that an agent must maximize the expected desirability of outcomes. They just imply that an agent must maximize the expected value of some function. It is totally consistent with the axioms, for example, to be effectively risk averse and instead maximize the expected square root of desirability. If we try to define “the action I should take” in this way, then the claim “it is rational to act consistently with the VNM axioms” also becomes an empty tautology.
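
    To illustrate the square-root point, here is a minimal sketch with two agents facing the same pair of lotteries. The lotteries and numbers are invented for the example; each agent maximizes the expected value of some function of desirability, which is the kind of behavior the VNM axioms permit, yet they choose differently.

```python
import math

# Lotteries as lists of (probability, desirability) pairs; numbers are illustrative.
LOTTERIES = {
    "sure thing": [(1.0, 40.0)],
    "gamble": [(0.5, 0.0), (0.5, 100.0)],
}


def expected(u, lottery):
    """Expected value of the function u over the lottery's outcomes."""
    return sum(p * u(x) for p, x in lottery)


def choose(u):
    """Pick the lottery that maximizes the expected value of u."""
    return max(LOTTERIES, key=lambda name: expected(u, LOTTERIES[name]))


# Maximizes expected desirability itself.
risk_neutral_choice = choose(lambda x: x)
# Maximizes the expected square root of desirability (effectively risk averse).
risk_averse_choice = choose(math.sqrt)

print(risk_neutral_choice, risk_averse_choice)
```

    The first agent takes the gamble (expected desirability 50 versus 40), while the second takes the sure thing (expected square root 5 versus roughly 6.32).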
    
    We could of course instead reduce “the rational action” to “the action that maximizes expected winning.” But now, of course, the claim “it is rational to maximize expected winning” no longer has any substantive content. When we make this claim, do we really mean to be stating an empty tautology? And do we really consider it trivially incoherent to wonder—e.g. in a Pascal’s mugging scenario—whether it might be “rational” to take an action other than the one that maximizes expected winning? If not, then this reduction is a very poor fit too.
    
    It ultimately seems hard, at least to me, to make non-vacuous true claims about what it’s “rational” to do without invoking a non-reducible notion of “rationality.” If we are invoking a non-reducible notion of rationality, then it makes sense that we can’t provide a satisfying reduction.
    
    FN15 in my post on normative realism elaborates on this point.
    
    At the same time, though, I do think there are also really good and hard-to-counter epistemological objections to the existence of irreducibly normative properties (e.g. the objection described in this paper). You might also find the difficulty of reducing normative concepts a lot less obvious-seeming or problematic than I do. You might think, for example, that the difficulty of reducing “rationality” is less like the difficulty of reducing “truth” (which IMO mainly reflects the fact that truth is an important primitive concept) and more like the difficulty of defining the word “soup” in a way that perfectly matches our intuitive judgments about what counts as “soup” (which IMO mainly reflects the fact that “soup” is a high-dimensional concept). So I definitely don’t want to say normative realism is obviously or even probably right.
    
    I mainly just want to communicate the sort of thing that I think a decent chunk of philosophers have in mind when they talk about a “rational decision” or a “criterion of rightness.” Although, philosophy being philosophy, plenty of people do of course have in mind plenty of different things.
    - RobBensinger Nov 27, 2019, 5:22 AM
      3 points
      So, as an experiment, I’m going to be a very obstinate reductionist in this comment. I’ll insist that a lot of these hard-seeming concepts aren’t so hard.
      Many of them are complicated, in the fashion of “knowledge”—they admit an endless variety of edge cases and exceptions—but these complications are quirks of human cognition and language rather than deep insights into ultimate metaphysical reality. And where there’s a simple core we can point to, that core generally isn’t mysterious.
      It may be inconvenient to paraphrase the term away (e.g., because it packages together several distinct things in a nice concise way, or has important emotional connotations, or does important speech-act work like encouraging a behavior). But when I say it “isn’t mysterious”, I mean it’s pretty easy to see how the concept can crop up in human thought even if it doesn’t belong on the short list of deep fundamental cosmic structure terms.
      I would say that there’s also at least a fourth way that philosophers often use the word “rational,” which is also the main way I use the word “rational.” This is to refer to an irreducibly normative concept.
      Why is this a fourth way? My natural response is to say that normativity itself is either a messy, parochial human concept (like “love,” “knowledge,” “France”), or it’s not (in which case it goes in bucket 2).
      Some examples of concepts that are arguably irreducible are “truth,” “set,” “property,” “physical,” “existence,” and “point.”
      Picking on the concept here that seems like the odd one out to me: I feel confident that there isn’t a cosmic law (of nature, or of metaphysics, etc.) that includes “truth” as a primitive (unless the list of primitives is incomprehensibly long). I could see an argument for concepts like “intentionality/reference”, “assertion”, or “state of affairs”, though the former two strike me as easy to explain in simple physical terms.
      Mundane empirical “truth” seems completely straightforward. Then there’s the truth of sentences like “Frodo is a hobbit”, “2+2=4”, “I could have been the president”, “Hamburgers are more delicious than battery acid”… Some of these are easier or harder to make sense of in the naive correspondence model, but regardless, it seems clear that our colloquial use of the word “true” to refer to all these different statements is pre-philosophical, and doesn’t reflect anything deeper than that “each of these sentences at least superficially looks like it’s asserting some state of affairs, and each sentence satisfies the conventional assertion-conditions of our linguistic community”.
      I think that philosophers are really good at drilling down on a lot of interesting details and creative models for how we can try to tie these disparate speech-acts together. But I think there’s also a common failure mode in philosophy of treating these questions as deeper, more mysterious, or more joint-carving than the facts warrant. Just because you can argue about the truthmakers of “Frodo is a hobbit” doesn’t mean you’re learning something deep about the universe (or even something particularly deep about human cognition) in the process.
      [Parfit:] It is hard to explain the concept of a reason, or what the phrase ‘a reason’ means. Facts give us reasons, we might say, when they count in favour of our having some attitude, or our acting in some way. But ‘counts in favour of’ means roughly ‘gives a reason for’. Like some other fundamental concepts, such as those involved in our thoughts about time, consciousness, and possibility, the concept of a reason is indefinable in the sense that it cannot be helpfully explained merely by using words.
      Suppose I build a robot that updates hypotheses based on observations, then selects actions that its hypotheses suggest will help it best achieve some goal. When the robot is deciding which hypotheses to put more confidence in based on an observation, we can imagine it thinking, “To what extent is observation o a [WORD] to believe hypothesis h?” When the robot is deciding whether it assigns enough probability to h to choose an action a, we can imagine it thinking, “To what extent is P(h)=0.7 a [WORD] to choose action a?” As a shorthand, when observation o updates a hypothesis h that favors an action a, the robot can also ask to what extent o itself is a [WORD] to choose a.
      When two robots meet, we can moreover add that they negotiate a joint “compromise” goal that allows them to work together rather than fight each other for resources. In communicating with each other, they then start also using “[WORD]” where an action is being evaluated relative to the joint goal, not just the robot’s original goal.
      Thus when Robot A tells Robot B “I assign probability 90% to ‘it’s noon’, which is [WORD] to have lunch”, A may be trying to communicate that A wants to eat, or that A thinks eating will serve A and B’s joint goal. (This gets even messier if the robots have an incentive to obfuscate which actions and action-recommendations are motivated by the personal goal vs. the joint goal.)
      If you decide to relabel “[WORD]” as “reason”, I claim that this captures a decent chunk of how people use the phrase “a reason”. “Reason” is a suitcase word, but that doesn’t mean there are no similarities between e.g. “data my goals endorse using to adjust the probability of a given hypothesis” and “probabilities-of-hypotheses my goals endorse using to select an action”, or that the similarity is mysterious and ineffable.
      (I recognize that the above story leaves out a lot of important and interesting stuff. Though past a certain point, I think the details will start to become Gettier-case nitpicks, as with most concepts.)
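
      A toy version of the robot story might look like the sketch below. The scoring choices are assumptions made for illustration only: a likelihood ratio stands in for how much an observation is a [WORD] to believe a hypothesis, and an expected gain over a baseline stands in for how much a probability is a [WORD] to choose an action. The numbers and function names are hypothetical.

```python
def update(prior_h, p_obs_given_h, p_obs_given_not_h):
    """Bayes' rule for a binary hypothesis h after observing o."""
    joint_h = prior_h * p_obs_given_h
    joint_not_h = (1 - prior_h) * p_obs_given_not_h
    return joint_h / (joint_h + joint_not_h)


def evidential_weight(p_obs_given_h, p_obs_given_not_h):
    """Likelihood ratio: one assumed way to score how much o is a
    [WORD] to believe h (values above 1 favor h)."""
    return p_obs_given_h / p_obs_given_not_h


def practical_weight(p_h, utility_if_h, utility_if_not_h, baseline=0.0):
    """Expected gain of action a over a baseline: one assumed way to
    score how much P(h) is a [WORD] to choose a."""
    return p_h * utility_if_h + (1 - p_h) * utility_if_not_h - baseline


# h = "it's noon"; o = "the sun is overhead" (illustrative likelihoods).
posterior_noon = update(prior_h=0.5, p_obs_given_h=0.9, p_obs_given_not_h=0.1)
weight_of_observation = evidential_weight(0.9, 0.1)
# Action a = "have lunch", worth +1 if it is noon and -1 otherwise.
weight_for_lunch = practical_weight(posterior_noon, utility_if_h=1.0, utility_if_not_h=-1.0)

print(posterior_noon, weight_of_observation, weight_for_lunch)
```

      Here the observation raises the probability of "it's noon" to about 90%, and that probability in turn favors having lunch, so both relations can be given the same label without either being mysterious.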
      For example, suppose we follow a suggestion once made by Eliezer to reduce the concept of “a rational choice” to the concept of “a winning choice” (or, in line with the type-2 conception you mention, a “utility-maximizing choice”).
      That essay isn’t trying to “reduce” the term “rationality” in the sense of taking a pre-existing word and unpacking or translating it. The essay is saying that what matters is utility, and if a human being gets too invested in verbal definitions of “what the right thing to do is”, they risk losing sight of the thing they actually care about and were originally in the game to try to achieve (i.e., their utility).
      Therefore: if you’re going to use words like “rationality”, make sure that the words in question won’t cause you to shoot yourself in the foot and take actions that will end up costing you utility (e.g., costing human lives, costing years of averted suffering, costing money, costing anything or everything). And if you aren’t using “rationality” in a safe “nailed-to-utility” way, make sure that you’re willing to turn on a dime and stop being “rational” the second your conception of rationality starts telling you to throw away value.
      It ultimately seems hard, at least to me, to make non-vacuous true claims about what it’s “rational” to do without invoking a non-reducible notion of “rationality.”
      “Rationality” is a suitcase word. It refers to lots of different things. On LessWrong, examples include not just “(systematized) winning” but (as noted in the essay) “Bayesian reasoning”, or in Rationality: Appreciating Cognitive Algorithms, “cognitive algorithms or mental processes that systematically produce belief-accuracy or goal-achievement”. In philosophy, the list is a lot longer.
      The common denominator seems to largely be “something something reasoning / deliberation” plus (as you note) “something something normativity / desirability / recommendedness / requiredness”.
      The idea of “normativity” doesn’t currently seem that mysterious to me either, though you’re welcome to provide perplexing examples. My initial take is that it seems to be a suitcase word containing a bunch of ideas tied to:
      Goals/preferences/values, especially overridingly strong ones.
      Encouraged, endorsed, mandated, or praised conduct.
      Encouraging, endorsing, mandating, and praising are speech-acts that seem very central to how humans perceive and intervene on social situations; and social situations seem pretty central to human cognition overall. So I don’t think it’s particularly surprising if words associated with such loaded ideas would have fairly distinctive connotations and seem to resist reduction, especially reduction that neglects the pragmatic dimensions of human communication and only considers the semantic dimension.
      - bgarfinkel Nov 28, 2019, 2:20 AM
        2 points
        I may write up more object-level thoughts here, because this is interesting, but I just wanted to quickly emphasize the upshot that initially motivated me to write up this explanation.
        
        (I don’t really want to argue here that non-naturalist or non-analytic naturalist normative realism of the sort I’ve just described is actually a correct view; I mainly wanted to give a rough sense of what the view consists of and what leads people to it. It may well be the case that the view is wrong, because all true normative-seeming claims are in principle reducible to claims about things like preferences. I think the comments you’ve just made cover some reasons to suspect this.)
        
        The key point is just that when these philosophers say that “Action X is rational,” they are explicitly reporting that they do not mean “Action X suits my terminal preferences” or “Action X would be taken by an agent following a policy that maximizes lifetime utility” or any other such reduction.
        
        I think that when people are very insistent that they don’t mean something by their statements, it makes sense to believe them. This implies that the question they are discussing—“What are the necessary and sufficient conditions that make a decision rational?”—is distinct from questions like “What decision would an agent that tends to win take?” or “What decision procedure suits my terminal preferences?”
        
        It may be the case that the question they are asking is confused or insensible (because any sensible question would be reducible), but it’s in any case different. So I think it’s a mistake to interpret at least these philosophers’ discussions of “decision theories” or “criteria of rightness” as though they were discussions of things like terminal preferences or winning strategies. And it doesn’t seem to me like the answer to the question they’re asking (if it has an answer) would likely imply anything much about things like terminal preferences or winning strategies.
        
        [[NOTE: Plenty of decision theorists are not non-naturalist or non-analytic naturalist realists, though. It’s less clear to me how related or unrelated the thing they’re talking about is to issues of interest to MIRI. I think that the conception of rationality I’m discussing here mainly just presents an especially clear case.]]