Two things I am speaking about are:
(1) what is terminal moral value (good) and
(2) how we can increase it.
EAs have some understanding of what (1) and (2) are: for example, (1) = wellbeing and (2) = donations to effective charities.
But if we ask an AI safety researcher, they can’t say what the final goal of a friendly AI should be. The most they can say is that a future superintelligence will solve this task. Any attempt to define “good” will suffer from our incomplete understanding.
EA works both on AI safety, where good is undefined, and on non-AI-related issues, where good is defined. This looks contradictory: either we know what real good is and could use this knowledge in AI safety, or we don’t know, and in that case we can’t do anything useful.
I think there are some issues with equivocation and/or oversimplification in this comparison:
EAs don’t “know what is good” in specific terms like “how do we rigorously define and measure the concept of ‘goodness’”; well-being is a broad metric which we tend to use because people understand each other, which is made easier by the fact that humans tend to have roughly similar goals and constraints. (There are other things to say here, but the more important point is next.)
One of the major problems we face with AI safety is that even if we knew how to objectively define and measure good, we aren’t sure how to encode that into a machine and ensure it does what we actually want it to do (as opposed to exploiting loopholes or other forms of reward hacking).
So the following statement doesn’t seem to be a valid criticism/contradiction:

This looks contradictory: either we know what real good is and could use this knowledge in AI safety, or we don’t know, and in that case we can’t do anything useful.
The problem with (1) is that it assumes the fuzzy set of well-being contains a subset of “real goodness” which we simply don’t know how to define correctly. But real goodness could lie outside well-being. In my view, reaching radical life extension and death-reversal is more important than well-being, if well-being is understood as a comfortable, healthy life.
The claim that an organisation is doing good presupposes that some concept of good exists within it. And we can’t do good effectively without measuring it, which requires an even stricter model of good. In other words, altruism can’t be effective if it avoids defining good.
Moreover, some choices about what is good could be pivotal acts both for organisations and for AIs: for example, should we work more on biosafety, on nuclear war prevention, or on digital immortality (data preservation)? Here again we are ready to make such choices for organisations, but not for AI.
Of course I know that (2) is the main problem in AI alignment. But what I wanted to say here is that many problems we encounter in AI alignment also reappear in organisations, e.g. Goodharting. Without knowing how to solve them, we can’t do good effectively.
In short, I don’t find your arguments persuasive, and I think they’re derived from some errors such as equivocation, weird definitions, etc.
But real goodness could lie outside well-being. In my view, reaching radical life extension and death-reversal is more important than well-being, if well-being is understood as a comfortable, healthy life.

First of all, I don’t understand the conflict here—why would you want life extension/death reversal if not to improve wellbeing? Wellbeing is almost definitionally what makes life worth living; I think you simply may not be translating or understanding “wellbeing” correctly. Furthermore, you don’t seem to offer any justification for that view: what could plausibly make life extension and death-reversal more valuable than wellbeing (given that wellbeing is still what determines the quality of life of the extended lives)?
The claim that an organisation is doing good presupposes that some concept of good exists within it. And we can’t do good effectively without measuring it, which requires an even stricter model of good. In other words, altruism can’t be effective if it avoids defining good.

You can assert things as much as you’d like, but that doesn’t justify the claims. Someone does not need to objectively, 100% confidently “know” what is “good” nor how to measure it if various rough principles, intuition, and partial analysis suffice. Maybe saving people from being tortured or killed isn’t good—I can’t mathematically or psychologically prove to you why it is good—but that doesn’t mean I should be indifferent about pressing a button which prevents 100 people from being tortured until I can figure out how to rigorously prove what is “good.”
Moreover, some choices about what is good could be pivotal acts both for organisations and for AIs: for example, should we work more on biosafety, on nuclear war prevention, or on digital immortality (data preservation)? Here again we are ready to make such choices for organisations, but not for AI.

This almost feels like a non-sequitur that fails to explicitly make a point, but my impression is that it’s saying “it’s inconsistent/contradictory to think that we can decide what organizations should do but not be able to align AI.” 1) This and the following paragraph still don’t address my second point from my previous comment, and so you can’t say “well, I know that (2) is a problem, but I’m talking about the inconsistency”—a sufficient justification for the inconsistency is (2) all by itself; 2) The reason we can do this with organizations more comfortably is that mistakes are far more corrigible, whereas with sufficiently powerful AI systems, screwing up the alignment/goals may be the last meaningful mistake we ever make.
But what I wanted to say here is that many problems we encounter in AI alignment also reappear in organisations, e.g. Goodharting. Without knowing how to solve them, we can’t do good effectively.

I very slightly agree with the first point, but not the second point (in part for reasons described earlier). On the first point, yes, “alignment problems” of some sort often show up in organizations. However: 1) see my point above (mistakes in organizations are more corrigible); 2) aligning humans/organizations—with which we share some important psychological traits and have millennia of experience working with—is fairly different from aligning machines in terms of the challenges. So “solving” (or mostly solving) either one does not necessarily guarantee solutions to the other.
The transition from “good” to “wellbeing” seems rather innocent, but it opens the way to a rather popular line of reasoning: that we should care only about the number of happy observer-moments, without caring whose moments these are. Extrapolating, we stop caring about real humans and start caring about possible animals. In other words, it opens the way to a pure utilitarian-open-individualist bonanza, where the value of human life and individuality is lost and the badness of death is ignored. The last point is most important to me, as I view irreversible mortality as the main human problem.
I wrote more about why death is bad in Fighting Aging as an Effective Altruism Cause: A Model of the Impact of the Clinical Trials of Simple Interventions – and decided not to repeat it in the main post, as the conditions of the contest require that only new material be published. But I recently found that a similar problem was raised in another application, in the section “Defending person-affecting views”.
To be totally honest, this really gives off vibes of “I personally don’t want to die and I therefore don’t like moral reasoning that even entertains the idea that humans (me) may not be the only thing we should care about.” Gee, what a terrible world it might be if we “start caring about possible animals”!
Of course, that’s probably not what you’re actually/consciously arguing, but the vibes are still there. It particularly feels like motivated reasoning when you gesture to abstract, weakly-defined concepts like the “value of human life and individuality” and imply they should supersede concepts like wellbeing, which, when properly defined and when approaching questions from a utilitarian framework, should arguably subsume everything morally relevant.
You seem to dispute the (fundamental concept? application?) of utilitarianism for a variety of reasons—some of which (e.g., your very first example regarding the fog of distance) I see as reflecting a remarkably shallow/motivated (mis)understanding of utilitarianism, to be honest. (For example, the fog case seems to not understand that utilitarian decision-making/analysis is compatible with decision-making under uncertainty.)
If you’d like to make a more compelling criticism that stems from rebuffing utilitarianism, I would strongly recommend learning more about the framework from people who at least decently understand and promote/use it, such as here: https://www.utilitarianism.net/objections-to-utilitarianism#general-ways-of-responding-to-objections-to-utiliarianism
I need to clarify my views: I want to save humans first, and after that save all animals, from those closest to humans to the more remote. By “saving” I mean resurrection of the dead, of course. I am for the resurrection of mammoths and for cryonics for pets. Such a framework will eventually save everyone, so in the limit it converges with other approaches to saving animals.
But “saving humans first” gives us leverage, because we will have a more powerful civilisation with a higher capacity to do good. If humans go extinct now, animals will eventually go extinct too, when the Sun becomes a little brighter, around 600 million years from now.
But the claim that I want to save only my life is factually false.
I’m afraid you’ve totally lost me at this point. Saving mammoths?? Why??
And are you seriously suggesting that we can resurrect dead people whose brains have completely decayed? What?
And what is this about saving humans first? No, we don’t have to save every human first, we theoretically only need to save enough so that the process of (whatever you’re trying to accomplish?) can continue. If we are strictly welfare-maximizing without arbitrary speciesism, it may mean prioritizing saving some of the existing animals over every human currently (although this may be unlikely).
To be clear, I certainly understand that you aren’t saying you only care about saving your own life, but the post gives off those kinds of vibes nonetheless.
Unless you’re collecting data for an EA forum simulator (not IRB approved), I would consider disengaging in some situations. Some posts probably aren’t going to take first place as a red team prize.
I am serious about the resurrection of the dead. There are several ways, including running a simulation of the whole history of mankind and filling the knowledge gaps with random noise, which, thanks to Everett, will be correct in one of the branches. I explained this idea in a longer article: You Only Live Twice: A Computer Simulation of the Past Could be Used for Technological Resurrection
What if we can develop future technology to read all the vibrations emanated from the earth from all of human history...the earliest ones will be farther out, the most recent ones near...then we can filter through them and recreate everything that ever happened on earth, effectively watching what happened in the past...and maybe even to the level of brain waves of each human, thus we could resurrect all previously dead humans by gathering their brain waves and everything they ever said...presumably once re-animated they could gain memory of things missed and reconstruct themselves further. Of course we could do this with all extinct animals too.
This really becomes a new version of heaven. For the religious: what if this was G-d’s plan, not to give us a heaven but for us to create one with the minds we have (or have been given), this being the resurrection... maybe G-d’s not egoistic and doesn’t care if we acknowledge the originating gift, meaning atheism is just fine. We do know love doesn’t seek self-benefit, so that would fit well, since “G-d is love”. I like being both religious and atheist at the same time, which I am.
I would like to thank the author, turchin, for inspiring this idea in me, for it is truly blowing my mind. Please let me know of other writings on this.
I wrote two articles about resurrection: You Only Live Twice: A Computer Simulation of the Past Could be Used for Technological Resurrection
and
Classification of Approaches to Technological Resurrection