Your impact matrix places all its weight on the view that animals have a high enough moral value that donating to humans is net negative
If by weight you meant probability, then placing 100% of that in anything is not implied by a discrete matrix, which must use expected values (i.e., the average of {probability × impact conditional on probability}). One could mentally replace each number with a range for which the original number is the average.
(It is the case that my comment premises a certain weighting, and humans should not update on implied premises, except in case of beliefs about what may be good to investigate, to avoid outside-view cascades.)
If you have a lot of uncertainty and you are risk averse
I think beliefs about risk-aversion are probably where the crux between us is.
Uncertainty alone does not imply one should act in proportion to their probabilities.[1]
I don’t know what is meant by ‘risk averse’ in this context. More precisely, I claim risk aversion must either (i) follow instrumentally from one’s values, or (ii) not be the most good option under one’s own values.[2]
Example of (i), where acting in a way that looks risk-averse is instrumental to fulfilling one’s actual values: the Kelly criterion.
In a simple positive-EV bet, like at 1:2-odds on a fair coinflip, if one continually bets all of their resources, the probability that they eventually lose everything approaches 1. All their gains are concentrated into an unlikely series of events, resulting in many possible worlds where they have nothing and one where they have a huge amount of resources. The average resources across all possible worlds is highest in this case.
Under my values, that set of outcomes is actually much worse than available alternatives (due to diminishing value of additional resources in a single possible world). To avoid that, we can apply the Kelly criterion, or in general bet sums that are substantially smaller than the full amount of resources we currently have.
This lets us choose the distribution of resources over possible worlds that our values want to result from resource-positive-EV bets; we can accept a lower average for a more even distribution.
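To make the trade-off above concrete, here is a minimal sketch. It assumes "1:2 odds" means a win pays a net profit of twice the stake, and it uses 20 rounds; both are my assumptions, and the function names are only illustrative. It compares betting everything each round with betting the Kelly fraction.

```python
P_WIN = 0.5      # fair coin
NET_ODDS = 2.0   # assumption: a win pays a net profit of 2x the stake


def kelly_fraction(p_win: float, net_odds: float) -> float:
    """Fraction of the bankroll that maximises expected log wealth."""
    return p_win - (1.0 - p_win) / net_odds


def mean_wealth(fraction: float, rounds: int) -> float:
    """Average final wealth across all possible worlds, starting from 1."""
    per_round = P_WIN * (1 + NET_ODDS * fraction) + (1 - P_WIN) * (1 - fraction)
    return per_round ** rounds


def median_wealth(fraction: float, rounds: int) -> float:
    """Final wealth in the typical world: half wins, half losses (rounds even)."""
    half = rounds // 2
    return (1 + NET_ODDS * fraction) ** half * (1 - fraction) ** half


def prob_ruined_all_in(rounds: int) -> float:
    """Chance of having lost everything when staking the full bankroll each round."""
    return 1.0 - P_WIN ** rounds


rounds = 20
f_star = kelly_fraction(P_WIN, NET_ODDS)

print("Kelly fraction:", f_star)
print("All-in: mean", mean_wealth(1.0, rounds), "median", median_wealth(1.0, rounds))
print("Kelly:  mean", mean_wealth(f_star, rounds), "median", median_wealth(f_star, rounds))
print("P(ruined) when all-in:", prob_ruined_all_in(rounds))
```

Under these assumptions the all-in strategy has the higher average but leaves nothing in the typical world, while the Kelly bettor accepts a lower average for a typical world in which they have grown their resources.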
Similarly, if presented with a series of positive-EV bets about things you find morally valuable in themselves, I claim that if you Kelly bet in that situation, it is actually because your values are more complex than {linearly valuing those things} alone.
As an example, I would prefer {a 90% chance of saving 500 good lives} to {a certainty of saving 400} in a world that already had many lives, but if those 500 lives were all the lives that exist, I would switch to preferring the latter—a certainty of only 100 of the 500 dying—even if the resulting quantities then became the eternal maximum (no creation of new minds possible, so we can’t say it actually results in a higher expected amount).
This is because I have other values that require just some amount of lives to be satisfied, including vaguely ‘the unfolding of a story’, and ‘the light of life/curiosity/intelligence continuing to make progress in understanding metaphysics until no more is possible’.
Another way to say this would be to say that our values are effectively concave over the thing in question, and we’re distributing them across possible futures.
This is importantly not what we do when we make a choice in an already large world, and we’re not affecting all of it. Then we’re choosing between, e.g., {90%: 1,000,500, 10%: 1,000,000} and {100%: 1,000,400}. (And notably, we are in a very large world, even beyond Earth.)
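A rough sketch of how the same value function can flip between the two cases above. The function here is purely hypothetical (a linear term plus a saturating 'narrative' term with made-up constants); it is only meant to have the right shape, rising quickly at first and then becoming close to linear.

```python
import math

# Hypothetical constants, chosen only for illustration.
NARRATIVE_WEIGHT = 1000.0  # size of the value that saturates with few lives
NARRATIVE_SCALE = 100.0    # how quickly it saturates


def value(lives: float) -> float:
    """Value of a world: linear in lives plus a term that is nearly maxed out early."""
    return lives + NARRATIVE_WEIGHT * (1.0 - math.exp(-lives / NARRATIVE_SCALE))


def expected_value(lottery) -> float:
    """Expected value of a list of (probability, total lives) outcomes."""
    return sum(p * value(n) for p, n in lottery)


def prefers_gamble(baseline: int) -> bool:
    """True if a 90% chance of +500 lives beats a certain +400 lives."""
    gamble = [(0.9, baseline + 500), (0.1, baseline)]
    sure_thing = [(1.0, baseline + 400)]
    return expected_value(gamble) > expected_value(sure_thing)


print("World with no other lives:", prefers_gamble(0))
print("World with 1,000,000 lives:", prefers_gamble(1_000_000))
```

With these made-up numbers the certain option wins when the 500 lives are all there is, and the gamble wins once a million lives already exist, because the saturating term is effectively constant across both options.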
At least my own values are over worlds per se, rather than the local effects of my actions per se. Maybe the framing of the latter leads to mistaken Kelly-like tradeoffs[3], and acting as if one assigns value to the fact of being net-positive itself.
(I expanded on this section about Kelly in a footnote at first, then had it replace example (i) in the main post. I think it might make the underlying principle clear enough to make example (ii) unnecessary, so I’ve moved (ii) to a footnote instead.)[4]
[1] There are two relevant posts from Yudkowsky’s sequences that come to mind here. I could only find one of them, ‘Circular Altruism’. The other was about a study wherein people bet on multiple outcomes at once in proportion to the probability of each outcome, rather than placing their full bet on the most probable outcome, in a simple scenario where the latter was incentivized.
[3] It just struck me that some technical term should be used instead of ‘risk aversion’ here, because the latter in everyday language includes things like taking a moment to check if you forgot anything before leaving home.
[4] Example of (ii), where I seem to act risk-unaverse
I’m offered the option to press a dubious button. This example ended up very long, because there is more implied uncertainty than just the innate chances of the button being of either kind, but maybe the extra detail will help show what I mean / be more surface for a cruxy disagreement to be exposed.
I think (66%) it’s a magic artifact my friends have been looking for, in which case it {saves 1 vegan[5] who would have died} when pressed. But I’m not sure; it might also be (33%) a cursed decoy, in which case it {causes 1 vegan[5] who would not have died to die} when pressed instead.
I can’t gain evidence about which possible button it is. I have only my memories and reasoning with which to make the choice of how many times to press it.
Simplifying assumptions to try to make it closer to a platonic ideal than a real-world case (can be skipped):
The people it might save or kill will all have equal counterfactual moral impact (including their own life) in the time which would be added or taken from their life
Each death has an equal impact on those around them
The button can’t kill the presser
These are unrealistic, but they mean I don’t have to reason about how at-risk vegans are less likely to be alignment researchers than non-at-risk vegans who I risk killing, or how I might be saving people who don’t want to live, or how those at risk of death would have more prepared families, or how my death could cut short a series of bad presses, anything like that.
In this case, I first wonder what it means to ‘save a life’, and reason it must mean preventing a death that would otherwise occur. I notice that if no one is going to die, then no additional lives can be saved. I notice that there is some true quantity of vegans who will die absent any action, and I would like to just press the button exactly that many times, but I don’t know that true quantity, so I have to reason about it under uncertainty.
So, I try to reason about what that quantity is by estimating an amount of lives at various levels of at-risk; and though my estimates are very uncertain (I don’t know what portion of the population is vegan, nor how likely different ones are to die), I still try.
In the end I have a wide probability distribution that is not very concentrated at any particular point, and which is not the one an ideal reasoner would produce, and because I cannot do any better, I press the button exactly as many times as there are deaths in the distribution’s average[6].
More specifically, I stop once it has a ≤ 50% chance of {saving an additional life conditional on it already being a life-saving button}, because anything less, when multiplied by the 66% chance of it being a life-saving button, would be an under 33% total chance of saving a life compared to a 33% chance of certainly ending one. The last press will have only a very slightly positive EV, and one press further would have a very slightly negative EV.
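The stopping rule above can be sketched as follows. The 66% and 33% figures are from the example; the schedule of conditional probabilities is made up for illustration, and the function names are mine.

```python
P_LIFE_SAVING = 0.66  # credence that the button saves a life per press
P_CURSED = 0.33       # credence that the button ends a life per press


def press_ev(p_death_remaining: float) -> float:
    """Expected lives saved by one more press.

    p_death_remaining is the chance that a death is still left to prevent,
    conditional on the button being the life-saving kind.
    """
    return P_LIFE_SAVING * p_death_remaining - P_CURSED


def presses_to_make(p_remaining_by_press) -> int:
    """Count presses until the next one would no longer have positive EV."""
    count = 0
    for p in p_remaining_by_press:
        if press_ev(p) <= 0:
            break
        count += 1
    return count


# Hypothetical schedule: the chance that each successive press still has
# someone to save falls as presses accumulate.
schedule = [0.99, 0.90, 0.75, 0.60, 0.51, 0.49, 0.30]

print("Break-even conditional probability:", P_CURSED / P_LIFE_SAVING)
print("Presses made:", presses_to_make(schedule))
```

The break-even point is a 50% conditional chance, so with this schedule the press at 0.51 is still (barely) worth making and the one at 0.49 is not.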
Someone following a ‘risk averse principle’ might stop pressing once their distribution says an additional press scores less than 60% on that conditional, or something. They may reason, “Pressing it only so many times seems likely to do good across the vast majority of worldviews in the probability distribution,” and that would be true.
In my view, that’s just accepting the opposite trade: declining a 60% chance of preventing a death in return for a 40% chance of preventing a death.
I don’t see why this simple case would not generalize to reasoning about real-world actions under uncertainties about different things like how bad the experience would be as a factory farmed animal. But it would be positive for me to learn of such reasons if I’m missing something.
[6] (Given the setup’s simplifying assumptions. In reality, there might be a huge average number that mostly comes from tail-worlds, let alone probable environment hackers.)
I agree that uncertainty alone doesn’t warrant separate treatment, and risk aversion is key.
(Before I get into the formal stuff, risk aversion to me just means placing a premium on hedging. I say this in advance because conversations about risk aversion vs risk neutrality tend to devolve into out-there comparisons like the St Petersburg paradox, and that’s never struck me as a particularly resonant way to think about it. I am risk averse for the same reason that most people are: it just feels important to hedge your bets.)
By risk aversion I mean a utility function that satisfies u(E[X])>E[u(X)]. Notably, that means that you can’t just take the expected value of lives saved across worlds when evaluating a decision – the distribution of how those lives are saved across worlds matters. I describe that more here.
For example, say my utility function over lives saved x is u(x)=√x. You offer me a choice between a charity that has a 10% chance to save 100 lives, and a charity that saves 5 lives with certainty. The expected utility of the former option to me is E[u(x)]=0.1⋅√100=1, while the expected utility of the latter option is E[u(x)]=1⋅√5≈2.24. Thus, I choose the latter, even though it has lower expected lives saved (E[x]=0.1⋅100=10 for the former, E[x]=5 for the latter). What’s going on is that I am valuing certain impact over higher expected lives saved.
Apply this to the meat eater problem, where we have the choices:
1) spend $10 on animal charities
2) spend $10 on development charities
3) spend $5 on each of them
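The arithmetic in this example can be checked with a short sketch (function and variable names are mine):

```python
import math


def utility(lives_saved: float) -> float:
    """The example's utility function, u(x) = sqrt(x)."""
    return math.sqrt(lives_saved)


def expected_utility(lottery) -> float:
    """Expected utility of a list of (probability, lives saved) outcomes."""
    return sum(p * utility(x) for p, x in lottery)


def expected_lives(lottery) -> float:
    """Expected number of lives saved."""
    return sum(p * x for p, x in lottery)


risky_charity = [(0.1, 100), (0.9, 0)]
safe_charity = [(1.0, 5)]

for name, lottery in [("risky", risky_charity), ("safe", safe_charity)]:
    print(name, "E[x] =", expected_lives(lottery), "E[u(x)] =", expected_utility(lottery))

# Risk aversion in the sense u(E[X]) > E[u(X)], checked on the risky charity.
print(utility(expected_lives(risky_charity)) > expected_utility(risky_charity))
```

The risky charity has the higher expected lives saved, but the safe charity has the higher expected utility under the square-root function.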
If you’re risk neutral, 1) or 2) are the way to go – pick animals if your best bet is that animals are worth more (accounting for efficacy, room for funding, etc etc), and pick development if your best bet is that humans are worth more. But both options leave open the possibility that you are terribly wrong and you’ve wasted $10 or caused harm. Option 3) guarantees that you’ve created some positive value, regardless of whether animals or humans are worth more. If you’re risk-averse, that certain positive value is worth more than a higher expected value.
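Here is a toy version of that comparison. Every number in it is hypothetical: two equally likely worldviews, made-up value per dollar for each charity type, and an exponential (constant absolute risk aversion) utility chosen because it handles negative outcomes. Whether the split really guarantees positive value depends on numbers like these.

```python
import math

# Hypothetical worldviews:
# (probability, value per $ to animal charities, value per $ to development charities)
WORLDVIEWS = [
    (0.5, 3.0, -1.0),  # animals matter a lot, so development does net harm
    (0.5, 0.0, 2.0),   # animals matter little, so development is simply good
]

# Dollars given to (animal charities, development charities).
OPTIONS = {
    "animals": (10.0, 0.0),
    "development": (0.0, 10.0),
    "split": (5.0, 5.0),
}


def outcomes(option: str):
    """List of (probability, total value) for an option, one entry per worldview."""
    animal_dollars, dev_dollars = OPTIONS[option]
    return [(p, animal_dollars * a + dev_dollars * d) for p, a, d in WORLDVIEWS]


def expected_value(option: str) -> float:
    """What a risk-neutral donor maximises."""
    return sum(p * v for p, v in outcomes(option))


def expected_utility(option: str, risk_tolerance: float = 10.0) -> float:
    """What a risk-averse donor maximises, using a concave exponential utility."""
    return sum(p * (1.0 - math.exp(-v / risk_tolerance)) for p, v in outcomes(option))


risk_neutral_choice = max(OPTIONS, key=expected_value)
risk_averse_choice = max(OPTIONS, key=expected_utility)

print("Risk-neutral donor picks:", risk_neutral_choice)
print("Risk-averse donor picks:", risk_averse_choice)
```

With these numbers the risk-neutral donor goes all-in on one cause, while the risk-averse donor takes the split, which is the only option whose worst case is still positive.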
It sounds like we agree about what risk aversion is! The term I use that includes your example of valuing the square root of lives saved is a ‘concave utility function’. I have one of these, sort of; it goes up quickly for the first x lives (I’m not sure how large x is exactly), then becomes more linear.
But it’s unexpected to me for other EAs to value {amount of good lives saved by one’s own effect} rather than {amount of good lives per se}. I tried to indicate in my comment that I think this might be the crux, given the size of the world.
(In your example of valuing the square root of lives saved (or lives per se), if there’s 1,000 good lives already, then preventing 16 deaths has a utility of 4 under the former, and √1000−√984 under the latter; and preventing 64 is twice as valuable under the former, but ~4x as valuable under the latter)
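The numbers in that parenthetical can be reproduced with a small sketch (names are mine; the baseline of 1,000 lives is from the example):

```python
import math

BASELINE_LIVES = 1000


def value_of_own_effect(deaths_prevented: float) -> float:
    """Square root applied to the lives saved by one's own action."""
    return math.sqrt(deaths_prevented)


def value_of_world_change(deaths_prevented: float, baseline: float = BASELINE_LIVES) -> float:
    """Gain in sqrt(total lives) from keeping the baseline intact rather than losing some."""
    return math.sqrt(baseline) - math.sqrt(baseline - deaths_prevented)


own_ratio = value_of_own_effect(64) / value_of_own_effect(16)
world_ratio = value_of_world_change(64) / value_of_world_change(16)

print("Own-effect framing, 64 vs 16 deaths prevented:", own_ratio)
print("Lives-per-se framing, 64 vs 16 deaths prevented:", world_ratio)
```

Under the own-effect framing, preventing 64 deaths is exactly twice as valuable as preventing 16; under the lives-per-se framing it comes out at roughly four times as valuable.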
But it’s unexpected to me for other EAs to value {amount of good lives saved by one’s own effect} rather than {amount of good lives per se}.
Your parenthetical clarifies that you just find it weird because you could add a constant inside the concave function and change the relative value of outcomes. I just don’t see any reason to do that? Why does the size of the world net of your decision determine the optimal decision?
The parenthetical isn’t why it’s unexpected, but clarifying how it’s actually different.
As an attempt at building intuition for why it matters, consider an agent that applied the ‘square root of lives saved by me’ function anew to each action instead of keeping track of how many lives they’ve saved over their existence. This agent would gain more utility from taking four separate actions, each of which certainly saves 1 life (for 1 utility each), than from one lone action that certainly saves 15 lives (for 3.87 utility). Then generalize this example to the case where they do keep track, and progress just ‘resets’ for new clones of them. Or the real-world case where there are multiple agents with similar values.
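A quick sketch of that comparison, assuming the square-root function from the earlier example (function names are mine):

```python
import math


def per_action_utility(lives_saved_per_action) -> float:
    """Apply sqrt anew to each action and add up the results."""
    return sum(math.sqrt(n) for n in lives_saved_per_action)


def cumulative_utility(lives_saved_per_action) -> float:
    """Apply sqrt once to the running total of lives saved."""
    return math.sqrt(sum(lives_saved_per_action))


four_small_actions = [1, 1, 1, 1]
one_large_action = [15]

print("Per-action:", per_action_utility(four_small_actions), "vs", per_action_utility(one_large_action))
print("Cumulative:", cumulative_utility(four_small_actions), "vs", cumulative_utility(one_large_action))
```

Resetting the function per action makes saving 4 lives in four steps look better than saving 15 in one step; tracking the total reverses that ranking.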
Why does the size of the world net of your decision determine the optimal decision?
I describe this starting from 6 paragraphs up in my edited long comment. I’m not sure if you read it pre- or post-edit.
Could you describe your intuitions? ‘valuing {amount of good lives saved by one’s own effect} rather than {amount of good lives per se}’ is really unintuitive to me.
To me, risk aversion is just a way of hedging your bets about the upsides and downsides of your decision. It doesn’t make sense to me to apply risk aversion to objects that feature no risk (background facts about the world, like its size). It has nothing to do with whether we value the size of the world. It’s just that those background facts are certain, and von Neumann-Morgenstern utility functions like we are using are really designed to deal with uncertainty.
Another way to put it is that concave utility functions just mean something very different when applied to certain situations vs uncertain situations.
In the presence of certainty, saying you have a concave utility function means you genuinely place lower value on additional lives given the presence of many lives. That seems to be the position you are describing. I don’t resonate with that, because I think additional lives have constant value to me (if everything is certain).
But in the presence of uncertainty, saying that you have a concave utility function just means that you don’t like high-variance outcomes. That is the position I am taking. I don’t want to be screwed by tail outcomes. I want to hedge against them. If there were zero uncertainty, I would behave like my utility function was linear, but there is uncertainty, so I don’t.
I introduced this topic and wrote more about it in this shortform. I wanted to give the topic its own thread and see if others might have responses.
I don’t want to be screwed by tail outcomes. I want to hedge against them.
I do this too, even though the world’s size means my choices mostly only affect value on the linear parts of my value function! That’s because tail outcomes are often large. (Maybe I mean something like: Kelly-betting/risk-aversion is often useful for fulfilling instrumental subgoals too.)
(Edit: and I think ‘correctly accounting for tail outcomes’ is just the correct way to deal with them).
saying you have a concave utility function means you genuinely place lower value on additional lives given the presence of many lives
Yes, though it’s not because additional lives are less intrinsically valuable, but because I have other values which are non-quantitative (narrative) and almost maxed out way before there are very large numbers of lives.
A different way to say it would be that I value multiple things, but many of them don’t scale indefinitely with lives, so the overall function goes up faster at the start of the lives graph.
[2] (Not including edge cases where an agent values being risk-averse.)
[5] (To avoid, in the thought experiment, the very problem this post is about.)
I suppose that is a coherent worldview but I don’t share any of the intuitions that lead you to it.
This is so interesting to me.