Great post! My comments are responding to the longer linked document.
1. I had a thought about quasilinear utilities and moral preferences. There, you say “What’s more, even if, given massive increases in wealth, people switch to using most resources on satisfying moral preferences rather than self-interested preferences, there is no guarantee that those moral preferences are for what is in fact good. They could be misguided about what is in fact good, or have ideological preferences that are stronger than their preferences for what is in fact good; their approximately linear preferences could result in building endless temples to their favoured god, rather than promoting the good.”
Still, I think there’s an argument that can be run that outcomes caused by moral preferences tend to converge, i.e. to be very similar to one another. The first step is to distinguish de dicto from de re moral preferences. Second, if we imagine a group of agents with de dicto moral preferences, one argument for convergence is that each agent will be uncertain about which outcomes are morally best, and agreement theorems suggest that once the various agents pool their information, they will tend to converge in their credence distributions over moral theories (a toy sketch of this pooling step is below). But that means their distributions over which outcomes are morally best will also tend to converge. Connecting back to the main question: if we think that in most future outcomes decision-makers are de dicto ethical, then we might expect concentration in the outcomes, because decision-makers will tend to converge in their de dicto ethical credence distributions. Regardless of what is actually good, we should expect most outcomes in which agents want to be de dicto ethical to be similar, provided those agents have similar credence distributions over what is good. This only works for de dicto, not de re, moral preferences: if I want to help chickens and you want to help cows, no amount of pooling our information will lead to convergence in our preferences, because descriptive information won’t resolve our dispute.
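To make the pooling step concrete, here is a minimal numerical sketch (entirely my own; the theories, likelihoods, and priors are made-up): agents who start with quite different priors over a few candidate moral theories, but update on the same pooled body of evidence, end up with much closer credence distributions.

```python
# Toy illustration of credence convergence via pooled evidence (Bayesian
# merging of opinions). All numbers are arbitrary assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Each candidate moral theory implies a different chance that a given
# "morally relevant observation" comes out positive (hypothetical numbers).
likelihood_positive = np.array([0.2, 0.5, 0.8])   # theories T1, T2, T3

# Three agents with quite different priors over the theories.
priors = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
])

# Pooled evidence: observations generated as if T2 were the right theory.
observations = rng.random(200) < likelihood_positive[1]

posteriors = priors.copy()
for obs in observations:
    update = likelihood_positive if obs else 1.0 - likelihood_positive
    posteriors = posteriors * update                      # Bayes update
    posteriors /= posteriors.sum(axis=1, keepdims=True)   # renormalise rows

print("posteriors after pooling:\n", posteriors.round(3))
print("max disagreement per theory:",
      (posteriors.max(axis=0) - posteriors.min(axis=0)).round(3))
```

The disagreement across agents shrinks as the shared evidence accumulates, which is the sense of convergence I have in mind; since the priors differ, this is really the merging-of-opinions version of the point rather than a strict agreement theorem.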
2. I was a bit confused about the division of resources argument. I was thinking that if the division of resources is strong enough, that would actually support dichotomy, because most futures would then be quite similar in total utility: differences across territories will tend to be smoothed out. So the most important thing will just be to avoid extinction before the point of division (which might be the point at which serious space colonisation happens). After the point of division, lots of different territories will have lots of different outcomes, so total value will be a function of the average value across territories, and that average will be smooth and hard to influence (the toy calculation below is the intuition I have in mind). Maybe where I’m getting confused is that you’re imagining a division into a fairly small number of groups, the valorium-optimisers and the non-optimisers, whereas I’m imagining a division into vast numbers of different groups. So I think I agree that in your small-division case, dichotomy is less plausible, and pushing towards valorium-optimisation might be better than mitigating x-risk.
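Here is the toy calculation behind the smoothing intuition (my own, with purely illustrative numbers): if total value is roughly the average of many independently developing territories, the spread of that average across possible futures shrinks like 1/sqrt(N).

```python
# Toy check: the spread of average value across futures shrinks with the
# number of territories. The per-territory value distribution is an
# arbitrary assumption (mean 1, sd 2, i.e. quite heterogeneous territories).
import numpy as np

rng = np.random.default_rng(1)

def spread_of_average(n_territories: int, n_futures: int = 1_000) -> float:
    # Each "future" draws an independent value for every territory.
    values = rng.normal(loc=1.0, scale=2.0, size=(n_futures, n_territories))
    return values.mean(axis=1).std()

for n in (10, 1_000, 10_000):
    print(f"{n:>6} territories -> spread of average value ≈ {spread_of_average(n):.4f}")
```

With vast numbers of territories the averages across futures are nearly identical, which is the sense in which the strong-division case seemed to me to support dichotomy. (If total value is a sum rather than an average, the same point holds for the relative spread.)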
3. “if moral realism is false, then, morally, things in general are much lower-stakes — at least at the level of how we should use cosmic-scale resources” I think this argument is very suspicious. It fails according to quasi-realists, for example, who will reject counterfactuals like “if moral realism is false, then all actions are permissible.” Rather, if one has the appropriate pro-attitudes towards helping the poor, quasi-realists will go in for strong counterfactuals like “I should have helped the poor even if I had had different pro-attitudes.” Similarly, moral relativists don’t have to be subjectivists, where subjectivists can be defined as those who accept the opposite counterfactuals: if I had had pro-attitudes towards suffering, suffering would have been good. Note also that I can’t identify a notion of “stakes” at work here beyond “what you should do.” Finally, I think the argument would also fail for non-cognitivists who don’t identify as quasi-realists, although I found those views harder to model.
4. Even if AGI leads to value lock-in, what about belief shifts and concept shifts? It is hard for me to imagine an AGI that did not continue to gather new evidence about the world, and new evidence could dramatically change at least its instrumental values. In addition, new evidence and new experiences could lead to concept shifts, which effectively change its intrinsic values by changing the meaning of the concepts involved. On one picture, for any concepts C and C*, C can shift over some amount of time and experience to have the intension currently assigned to C*. If you hold that strong shifty view, then maybe in the long run even a population of AGIs with ‘fixed’ intrinsic values will tend to move around vast amounts of value space (a toy random-walk picture of this is below).
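As a rough picture of the shifty view, here is a toy random-walk model (my own gloss, not anything in the post or document): each AGI’s effective values take a tiny random step in a two-dimensional “value space” every period as concept meanings shift, and even tiny steps accumulate into large movements over long horizons.

```python
# Toy model: fixed *stated* intrinsic values, but small per-period concept
# shifts move what those values effectively refer to. Step size, dimensions,
# and horizon are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(2)

n_agents, n_steps, drift = 50, 10_000, 0.01
positions = np.zeros((n_agents, 2))   # all agents start with identical effective values

for _ in range(n_steps):
    positions += rng.normal(scale=drift, size=positions.shape)

rms_distance = np.sqrt((np.linalg.norm(positions, axis=1) ** 2).mean())
print("RMS distance moved in value space:", round(float(rms_distance), 2))
print("analytic value drift*sqrt(2*n_steps):", round(drift * np.sqrt(2 * n_steps), 2))
```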
Also, regarding AGI lock-in: even if AGIs don’t change their values, there are still several reasons why rational agents randomize their actions (https://web.stanford.edu/~icard/random.pdf). Such chance behaviour by the AGI dictator could, over time, allow for ‘risks’ of escaping lock-in (a back-of-the-envelope version is below).
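To put rough numbers on that (mine, purely illustrative): if the AGI’s mixed strategies give even a tiny per-period probability p of an action that breaks the lock-in, the chance of at least one such event over T periods is 1 - (1 - p)^T, which becomes non-trivial over cosmological timescales.

```python
# Probability of at least one lock-in-breaking action over a long horizon,
# given a small per-century probability p of such an action. All numbers
# are arbitrary assumptions.
def p_escape(p_per_century: float, n_centuries: int) -> float:
    return 1.0 - (1.0 - p_per_century) ** n_centuries

for p in (1e-6, 1e-9):
    for centuries in (10**4, 10**7):
        print(f"p={p:.0e} per century over {centuries:.0e} centuries "
              f"-> P(escape) ≈ {p_escape(p, centuries):.3f}")
```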
5. It might be fruitful to engage with the very large literature on persistence effects in economics (https://economics.yale.edu/sites/default/files/understanding_persistence_ada-ns.pdf), which may have models of persistence that could be adapted to this case.
Also curious what you guys would make of The Narrow Corridor by Acemoglu and Robinson, which gives a systematic model of how liberal democracies stably evolved through game-theoretic competition between state and society. It seems like the kind of model that could be fruitfully projected forward to think about what it would take to preserve flourishing institutions. More generally, I’d be curious to see more examples of taking existing models of how successful institutions have stably evolved and applying them to the far future.
6. I was a bit skeptical of the Infernotopia argument. The issue is that even though utility is not bounded from below, I think it is very unlikely that the far future could contain large numbers of lives with negative utility. Here is one argument for why: suppose utility is negative when you would prefer to die rather than stay alive. In that case, as long as agents in the future have the ability to end their lives, it would be very unlikely for there to be large numbers of people whose utility is negative (a rough version of the calculation is below). The one caveat here is the specific kind of s-risk where suffering is adversarially created as a threat against altruists; I do think this is worth hedging against. But other than that, I’m having trouble seeing why there would be vast numbers of lives not worth living.
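Here is the very rough calculation I have in mind (my own toy model, with made-up numbers; it ignores recovery and the fact that exiting shortens lives): if anyone whose life dips below the “worth living” line can exit with probability q per year, the time spent below the line is geometric with mean 1/q years, so negative-utility person-years end up a tiny share of the total.

```python
# Rough share of person-years spent below the "worth living" line, assuming
# exit is available with probability q per year once below it. All inputs
# are arbitrary assumptions.
def share_of_negative_person_years(p_ever_below: float,
                                   mean_lifespan_years: float,
                                   q_exit_per_year: float) -> float:
    expected_bad_years = 1.0 / q_exit_per_year     # geometric waiting time to exit
    return p_ever_below * expected_bad_years / mean_lifespan_years

# e.g. 10% of lives ever dip below the line, 1,000-year lifespans,
# 50% chance per year of choosing to exit once below it:
print(share_of_negative_person_years(0.10, 1_000.0, 0.5))   # ≈ 0.0002
```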