How does this work mechanically? Say 1% of people care about wild animal suffering, 49% care about spreading nature, and 50% don’t care about either. How do you satisfy both the 1% and the 49%? How do the 1%—who have the actually correct values—not get trampled?
The view I like is something like Nash bargaining. You elicit values from people on a cardinal scale, give everyone’s values equal weight, and find a compromise solution that maximizes group values. On Nash’s solution this means something like: everyone rates outcomes on a Likert scale (1-10) and then you pick whatever solution gives you the highest product of everyone’s numbers. (There are approaches other than taking the product, and I’m fudging a few details, but the underlying principle is the same: people rate outcomes, then you find the compromise solution that maximizes a weighted or adjusted aggregate of their values.) You can imagine just doing preference utilitarianism to see what this looks like in practice.
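To make the mechanics a bit more concrete, here is a toy sketch in Python of the “highest product of ratings” rule. The options, the number of voters, and all the 1-10 ratings are invented for illustration; nothing here is meant as a canonical implementation.

```python
# Toy sketch of the "highest product of ratings" rule.
# Options, voters, and ratings below are all invented for illustration.
from math import prod

# Each option maps to one rating per person on a 1-10 scale.
ratings = {
    "spread nature as-is":                   [1, 9, 9, 5],
    "spread nature + welfare interventions": [7, 8, 8, 5],
    "spread non-sentient nature":            [9, 6, 6, 5],
}

def nash_score(rs):
    """Product of everyone's ratings; higher counts as a better compromise."""
    return prod(rs)

for option, rs in ratings.items():
    print(option, "->", nash_score(rs))

best = max(ratings, key=lambda option: nash_score(ratings[option]))
print("chosen compromise:", best)
```

With these made-up numbers the middle option wins (2240 vs 405 and 1620): the product rewards options that nobody rates very low, which is the sense in which it looks for compromises.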
If you have a literal conflict between values (some people rate minimizing animal suffering a 10 and some people rate maximizing animal suffering a 10), then they will cancel out and there’s no positive-sum trade you can make. Still, the system will be sensitive to the 1%’s values. So maybe we’ll spread nature 48% hard instead of 49% hard, because the 1% cancel out some of the push. (Not literally this, but directionally this.)
But usually people don’t have literally opposed values, and they can find positive-sum compromises that have more value overall. Like, say, spreading nature but without sentience, or spreading nature but with welfare interventions, etc.
You could also picture it as an idealized marketplace: the 1% who care about wild animal suffering pay the 49% who care about spreading nature a price that they see as fair to reduce wild animal suffering in the nature that they spread.
Lots of different methods to consider here, but I hope the underlying idea is now less opaque.
If I align AGI to my own values, then the AGI will be nice to everyone—probably nicer than if it’s aligned to some non-extrapolated aggregate of the values of all currently-living humans.
It also seems a bit circular because if you want to build a Deep Democracy AGI, then that means you value Deep Democracy, so you’re still aligning AGI to your values, it’s just that you value including everyone else’s values.
I agree that Deep Democracy is not value neutral. It presupposes some values you might not like, and will get you outcomes worse than the ones you most value. The hope is to find a positive-sum compromise that makes sense from the standpoint of individual rationality for lots of people: instead of fighting over what to maximize, you maximize something that captures what a lot of people care about, expanding the Pareto frontier by finding better solutions than wasteful conflict or gambling on a dictator that lots of people oppose.
Or, put another way, the idea is that it’s “better than the default outcome” not just for you but for a large majority of people, and so it’s in our interests to band together and push for this over the default outcome.
(One sidebar is that you should ideally do a global Nash bargain over all of your values, rather than bargaining over particular issues like the suffering of wildlife. This is so that people can weigh all of their values against each other and get the things that are most important to them. If you care a lot about wild animal suffering and nothing about hedonium, and I care a lot about hedonium and nothing about WAS, a good trade is that we have no wild animal suffering and lots of hedonium. This is very hard to do but theoretically optimal.)
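A minimal sketch of why bundling issues helps, with two agents and two issues. All the ratings are invented, and adding per-issue ratings to get a bundle rating is purely an assumption for illustration.

```python
# Sketch of a "global" bargain over bundles of issues rather than issue-by-issue.
# Two agents (WAS-focused, hedonium-focused), two issues; ratings are invented,
# and summing per-issue ratings into a bundle rating is an illustrative assumption.
from math import prod
from itertools import product as bundles

# Per-issue ratings as (WAS-focused agent, hedonium-focused agent).
was_issue      = {"lots of WAS": (1, 5), "no WAS": (10, 5)}            # agent 2 is indifferent
hedonium_issue = {"no hedonium": (5, 1), "lots of hedonium": (5, 10)}  # agent 1 is indifferent

def bundle_utilities(was, hed):
    """Each agent's value for a bundle = sum of their per-issue ratings."""
    return tuple(w + h for w, h in zip(was_issue[was], hedonium_issue[hed]))

best = max(bundles(was_issue, hedonium_issue),
           key=lambda b: prod(bundle_utilities(*b)))
print(best)  # ('no WAS', 'lots of hedonium'): each side gets the thing it cares most about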
I have a slide deck on this solution if you’d like to see it!
We implemented a Nash bargain solution in our moral parliament, and I came away with the impression that the results of Nash bargaining are very sensitive to your choice of defaults, and that for plausible defaults true bargains can be pretty rare. Anyone who is happy with the defaults gets disproportionate bargaining power. One default might be ‘no future at all’, but that’s going to make it hard to find any bargain with the anti-natalists. Another default might be ‘just more of the same’, but again, someone might like that and oppose any bargain that deviates much from it. Have you given much thought to picking the right default against which to measure people’s preferences? (Or is the thought that you would just exclude obstinate minorities?)
You (and @tylermjohn) might be interested in Diffractor’s Unifying Bargaining sequence. The sequence argues that transferable utility games are a better target than plain bargaining games, with Nash bargaining (I believe) as a special case of the latter. It also discusses how to avoid threats in bargaining and tries to refine the framework further. I think the defaults won’t matter too much. Do you have any writing on the moral parliament that talks about the defaults issue more?
Thanks for the suggestion. I’m interested in the issue of dealing with threats in bargaining.
I don’t think we ever published anything specifically on the defaults issue.
We were focused on allocating a budget in a way that respects the priorities of different worldviews. The central thing we ran into was that we started by taking the default to be the allocation you get by giving each worldview its own slice of the total budget to spend as it wants. Since there are often options that are well-suited to each different worldview, there was no way to get good compromises: everyone is happier with the default than with any adjustment to it. (More here.) On the other hand, if you switch the default to be some sort of neutral zero value (assuming that can be defined), then you will get compromises, but many bargainers would rather just be given their own slice of the total budget to allocate.
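A deliberately simplified sketch of that failure mode, assuming two worldviews with completely disjoint, linear preferences over a budget of 100; the numbers and the linearity are my own illustrative assumptions, not the parliament’s actual model.

```python
# Deliberately simplified sketch: two worldviews with disjoint priorities split a budget of 100.
# Worldview 1 only values spending on option X, worldview 2 only values spending on option Y;
# utilities are assumed linear in dollars, purely for illustration.

def utilities(spend_on_x):
    spend_on_y = 100 - spend_on_x
    return spend_on_x, spend_on_y   # (worldview 1, worldview 2)

default = utilities(50)  # the "own slice" default: each worldview spends its 50 on its favorite

# Look for any reallocation that leaves both worldviews at least as well off as the default.
improvements = [x for x in range(101)
                if utilities(x) != default
                and all(u >= d for u, d in zip(utilities(x), default))]

print(improvements)  # [] -- nothing Pareto-improves on the default, so no bargain gets made
```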
I think the importance of defaults comes through just by playing around with some numbers. Consider the difference between setting the default to be the status quo trajectory we’re currently on and setting the default to be the worst possible outcome. Suppose we have two worldviews, one of which cares linearly about suffering in all other people, and the other of which is very locally focused and doesn’t care about immense suffering elsewhere. Relative to the status quo, option A might give the two worldviews (worldview 1: 2, worldview 2: 10) value and option B might give (4, 6). Against this default, option B has the higher product (24 vs 20) and is preferred by Nash bargaining. However, relative to the worst-possible-outcome default, option A might give (10,002, 12) and option B (10,004, 8), and then option A is preferred to option B (~120k vs ~80k).
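A quick check of the arithmetic above, using the same numbers as in the text:

```python
# Quick check of the arithmetic above, using the same numbers as in the text.
from math import prod

relative_to_status_quo = {"A": (2, 10), "B": (4, 6)}
relative_to_worst_case = {"A": (10_002, 12), "B": (10_004, 8)}

for label, options in [("status quo default", relative_to_status_quo),
                       ("worst-possible default", relative_to_worst_case)]:
    scores = {option: prod(utils) for option, utils in options.items()}
    print(label, scores, "->", max(scores, key=scores.get))

# status quo default {'A': 20, 'B': 24} -> B
# worst-possible default {'A': 120024, 'B': 80032} -> A
```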
I agree defaults are a problem, especially with large choice problems involving many people. I honestly haven’t given this much thought, and assume we’ll just have to sacrifice someone or some desideratum to get tractability, and that will kind of suck but such is life.
I’m more wedded to Nash’s preference prioritarianism than to the specific set-up, but I do see that once you get rid of Pareto improvement relative to the disagreement point it’s not going to be individually rational for everyone to participate. Which is sad.
Thank you, Michael!
What do you mean by “default”? You just have a utility for each option, and the best option is the one that maximizes net utility.
https://www.rangevoting.org/BayRegDum
In the traditional Nash bargaining setup you evaluate people’s utilities in each option relative to the default scenario, and only consider options that make everyone at least as well off as that default. This makes it individually rational for everyone to participate, because no one is made worse off by the bargain. That’s different from, say, range voting.
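A minimal sketch of that setup, with an invented default and invented options P, Q, R: filter out anything that leaves someone below the default, then maximize the product of gains over it. The contrast with score-summing at the end is only a rough gesture at how range voting differs.

```python
# Sketch of the setup described above: drop options that leave anyone below the default
# (the disagreement point), then maximize the product of gains over that default.
# The default and the options P, Q, R are invented for illustration.
from math import prod

default = (3, 5)                                   # utilities if no bargain is struck
options = {"P": (6, 6), "Q": (2, 12), "R": (4, 9)}

def gains(utils):
    return [u - d for u, d in zip(utils, default)]

# Individual rationality filter: everyone must do at least as well as the default.
admissible = {name: u for name, u in options.items()
              if all(g >= 0 for g in gains(u))}

best = max(admissible, key=lambda name: prod(gains(admissible[name])))
print(admissible)  # Q is dropped: person 1 would end up below the default
print(best)        # 'R' -- gains (1, 4), product 4, beats P's gains (3, 1), product 3

# Contrast with summing raw scores (roughly what range voting does): Q's total of 14
# would beat P (12) and R (13) even though it pushes person 1 below the default.
```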