Deep Democracy as a promising target for positive AGI futures

If you want the long-term future to go well by the lights of a certain value function, you might be tempted to try to align AGI(s) to your own values (broadly construed, including your deliberative values and intellectual temperaments).[1]

Suppose that you’re not going to do that, for one of three reasons:

  1. You can’t. People more powerful than you are going to build AGIs and you don’t have a say over that.

  2. You object to aligning AGI(s) to your own values for principled reasons. It would be highly uncooperative, undemocratic, coercive, and basically cartoon supervillain evil.

  3. You recognize that this behavior, when pursued by lots of people, leads to a race to the bottom where everyone fights to build AGI aligned to their own values as fast as possible, destroying a ton of value in the process, so you want to strongly reject this kind of norm.

Then a good next-best option is Deep Democracy. What I mean by this is aligning AGI(s) to a process that is arbitrarily sensitive to every person’s entire value function. Not democracy in the sense of the current Western electoral system, but in the idealistic theoretical sense of deeply capturing and being responsive to every single person’s values. (Think about the ideal that democratic mechanisms like quadratic voting and bargaining theory are trying to capture, where democracy is basically equivalent to enlightened preference utilitarianism.)
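
To make the quadratic-voting intuition concrete, here’s a minimal sketch (my illustration, not any canonical implementation; the 100-credit budget and the `quadratic_votes` helper are assumptions for the example). Because casting v votes costs v² credits, a small group that cares intensely can outweigh a larger lukewarm one:

```python
import math

def quadratic_votes(credit_allocations):
    """Tally quadratic votes: casting v votes costs v^2 credits, so a
    voter's effective votes on a proposal are sqrt(credits spent).

    credit_allocations: one dict per voter, mapping proposal -> signed
    credits (+ for support, - for opposition).
    """
    tally = {}
    for spend in credit_allocations:
        for proposal, credits in spend.items():
            # |votes| = sqrt(|credits|), keeping the sign of the spend.
            votes = math.copysign(math.sqrt(abs(credits)), credits)
            tally[proposal] = tally.get(proposal, 0.0) + votes
    return tally

# Three voters with 100 credits each. Two mildly oppose proposal B;
# one voter cares about nothing else. Intensity, not headcount, decides.
voters = [
    {"A": +84, "B": -16},  # ~9.2 votes for A, 4 votes against B
    {"A": +84, "B": -16},
    {"B": +100},           # 10 votes for B
]
print(quadratic_votes(voters))
# -> {'A': 18.33, 'B': 2.0}: B passes despite 2-to-1 opposition
```

The quadratic cost is what does the work: under a fixed budget, each additional vote on an issue gets more expensive, so voters are pushed to spread credits roughly in proportion to how much they actually care, and the tally approximates summed preference intensities rather than a headcount.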

This is basically just the first class of Political Philosophy 101: it sure would be nice if you could install your favorite benevolent dictator, wouldn’t it? Well, it turns out you can’t, and even if you could, that’s evil and a very dangerous policy: what if someone else does the same and you get a dictator you don’t like? As civilized people, let’s agree to give everyone a seat at the table to decide what happens.

Deep Democracy has a lot of nice properties:

  • It avoids the ascendancy of an arbitrary dictator who decides the future.

  • Suitably deep kinds of democracy avoid the tyranny of the majority, where if 51% of people say they want something, it happens. Instead, decisions are sensitive to everyone’s values. This means that if you personally value something really weird, it doesn’t get stamped out by majority values; it still gets a place in the future.

    • As a corollary, it makes outcomes sensitive both to how many people care about something and to how much they care.

    • And it means that what you specifically care about will have some place in the long-term future, no matter what it is.

  • It facilitates “moral hedging”: if everyone has a say, then everyone’s moral theories get a seat at the table in a real-life moral parliament, hedging against both moral uncertainty and the possibility that a wrong moral theory wins control of everything and destroys all value in the process.

    • If value is power-law distributed (or similarly heavy-tailed), then you have a high chance of at least capturing some of the stuff that is astronomically more valuable than everything else, rather than losing out on it entirely. (See the toy simulation after this list.)

  • It avoids races to the bottom because no one is advantaged by getting there first; no matter who wins, the outcome is the same.

  • It has a lot of resonance with enlightened political discourse around the world, making it much more achievable than a narrower vision of the future.

  • Whereas Deep Democracy is fairly computationally intractable today[2] (we have to vote for representatives who act on our behalf because directly aggregating everyone’s full preferences is intractable in the real world), advances in AI will make deeper and deeper versions of democracy more tractable, as we gain more computational power to explore our own values, find positive-sum compromises, aggregate people’s preferences and local information, and the like.

    • And whereas Deep Democracy is politically intractable today because it upsets entrenched interests, if AGI shifts power structures substantially, it will become more tractable to implement.

  • And it’s of course extremely cooperative with everyone who values something.
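
To illustrate the power-law point from the list above, here’s a toy simulation (my own sketch; the Pareto distribution and all parameters are assumptions for illustration, not claims about how value is actually distributed). It compares a winner-take-all future, where one value system chosen effectively at random gets all the resources, with a pluralist future that keeps an equal share of every good:

```python
import random

def simulate(n_goods=1_000, n_trials=1_000, alpha=1.1, seed=0):
    """Toy model: each good someone values has Pareto-distributed value.
    A winner-take-all future backs one good at random (standing in for a
    single moral theory that may be wrong); a pluralist future keeps a
    1/n_goods share of every good, heavy-tailed outliers included."""
    rng = random.Random(seed)
    hits, top_share = 0, 0.0
    for _ in range(n_trials):
        values = sorted((rng.paretovariate(alpha) for _ in range(n_goods)),
                        reverse=True)
        top_share += values[0] / sum(values)   # how dominant is the top good?
        hits += rng.randrange(n_goods) == 0    # did winner-take-all pick it?
    print(f"avg share of all value in the single top good: {top_share / n_trials:.0%}")
    print(f"winner-take-all captured the top good in {hits}/{n_trials} trials")
    print("the pluralist allocation kept a share of it in every trial")

simulate()
```

In heavy-tailed runs, a large share of all value sits in a handful of goods, so the winner-take-all future captures the most valuable good only by luck, while the pluralist future always retains a slice of it.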

There are reasons you could object to Deep Democracy: perhaps other procedures do better by the lights of your own values, or perhaps you think evil people shouldn’t have a say in how the world goes. But I think that by the lights of most value systems, in the real world, Deep Democracy is an extremely promising goal in the presence of value pluralism.

I don’t know what the best way is to practically achieve Deep Democracy. It might look like a top-down political mechanism of voting, or a rhizomatic mechanism of bargaining between AGIs who represent principals with different interests, or just really good market mechanisms facilitating trade between people under conditions of roughly equal power and wealth. It’s not at all obvious that it looks like aligning AI model specs to a democratic procedure or nationalizing and democratizing AGI companies, but these are possibilities as well. And there are plenty of more exotic mechanisms to consider, like appointing a random lottery dictator. (It’s still Deep Democracy ex ante!) I think a lot of work needs to be done forecasting what will happen with AGI values, and articulating and evaluating mechanisms for making the outcome Deeply Democratic.
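
On the random-lottery-dictator parenthetical: the mechanism is dictatorial ex post but democratic ex ante, since every person’s values shape the outcome with equal probability. A minimal sketch (the agents, outcomes, and utilities are hypothetical, purely for illustration):

```python
import random

def random_dictator(agents, rng):
    """Pick one agent uniformly at random and implement their favorite
    outcome. Ex post a dictatorship; ex ante each agent's values get
    equal (1/n) weight over what happens."""
    dictator = rng.choice(list(agents))
    prefs = agents[dictator]
    return max(prefs, key=prefs.get)  # the dictator's top-ranked outcome

# Hypothetical agents with utilities over two candidate futures.
agents = {
    "alice": {"garden_world": 1.0, "art_world": 0.2},
    "bob":   {"garden_world": 0.1, "art_world": 1.0},
    "carol": {"garden_world": 0.7, "art_world": 0.6},
}

rng = random.Random(0)
outcomes = [random_dictator(agents, rng) for _ in range(10_000)]
for name, prefs in agents.items():
    # Expected utility under the lottery: a uniform average over dictators.
    expected = sum(prefs[o] for o in outcomes) / len(outcomes)
    print(f"{name}: expected utility {expected:.2f}")
```

Each agent’s expected utility is a uniform average over who wins the lottery, so in expectation every value system gets equal weight; that is the ex ante sense in which this is still Deep Democracy.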

But as an ideal, Deep Democracy has a lot going for it, and it’s my current best candidate for an excellent north star that EAs and many other people could get behind and push for.

  1. ^

    I don’t mean to commit to any particular vision of the future here. By “align AGI(s) to your own values” I mean something like: whether there’s an ASI singleton, a bunch of competing AGIs with different values, or a bunch of personal AGIs aligned directly to each user’s intent, the overall way that decisions are made by the world’s AGI(s) (and beyond) is aligned with your values.

  2. ^

    But more tractable than many people assume. We would do much better as a society by implementing better voting procedures like approval voting and quadratic voting.