I am a Research Fellow at Forethought. Before that I was an analyst at Longview Philanthropy, a Research Scholar at FHI, and assistant to Toby Ord. Philosophy at Cambridge before that.
I also do a podcast about EA called Hear This Idea.
Yeah, fair! I guess there’s a broad understanding of utilitarianism, which is “the sum of any monotone or non-decreasing transformation of utilities”, and a narrower understanding, which is “the sum of utilities”. But I want to say that prioritarianism (a version of the former) is an alternative to utilitarianism, not a variant. Not actually sure what prioritarians would say. Also not really an important point to argue about.
Glad to have highlighted the cardinality point!
And that meant we could hammer out more of the details in the comments, which seems appropriate.
Agree! As I say, I feel much clearer now on your position.
Thanks for all this! I agree that something like Nash is appealing for a bunch of reasons. Not least because it’s Pareto efficient, so doesn’t screw people over for the greater good, which feels more politically legitimate. It is also principled, in that it doesn’t require some social planner to decide how to weigh people’s preferences or wellbeing.
My sense, though you know much more about all this, is that Nash bargaining is not well described as a variant of utilitarianism, though I guess it’s a grey area.
Maybe I’m realising now that a lot of the action in your argument is not in arguing for the values which guide the future to be democratically chosen, but rather in thinking through which kinds of democratic mechanisms are best. Where plain old majority rule seems very unappealing, but more granular approaches which give more weight to those who care most about a given issue look much better. And (here we agree) this is especially important if you think that the wrong kind of popular future, such as a homogenous majority-determined future, could fall far short of the best future.
I really am thinking of this as more like “what values should we be aiming at to guide the future”, while staying fairly agnostic on mechanism.
Thanks, super useful! And makes me much more clear on the argument / sympathetic to it.
Of course, all the labs say things about democratic inputs into AI model specs and are trying stuff to this end, and these could be deeper or shallower.
Ah, that’s a good point! And then I guess both of us are in some kind of agreement that this kind of stuff (deliberate structured initiatives to inject some democracy into the models) ends up majorly determining outcomes from AGI.
>Re: what deep democracy, I like Nash bargaining!
I find this somewhat confusing, because elsewhere you say the kind of deep democracy you are imagining “is basically equivalent to enlightened preference utilitarianism”. But the Nash solution is basically never the utility-maximising solution! You can even set things up so that the Nash solution captures an arbitrarily small fraction, in terms of total welfare, of the gains between the disagreement point and the utility-maximising optimum. I do think there is an interesting and important question of how good, in practice, the Nash outcome is compared to the utility-maximising outcome; maybe it’s great ¯\_(ツ)_/¯
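To make the contrast concrete, here’s a minimal two-agent sketch (my own toy numbers, not anything from your post): a linear Pareto frontier with the disagreement point at the origin. It only shows the two solutions coming apart, not the arbitrarily-small-fraction construction, but the gap is already large:

```python
# Toy example, assuming a frontier u2 = M * (1 - u1) for u1 in [0, 1] and disagreement point (0, 0).
import numpy as np

M = 100.0                        # agent 2 turns the shared resource into utility 100x more efficiently
u1 = np.linspace(0.0, 1.0, 10_001)
u2 = M * (1.0 - u1)              # corresponding points on the Pareto frontier

i_util = np.argmax(u1 + u2)      # utilitarian solution: maximise the sum of utilities
i_nash = np.argmax(u1 * u2)      # Nash solution: maximise the product of gains over the disagreement point

print("Utilitarian:", (u1[i_util], u2[i_util]), "total welfare:", u1[i_util] + u2[i_util])
# -> roughly (0.0, 100.0), total 100
print("Nash:       ", (u1[i_nash], u2[i_nash]), "total welfare:", u1[i_nash] + u2[i_nash])
# -> roughly (0.5, 50.0), total ~50.5
```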
Thanks for the post! Sure seems like an important topic. Unnecessarily grumpy thoughts to follow —
First, I feel unsure what you’re imagining this ‘deep democracy’ thing is an answer to. You write, “If you want the long-term future to go well by the lights of a certain value function[…]” Who is “you” here?
An individual? If so, it seems clear to me that the best way to bring about things I care about, holding fixed others’ strategies, is indeed to get my AI(s) to pursue things I care about, and clearly not best or desirable from almost any individual’s point of view (certainly not an equilibrium) to get their AI(s) to pursue some kind of democratic conglomeration of everybody’s interests.
An AI company? If so, I imagine (at least by default) that the companies are just trying to serve demand with more and better models, complying with regulation, and maybe proactively adding in side constraints and various revenue-sacrificing behaviours for prosocial or reputational reasons. It’s not obvious to me that there comes a point where the executives sit down to discuss: “what set of values should we align AI to?”, where the candidate answers are various (pseudo-)moral views.
In the case where a company is choosing the target of alignment, I wonder if my confusion comes from starting from different assumptions about how ‘unitary’ their product is. One outcome, which I currently view as a default, is just an extrapolation of the status quo today: companies train base models, then those base models are post-trained and broken up into various specialised variants, which can in turn be fine-tuned for specific use cases, and all eventually served for a million different uses. On this picture, it’s not clear how much influence an AI company can have on the moral vision of the models they ultimately serve. The main reason is just that the vast majority of what these AIs are doing, in my mind, is helpful or economically useful tasks based on (direct or delegated) specific human instructions, not following a grand impartial moral plan. And if the models are too eager to break ranks and pursue an abstract moral vision, people won’t use them. This is what runs through my head when people talk about “the AGI” — what is that?
Of course, there are some reasons for thinking the AI landscape will be more unitary, and this picture could be wrong. Maybe a corporate monopoly, maybe a centralised (state-led) project, maybe a coup. Let’s consider the extreme case where “you” are a lab exec, you hold total power over the world through some single alignable AI system, and you face the decision of what to tell it to do. Here I’d zoom in on the part where you say it would be “uncooperative, undemocratic, coercive” to implement your values. One rejoinder here is that, AI aside, you should (i) be morally uncertain, (ii) be interested in figuring out what’s good through deliberation, and (iii) care about other people. So if the hegemon-leader has a reasonable moral view, directly implementing it through an AI hegemon doesn’t strike me as obviously worse in expectation than ceding influence. If the hegemon-leader has a moral view which is obviously bad, then I don’t think it’s very interesting that a democratic thing seems better.
In any case, I agree with the main reasons against the hegemon-leader directly implementing their vision of what’s good. But (at a gut level) this is, as you say, because it would be illegitimate, uncooperative, etc., not to mention practically likely to fail (we haven’t tested global totalitarianism, but most new political experiments fail). And I think democratic-but-hegemonic AI probably fares pretty badly by those lights, too, compared to actually just ceding power or not becoming a hegemon in the first place?
I do feel like I’m being unfair or missing something here. Maybe another reading is: look, anarchy is bad, especially when everyone has an army of AIs to carry out their bidding, and everyone is rich enough to start caring about scary, scope-sensitive, ideologically-motivated outcomes. The result is a bunch of winner-take-all conflict, and general destructive competition. So we need some governance system (national and/or global?) which curbs this destruction, but also does the best job possible aggregating people’s values. And this is what “deep democracy” should be doing.
The part of this I agree with is that, as far as we have voting systems, they could be much improved post-AGI, in a million ways. Thumbs up to people imagining what those tools and approaches could look like, to make democratic political procedures more flexible, effective, rich in information, great at finding win-win compromises, and so on.
But there’s a part that feels underspecified, and a part that I’m more sceptical of. The part that feels underspecified is what “deep democracy” actually is. The way you’ve phrased it, and I’m being a bit unfair here, is close to being good by definition (“deeply capturing and being responsive to every single person’s values” — I mean, sure!) I expect this is one of those cases where, once forced to actually specify the system, you make salient the fact that any particular system has to make tradeoffs (cf Arrow, though that’s a bit overblown).
The part I’m more sceptical of is that the anarchic alternative is chaotic and destructive, and the best way to aggregate preferences is via setting up some centralised monopoly on force, and figuring out what centralised process it follows. Consider the international ~anarchy. War is a really unattractive option, even for neighbouring expansionary states, so in theory (and often in practice) compromise is virtually always preferred to war. And that’s the hard (fully anarchic) case — smaller-scale conflict is avoided because of criminal and civil laws which make it very not worth it.
Finally, I’d suggest that, in a sense, we have a way to allocate resources and efforts towards what people want in a granular, deep, preference-aggregating way: trade. My inclination is to think of this as (even today) the main means by which society is arranged to make everyone better-off, and then to consider cases where centralised processes (like voting) are necessary or valuable. One example, which seems potentially very important, is if something like “moral worth” becomes even more divorced from wealth. Of course in designing a democratic process you don’t have perfectly neutral ground to stand on; you have to make a call on who gets to partake. But you can give more of a voice to people who are otherwise practically totally disenfranchised because they lack resources, whereas the ultra-wealthy would otherwise dominate outcomes. That’s already an issue and could become a bigger issue, but it does suggest an alternative answer to the question you’re asking, which is (potentially a lot of) wealth redistribution.
If value is a power law or similarly distributed, then you have a high chance of at least capturing some of the stuff that is astronomically more valuable than everything else, rather than losing out on this stuff entirely.
Of course there are a bunch of methods in social choice which do this, like quadratic voting; though it’s notable that most electoral democracies are not good examples, and anything like “go with the majority while protecting the rights of the minority” seems apt to highly underrate cases where some voters think that a particular issue is astronomically higher in stakes than others think. But this is also a case where I don’t think there’s an especially neutral, non-theory-laden approach for recognising how and when to give people’s views more weight because (intuitively) they think some issue is higher stakes. Then again, I think this is a general problem, not a specific issue with designing democratic methods.
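To illustrate the majority-vs-intensity worry with a toy sketch (made-up numbers, ignoring budgets and strategic behaviour): under one-person-one-vote a mild majority beats an intense minority, while under quadratic voting, where buying v votes costs v^2, expressed votes scale with how much each voter cares.

```python
# Hypothetical electorate: 60 voters mildly prefer "no" (stake 1), 40 strongly prefer "yes" (stake 10).
voters = [("no", 1.0)] * 60 + [("yes", 10.0)] * 40

majority = {"yes": 0, "no": 0}
quadratic = {"yes": 0.0, "no": 0.0}

for side, stake in voters:
    majority[side] += 1           # one person, one vote
    quadratic[side] += stake / 2  # with cost v**2, a voter with per-vote value m buys v = m / 2 votes

print("Majority rule:   ", majority)    # {'yes': 40, 'no': 60} -- the mild majority wins
print("Quadratic voting:", quadratic)   # {'yes': 200.0, 'no': 30.0} -- intensity now counts
```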
Ok sorry for the chaotically written thoughts, I think they probably understate how much I’m a fan of this line of thinking. And your comments in reply to this were clarifying and made me realise I was in fact a bit confused on a couple points. Thanks again for writing.
as people get richer and happier and wiser and so forth, they just have more time and interest and mental/emotional capacity to think carefully about ethics and act accordingly
I think that’s right as far as it goes, but it’s worth considering the limiting behaviour with wealth, which is (as the ultra-wealthy show) presumably that the activity of thinking about ethics still competes with other “leisure time” activities, and behaviour is still sensitive to non-moral incentives, like social competition. But also note that the ways in which it becomes cheaper to help others as society gets richer are going to tend to be the ways in which it becomes cheaper for others to help themselves (or ways in which people’s lives just get better without much altruism). That’s not always true, like in the case of animals.
Good point about persuasion. I guess one way of saying that back, is that (i) if the “right” or just “less bad” moral views are on average the most persuasive views, and (ii) at least some people are generating them, then they will win out. One worry is that (i) isn’t true, because other bad views are more memetically fit, even in a society of people with access to very good abstract reasoning abilities.
Thanks Oscar!
I believe this is correct, but possibly for the wrong reason.
The reason you give sounds right — for certain concave views, you really care that at least some people are bringing about or acting on the things you care about. One point, though, is that (as we discuss in the previous post) reasonable views which are concave in the good direction are going to be more sensitive (higher magnitude or less concave) in the negative direction. If you have such a view, you might also think that there are particular ways to bring about a lot of disvalue, so the expected quantity of bads could scale similarly to goods with the number of actors.
Another possibility is that you don’t need to have a concave view to value increasing the number of actors trading with one another. If there are very few actors, the amount by which a given actor can multiply the value they can achieve by their own (even linear) lights, compared to the no-trade case, may be lower, because they have fewer opportunities for trade. But I haven’t thought this through and it could be wrong.
The choice of two dimensions is unmotivated
Yep, totally arbitrary choice to suit the diagram. I’ve made a note to clarify that! Agree it generalises, and indeed people diverge more and more quickly on average in higher dimensions.
Also note 4.2. in ‘How to Make the Future Better’ (and footnote 32) —
[O]n this model, making each factor more correlated can dramatically improve the expected value of the future — without improving the expected value of any individual factor at all.
Which could look like “averting a catastrophic disruption of an otherwise convergent win”.
Thanks for writing this James, I think you raise good points, and for the most part I think you’re accurately representing what we say in the piece. Some scattered thoughts —
I agree that you can’t blithely assume scaling trends will continue, or that scaling “laws” will continue to hold. I don’t think either assumption is entirely unmotivated, though, because the trends have already spanned many orders of magnitude. E.g. pretraining compute has spanned roughly 8 OOMs of FLOP (a factor of ~1e8) since the introduction of the transformer, and scaling “laws” for given measures of performance hold up across almost as many OOMs, experimentally and practically. Presumably the trends have to break down eventually, but if all you knew were the trends so far, it seems reasonable to spread your guesses over many future OOMs.
Of course we know more than big-picture trends. As you say, GPT-4.5 seemed pretty disappointing, on benchmarks and just qualitatively. People at labs are making noises about pretraining scaling as we know it slowing down. I do think short timelines[1] look less likely than they did 6 months or a year ago. Progress is often made by a series of overlapping s-curves, and future gains could come from elsewhere, like inference scaling. But we don’t give specific reasons to expect that in the piece.
On the idea of a “software-only intelligence explosion”, this piece by Daniel Eth and Tom Davidson gives more detailed considerations which are specific to ML R&D. It’s not totally obvious to me why you should expect returns to diminish faster in broad vs narrow domains. In very narrow domains (e.g. checkers-playing programs), you might intuitively think that the ‘well’ of creative ways to improve performance is shallower, so further improvements beyond the most obvious are less useful. That said, I’m not strongly sold on the “software-only intelligence explosion” idea; I think it’s not at all guaranteed (maybe I’m 40% on it?) but worth having on the radar.
I also agree that benchmarks don’t (necessarily) generalise well to real-world tasks. In fact I think this is one of the cruxes between people who expect AI progress to quickly transform the world in various ways and people who don’t. I do think the cognitive capabilities of frontier LLMs can meaningfully be described as “broad” in the sense that: if I tried to think of a long list of tasks and questions which span an intuitively broad range of “cognitive skills” before knowing about the skill distribution of LLMs,[2] then I expect the best LLMs to perform well across the large majority of those tasks and questions.
But I agree there is a gap between impressive benchmark scores and real-world usefulness. Partly there’s a kind of o-ring dynamic, where the speed of progress is gated by the slowest-moving critical processes — speeding up tasks that normally take 20% of a researcher’s time by 1,000x won’t speed them up by more than 25% if the other tasks can’t be substituted. Partly there are specific competences which AI models don’t have, like reliably carrying out long-range tasks. Partly (and relatedly) there’s an issue of access to implicit knowledge and other bespoke training data (like examples of expert practitioners carrying out long time-horizon tasks) which are scarce in the public internet training data. And so on.
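For what it’s worth, the 20% / 1,000x / 25% figure above is just Amdahl’s-law-style accounting; here is a one-line sketch of the arithmetic, using the same illustrative numbers:

```python
# If tasks taking 20% of a researcher's time get 1,000x faster and everything else is unchanged,
# total time falls from 1.0 to 0.8 + 0.2 / 1000, i.e. an overall speedup of only ~1.25x.
accelerated_share = 0.20
speedup = 1_000.0

remaining_time = (1 - accelerated_share) + accelerated_share / speedup
print(f"Overall speedup: {1 / remaining_time:.2f}x")   # ~1.25x, about 25% faster
```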
Note that (unless we badly misphrased something) we are not arguing for “AGI by 2030” in the piece you are discussing. As always, it depends on what “AGI” means, but if it means a world where tech progress is advancing across the board at 10x rates, or a time where there is basically no cognitive skill where any human still outperforms some AI, then I also think that’s unlikely by 2030.
One thing I should emphasise, and maybe we should have emphasised more, is the possibility of faster (say 10x) growth rates from continued AI progress. I think the best accounts of why this could happen do not depend on AI becoming superhuman in some very comprehensive way, or on a software-only intelligence explosion happening. The main thing we were trying to argue is that sustained AI progress could significantly accelerate tech progress and growth, after the point where (if it ever happens) the contribution to research from AI reaches some kind of parity with humans. I do think this is plausible, though not overwhelmingly likely.
Thanks again for writing this.
Thanks for sharing, though I have to say I’m a little sceptical of this line of thought.
If we’re considering our Solar System, I expect almost all aquaculture (and other animal farming) to remain on Earth rather than other planets, indefinitely.
In the short run, this is because I expect any self-sustaining settlements inhabited by humans beyond Earth to be very difficult and kind of pointless. Every last gram of payload will have to be justified, and water will be scarce and valuable. If there are more calorie-efficient ways to grow or make food, compared to farming animals, then at least initially I don’t see how there would be animal farming.
And then if (say) a city on Mars really does become advanced enough to justify animal farming, I would expect at that point we’d have bioreactors to grow the animal product directly, without the energy wastage of movement, metabolism, and growing a brain and other discarded organs.[1]
I also think this applies to the very long-run, beyond our Solar System. I personally struggle to picture a civilisation advanced enough to settle new stars, but primitive enough to choose to raise and slaughter animals. Not even “primitive” in a moral sense; more like technologically inept in this very specific way!
I also think there needs to be a specific mechanism of lock-in, in order to think that the decision to farm animals off-Earth (or deliberately choosing not to) should strongly influence long-run treatment of animals. I’d expect the more important factor is humanity’s general attitudes to and treatment of animals.
I do buy that there would be something symbolically significant about the relevant actors explicitly choosing not to farm animals off-Earth, though, that could resonate a lot (including for animal conditions back on Earth).
(This comment might also be partly relevant)
Honestly I think it’s notable how many startups and even academic projects get funding based on claims that they’re building some component of a mission to Mars or the Moon, based on assumptions which strike me as completely wild and basically made-up.
Thanks!
should this be “does make sense”?
No
Great question, maybe someone should set up some kind of moral trade ‘exchange’ platform.
Quick related PSA in case it’s not obvious: typically donation swapping for political campaigns (i.e. with individual contribution caps) is extremely illegal…
This is a stupid analogy! (Traffic accidents aren’t very likely.)
Oh, I didn’t mean to imply that I think AI takeover risk is on par with traffic-accident risk. I was just illustrating the abstract point that the mere presence of a mission-ending risk doesn’t imply spending everything to prevent it. I am guessing you agree with this abstract point (but furthermore think that AI takeover risk is extremely high, and as such we should ~entirely focus on preventing it).
I think Wei Dai’s reply articulates my position well:
Maybe I’m splitting hairs, but “x-risk could be high this century as a result of AI” is not the same claim as “x-risk from AI takeover is high this century”, and I read you as making the latter claim (obviously I can’t speak for Wei Dai).
No, the correct reply is that dolphins won’t run the world because they can’t develop technology
That’s right, and I do think the dolphin example was too misleading and straw-man-ish. The point I was trying to illustrate, though, is not that there is no way to refute the dolphin theory, but that failing to adequately describe the alternative outcome(s) doesn’t especially support the dolphin theory, because trying to accurately describe the future is just generally extremely hard.
No, but they had sound theoretical arguments. I’m saying these are lacking when it comes to why it’s possible to align/control/not go extinct from ASI.
Got it. I guess I see things as messier than this — I see people with very high estimates of AI takeover risk advancing arguments, and I see others advancing skeptical counter-arguments (example), and before engaging with these arguments a lot and forming one’s own views, I think it’s not obvious which sets of arguments are fundamentally unsound.
But it’s worse than this, because the only viable solution to avoid takeover is to stop building ASI, in which case the non-takeover work is redundant (we can mostly just hope to luck out with one of the exotic factors).
Makes sense.
Thanks for the comment. I agree that if you think AI takeover is the overwhelmingly most likely outcome from developing ASI, then preventing takeover (including by preventing ASI) should be your strong focus. Some comments, though —
Just because failing at alignment undermines ~every other issue doesn’t mean that working on alignment is the only or overwhelmingly most important thing.[1] Tractability and likelihood also matter.
I’m not sure I buy that things are so stark as “there are no arguments against AI takeover”, see e.g. Katja Grace’s post here. I also think there are cases where someone presents you with an argument that superficially drives toward a conclusion that sounds unlikely, and it’s legitimate to be sceptical of the conclusion even if you can’t spell out exactly where the argument is going wrong (e.g. the two-envelope “paradox”). That’s not to say you can justify not engaging with the theoretical arguments whenever you’re uncomfortable with where they point, just that humility about deducing bold claims about the future on theoretical grounds cuts both ways.
Relatedly, I don’t think you need to be able to describe alternative outcomes in detail to reject a prediction about how the world goes. If I tell someone the world will be run by dolphins in the year 2050, and they disagree, I can reply, “oh yeah, well you tell me what the world looks like in 2050”, and their failure to describe their median world in detail doesn’t strongly support the dolphin hypothesis.[2]
“Default” doesn’t necessarily mean “unconditionally likely” IMO. Here I take it to mean something more like “conditioning on no specific response and/or targeted countermeasures”. Though I guess it’s baked into the meaning of “default” that it’s unconditionally plausible (like, ⩾5%?) — it would be misleading to say “the default outcome from this road trip is that we all die (if we don’t steer out of oncoming traffic)”.
In theory, one could work on making outcomes from AI takeover less bad, as well as making them less likely (though less clear what this looks like).
Altogether, I think you’re coming from a reasonable but different position, that takeover risk from ASI is very high (sounds like 60–99% given ASI?). I agree that kinds of preparedness not focused on avoiding takeover look less important on this view (largely because they matter in fewer worlds). I do think this axis of disagreement might not be as sharp as it seems, though — suppose person A has 60% p(takeover) and person B is on 1%. Assuming the same marginal tractability and neglectedness between takeover and non-takeover work, person A thinks takeover-focused work is 60× as important as person B does, but thinks non-takeover work is only 40/99 ≈ 0.4 times as important.
By (stupid) analogy, all the preparations for a wedding would be undermined if the couple got into a traffic accident on the way to the ceremony; this does not justify spending ~all the wedding budget on car safety.
Again by analogy, there were some superficially plausible arguments in the 1970s or thereabouts that population growth would exceed the world’s carrying capacity, and we’d run out of many basic materials, and there would be a kind of system collapse by 2000. The opponents of these arguments were not able to describe the ways that the world could avoid these dire fates in detail (they could not describe the specific tech advances which could raise agricultural productivity, or keep materials prices relatively level, for instance).
Thanks for writing this! Some personal thoughts:
I have a fair amount of research latitude, and I’m working at an org with a broad and flexible remit to try to identify and work on the most important questions. This makes the Hamming question — what are the most important questions in your field and why aren’t you working on them — hard to avoid! This is uncomfortable, because if you don’t feel like you’re doing useful work, you’re out of excuses. But it’s also very motivating.
There is an ‘agenda’ in the sense that there’s a list of questions and directions with some consensus that someone at Forethought should work on them. But there’s a palpable sense that the bottleneck to progress isn’t just more researchers to schlep on with writing up ideas, so much as more people with crisp, opinionated takes about what’s important, who can defend their views in good faith.
One possible drawback is that Forethought is not a place where you learn well-scoped skills or knowledge by default, because as a researcher you are not being trained for a particular career track (like junior → senior SWE) or taught a course (like doing a PhD). But there is support and time for self-directed learning, and I’ve learned a lot of tacit knowledge about how to do this kind of research especially from the more senior researchers.
I would personally appreciate people applying with research or industry expertise in fields like law, law and economics, physics, polsci, and ML itself. You should not hold off on applying because you don’t feel like you belong to the LessWrong/EA/AI safety sphere, and I’d worry about Forethought culture becoming too insular in that respect (though currently it’s not much of a concern).
If you’re considering applying, I recorded a podcast with Mia Taylor, who recently joined as a researcher!