In fairness to Richard, I think it comes across a lot more strongly in text than, in my view, it came across listening on YouTube.
GideonF
Safety Conscious Researchers should leave Anthropic
I really like this piece, and I think I share in a lot of these views. Just on some fairly minor points:
Deep Incommensurability. It seems like incommensurability helps with avoiding MPL, but not actually that much. For example, there seem to be many moral theories (i.e. something somewhat like Person Affecting Views) that are incommensurable (or indifferent) between worlds of different sizes, but not between worlds of different qualities. So they may really care whether it is a world of humans, or insects, or hedonium.
I can imagine views (they do run into non-identity, but maybe there are ways of formulating them that don’t) for which this would be a real problem. For example, imagine a view that holds that simulated human existence is the best form of life, but is indifferent between that and non-existence. Holders of such a view won’t care whether we leave the universe insentient, but faced with a pairwise choice between hedonium and simulated humans, they will take the simulated humans every time. So they don’t care much if we go extinct, but do care if the hedonistic utilitarians win. Indeed, these views may be even less willing to take trades than many views that care about quantity. I imagine many religions, particularly universalist religions like Christianity and Islam, may actually fall into this category.
-
I think some more discussion of the ‘kinetics’ vs ‘equilibrium’ point you sort of allude to would be pretty interesting. I think you could reasonably hold the view that rational (or sensing, or whatever other sort of) beings converge to moral correctness in infinite time. But we are likely not waiting infinite time before locking in decisions that cannot be reversed. Thus, because irreversible moral decisions could occur at a faster rate than correct moral convergence (i.e. the kinetics of the process is more important than what it would be at equilibrium), we shouldn’t expect the equilibrium process to dominate. I think you gesture towards this, but I think further exploration of the ordering would be very interesting.
-
I also wonder if views that are pluralist rather than monist about value may make the MPL problem worse or better. I think I could see arguments either way, depending on exactly how those views are formulated, but it would be interesting to explore.
Very interesting piece anyway, thanks a lot. It really resonates with a lot I’ve been thinking about.
I’m sure I’ll have a few more comments at some point as I revisit the essay.
Yeah, I might be wrong, but something like Larry Temkin’s model might work best here (it’s been a while since I read it, so I may be getting it wrong).
I think averagists may actually also care about the long-term future a lot, and their view may still have an MPL if they don’t hold that there are (rapidly) diminishing returns to utility WITHIN lives (i.e. if it is possible for the average life to be a lot worse or a lot better than today). Indeed, given (potentially) plausible views on interspecies welfare comparisons, and how bad the lives of lots of non-humans seem today, this just does seem to be true. Now, it’s not clear they shouldn’t be at least a little more sympathetic to us converging on the ‘right’ world (since it seems easier), but it doesn’t seem like they get out of much of the argument either.
I think a really important question in addressing this is something like: does the USA remain ‘unfanatical’ if the shackles are taken off powerful people? This is where I think the analysis of the USA goes a little bit wrong: we need to think about what the scenario looks like if it is possible for power to be much, much more extremely concentrated than it is now. Certainly, in such a scenario, it’s not obvious that it will be true post-AGI that “even a polarizing leader cannot enforce a singular ideology or eliminate opposition”.
You’re sort of right on the first point, and I’ve definitely counted that work in my views on the area. I generally prefer to refer to it as ‘making sure the future goes well for non-humans’, but I’ve had that misinterpreted as just focused on animals.
I think for me the fact that the minds will be non-human, and probably digital, matters a lot. Firstly, I think arguments for longtermism probably don’t work if the future is mostly just humans. Secondly, the fact that these beings are digital minds, and maybe digital minds very different to us, means a lot of common responses that are given for how to make the future go well (e.g. make sure their preferred government ‘wins’ the ASI race) definitely look less promising to me. Plus you run into trickier problems like what Carlsmith discusses in his Otherness and Control series, and on the other end, if conscious AIs are ‘small minds’ à la insects (lots of small conscious digital minds that are maybe not individually very smart), you run into a bunch of the same issues of how to adequately treat them. So this is sort of why I call it ‘digital minds’, but I guess that’s fairly semantic.
On your second point, I basically think it could go either way. I think this depends on a bunch of things, including whether we get ‘lock in’, and if so how strong and of what type (i.e. what values are encoded), how ‘adaptive’ consciousness is, etc. At least to me, I could see it going either way (not saying 50-50 credence towards both, but my guess is I’m at least less skeptical than you). Also, it’s possible that these are the more likely scenarios to have abundant suffering (although this also isn’t obvious to me given potential motivations for causing deliberate suffering).
Gideon Futerman’s Quick takes
I wish more work on digital minds really focused on answering the following questions, rather than merely investigating how plausible it is that digital minds similar to current-day AIs could be sentient:
1) What do good sets of scenarios for post-AGI governance need to look like to create good/avoid terrible (or whatever normative focus we want) futures, assuming digital minds are the dominant moral patients going into the future?
1a) How does this differ depending on what sorts of things can be digital minds, e.g. whether sentient AIs are likely to happen ‘by accident’ through creating useful AIs (including ASI systems or sub-systems) vs whether sentient AIs have to be deliberately built? How do we deal with this trade-off?
2) Which of these good sets of scenarios need certain actions to be taken pre-ASI development (actions beyond simply ensuring we don’t all die)? Therefore, what actions would we ideally take now to help bring about such good futures? This includes, in my view, what, if any, thicker concept of alignment than ‘intent alignment’ we ought to use.
3) Given the strategic, political, geopolitical and technological situation we are in, how, if at all, can we make concrete progress towards this? We obviously can’t just ‘do research’ and hope this solves everything. Rather, we ought to use this to guide specific actions that can have impact. I guess this step feels rather hard to do without 1 and 2, but also, as far as I can tell, no one is really doing this?
I’m sure someone has expressed this same set of questions elsewhere, but I’ve not seen them yet, and at least to me, they seem pretty neglected and important.
-
Factory farming?
The Manhattan Trap: Why a Race to Artificial Superintelligence is Self-Defeating
Just flagging, it seems pretty strange to have something about career choice in ‘Community’
Pretty sure EA basically invented that (yes people were working on stuff before then and outside of it, but still that seems different to ‘reinventing the wheel’)
I see no legitimate justification for attitudes that would consider humans as important enough that global health interventions would beat out animal welfare, particularly given the sheer number and scale of invertebrate suffering. If invertebrates are sentient, it seems animal welfare definitely could absorb 100m and remain effective on the margin, and probably also if they are not (which seems unlikely). The reason I am not fully in favour is mostly that the interaction of animal welfare with population ethics is far stronger than the interaction of global health developments with it, and given the significant uncertainties involved with population ethics, I can’t be sure these don’t at least significantly reduce the benefits of AW over GH work.
I think I am unsure how long it is possible for an indefinite moratorium to last, but I think I probably fall, and increasingly fall, much closer to supporting it than I guess you do.
In answer to these specific points, I basically see maintaining a moratorium as an example of Differential Technology Development. As long as the technologies that we can use to maintain a moratorium (both physical and social technologies) outpace the rate of progress towards ASI, we can maintain the moratorium. I do think this would require drastically slowing down a specific subset of scientific progress in the long term, but am not convinced it would be so general as you suggest. I guess this is some mixture of both 1 and 2, although with both I do think this means that neither position ends up being so extreme.
In answer to your normative judgement, if 1 allows a flourishing future, which I think a drastically slowed sense of progress could, then it seems desirable from a longtermist perspective. I’m also really unsure that, with sufficient time, we can’t access significant parts of technology space without an agentic ASI, particularly if we sufficiently increase our defences against an agentic ASI using technologies like narrow AIs. It also strikes me that assigning significant normative value to accessing all areas (or even extremely large areas) of science and technology space seems like a value set that treats ‘progress’/transhumanism as an end in itself, rather than a means to an end (as totalist utilitarians with transhumanist bents do).
For me, it’s really hard to tell how long we could hold a moratorium for, and for how long it is desirable. But certainly, if feasible, it seems timescales well beyond decades would be very desirable.
Deconfusing Pauses: Long Term Moratorium vs Slowing AI
I do think we have to argue that national securitisation is more dangerous than humanity securitisation, or non-securitised alternatives. I think it’s important to note that whilst I explicitly discuss humanity macrosecuritisation, there are other alternatives as well that Aschenbrenner’s national securitisation compromises, as I briefly argue in the piece.
Of course, I have not provided, and was not intending to provide, an entire and complete argument for this (it is only 6,000 words), although I think I do go further towards proving this than you give me credit for here. As I summarise in the piece, the Sears (2023) thesis provides a convincing argument from empirical examples that national securitisation (and a failure of humanity macrosecuritisation) is the most common factor in the failure of Great Powers to adequately combat existential threats (e.g. the failure of the Baruch Plan/international control of nuclear energy, the promotion of technology competition around AI vs arms agreements with the threat of nuclear winter, BWC, Montreal Protocol). Given this limited but still significant data that I draw on, I do think it is unfair to suggest that I haven’t provided an argument that national securitisation is more dangerous on net. Moreover, as I address in the piece, Aschenbrenner fails to provide any convincing track record of success of national securitisation, whilst his historical analogies (Szilard, Oppenheimer and Teller) all indicate he is pursuing a course of action that probably isn’t safe. Whilst of course I didn’t go through every argument, I think Section 1 provides arguments that national securitisation isn’t inevitable, and Section 2 provides the argument that, at least from historical case studies, humanity macrosecuritisation is safer than national securitisation. The other sections show why I think Aschenbrenner’s argument is dangerous rather than just wrong, and how he ignores important other factors.
The core of Aschenbrenner’s argument is that national securitisation is desirable and thus we ought to promote and embrace it (‘see you in the desert’). Yet he fails to engage with the generally poor track record of national securitisation at promoting existential safety, or to provide a legitimate counter-argument. He also, as we both acknowledge, fails to adequately deal with possibilities for international collaboration. His argument for why we need national securitisation seems to be premised on three main ideas: it is inevitable (/there are no alternatives); the values of the USA ‘winning’ the future is our most important concern (whilst alignment is important, I do think it is secondary to this for Aschenbrenner); and the US natsec establishment is the way to ensure that we get a maximally good future. I think Aschenbrenner is wrong on the first point (and certainly, fails to adequately justify it). On the second point, he overestimates the importance of the US winning compared to the difficulty of alignment, and certainly, I think his argument for this fails to deal with many of the thorny questions here (what about non-humans? how does this freedom remain in a world of AGI? etc.). On the third point, I think he goes some way to justify why the US natsec establishment would be more likely to ‘win’ a race, but fails to show why such a race would be safe (particularly given its track record). He also fails to argue that natsec would allow for the values we care about to be preserved (US natsec doesn’t have the best track record with reference to freedom, human rights etc.).
On the point around the instability of international agreements, I do think this is the strongest argument against my model of humanity macrosecuritisation leading to a regime that stops the development of AGI. However, as I allude to in the essay, this isn’t the only alternative to national securitisation. Since publishing the piece, this is the biggest mistake in reasoning (and I’m happy to call it that) that I see people making. The chain of logic that goes ‘humanity macrosecuritisation leading to an agreement would be unstable, therefore promoting national securitisation is the best course of action’ is flawed; one needs to show that the plethora of other alternatives (depolitical/political/riskified decisionmaking, or humanity macrosecuritisation but without an agreement) are not viable, and Aschenbrenner doesn’t address this at all. I also, as I think you do, see Aschenbrenner’s argument against an agreement as containing very little substance. I don’t mean to say it’s obviously wrong, but he hardly even argues for it.
I do think stronger arguments for the need to nationally securitise AI could be provided, and I also think they are probably wrong. Similarly, I think stronger arguments than mine can be provided with regards to why we need to humanity macrosecuritise superintelligence and how international collaboration on controlling AI development (I am working on something like this) can address some of the concerns that one may have. But the point of this piece is to engage with the narratives and arguments in Aschenbrenner’s piece. I think he fails to justify national securitisation whilst also taking action that endangers us (and I’m hearing from people connected to US politics that the impact of this piece may actually be worse than I feared).
On the stable totalitarianism point, I also think it’s useful to note that it is not at all obvious that the risk of stable totalitarianism is greater under some form of global collaboration than it is under a nationally securitised race.
On these three points:
Yes, the Project is a significant possibility. People like Aschenbrenner make this more likely to happen, and we should be trying to oppose it as much as possible. Certainly, there is a major ‘missing mood’ in Aschenbrenner’s piece (and the interview), where he seems to greet the possibility of the Project with glee.
I’m actually pretty unsure whether improving cybersecurity is very important. The benefits are well known. However, if you don’t improve cybersecurity (or can’t), then advancing AI becomes much more dangerous with much less upside, so racing becomes harder. With worse cybersecurity, a pause may be more likely. Basically, I’m unsure and I don’t think it’s as simple as most people think. It’s also not obvious to me that, for example, America directly sharing model weights with China wouldn’t be a positive thing.
Certainly according to my ethics I am not ‘neutral pro-humanity’, but rather care about a flourishing and just future for all sentient beings. On this axis, I do think the difference is more marginal than many would expect. I would probably guess that it would be better for the US/the free world to have relatively greater power, although with some caveats (e.g. I’m not sure I trust the CIA very much to have a large amount of control). I think both groups ‘as-is’, particularly in a nationally securitised ‘race’, are rather far from the optimal, and this difference is very morally significant. So I think I’m definitely MUCH more concerned than Aschenbrenner is about avoiding a nationally securitised race (also because I’m more concerned with misalignment than I think he is).
Thanks for this reply Stephen, and sorry for my late reply, I was away.
I think it’s true that Aschenbrenner gives (marginally) more consideration than I gave him credit for. I’m not actually sure how I missed that paragraph, to be honest! Even then, whilst there is some merit to that argument, I think he needs to justify his dismissal of an international treaty much better (along similar lines to your shortform piece). As I argue in the essay, I think that such lack of stability requires a particular reading of how states act. For example, I argue that if we buy a form of defensive realism, states may in fact be more inclined to reach a stable equilibrium. Moreover, as I argue, I think Aschenbrenner fails to acknowledge how his ideas on this may well become a self-fulfilling prophecy.
I actually think I just disagree with your characterisation of my second point, although it could well be a flaw in my communication, and if so I apologise. My argument isn’t even that values of freedom and democracy, or even a narrower form of ‘American values’, wouldn’t be better for the future (see below for more discussion on that); it’s that national securitisation has a bad track record at promoting collaboration and dealing with extreme risk, and we have good reason to think it may be bad in the case of AI. So even if Aschenbrenner doesn’t frame it as national securitisation for the sake of nationalism, but rather national securitisation for the sake of all humanity, the impacts will be the same. The point of that paragraph was simply to preempt a critique that is exactly what you say. I also think it’s clear that Aschenbrenner in his piece is happy to conflate those values with ‘American nationalism/dominance’ (e.g. ‘America must win’), so I’m not sure him making this distinction actually matters.
I also probably am much less bullish on American dominance than Aschenbrenner is. I’m not sure the American national security establishment actually has a good track record of preserving a ‘raucous plurality’, and if (as Aschenbrenner wants) we expect superintelligence to be developed through that institution, I’m not overly confident in how good it will be. Whilst I am no friend of dictatorships, I’m also unconvinced that, if one cares about raucous pluralism, US dominance, or certainly US dominance to the extent that Aschenbrenner envisions, is necessarily a good thing. Moreover, even in American democracy, the vast majority of moral patients aren’t represented at all. I’m essentially unconvinced that the benefits of America ‘winning’ a nationally securitised AI race anywhere near outweigh the geopolitical risk, misalignment risk, and most importantly the risk of not taking our time to construct a mutually beneficial future for all sentient beings. I think I have put this paragraph quite crudely, and would be happy to elaborate further, although it isn’t actually central to my argument.
I think it’s wrong to say that my argument doesn’t work without significant argument against those two premises. Firstly, my argument was that Aschenbrenner was ‘dangerous’, which required highlighting why the narrative choice was problematic. Secondly, yes, there is more to do on those points, but given Aschenbrenner’s failure to give in-depth argumentation on those points, I thought that they would be better dealt with as their own pieces (which I may or may not write). In my view, the most important aspect of the piece was Aschenbrenner’s claim that national securitisation is necessary to secure the safest outcomes, and I do feel the piece was broadly successful at arguing that this is a dangerous narrative to propagate. I do think that if you hold Aschenbrenner’s assumptions strongly, namely that cooperation is very difficult, alignment is easy-ish, and the most important thing is an American AI lead, as this leads to a maximally good future by maximising free expression and political expression, then my argument is not convincing. I do, however, think this model is based on some rather controversial assumptions that, given the dangers involved, are woefully insufficiently justified by Aschenbrenner in his essay.
One final point is that it is still entirely non-obvious, as I mention in the essay, that national securitisation is the best frame even if a pause is impossible, or even weaker, if it is an unstable equilibrium.
I think you should delete the post and resend it out another day (maybe on the 3rd?)