I lead the DeepMind mechanistic interpretability team
Neel Nanda
Thanks for clarifying. In that case I think that we broadly agree
I think that even the association between functional agency and preferences in a morally valuable sense is an open philosophical question that I am not happy taking as a given.
Regardless, it seems like our underlying crux is that we assign utility to different things. I somewhat object to you saying that your version of this is utilitarianism while notions of assigning utility that privilege things humans value are not.
4% is higher than I thought! Presumably much of that is people who had pre-existing conditions (which I don't) or people who got into e.g. a car accident (which AI probably somewhat reduces), but this seems a lot more complicated and indirect to me.
But this isn’t really engaging with my cruxes. It seems pretty unlikely to me that we will pause until we have pretty capable and impressive AIs, and to me much of the non-doom probability comes from uncertainty about when we will get powerful AI and how capable it will be. And I expect this to be much clearer the closer we get to these systems, or at the very least the empirical uncertainty about whether it’ll happen will be a lot clearer. I would be very surprised if there was the political will to do anything about this before we got a fair bit closer to the really scary systems.
And yep, I totally put more than 4% chance that I get killed by AI in the next 20 years. But I can see this is a more controversial belief and one that requires higher standards of evidence to argue for. If I imagine a hypothetical world where I know that in 2 years we could have aligned superintelligent AI with 98% probability (and it would kill us all with 2% probability), or we could pause for 20 years and get that from 98% to 99%, then I guess from a selfish perspective I can kind of see your point. But I know I do value humanity not going extinct a fair amount, even if I think that total utilitarianism is silly. And I observe that I’m finding this debate kind of slippery, and I’m afraid that I’m maybe moving the goalposts here: I disagree on many counts, so it’s not clear what exactly my cruxes are, or where I’m just attacking points in what you say that seem off.
I do think that the title of your post is broadly reasonable though. I’m an advocate for making AI x-risk cases that are premised on common sense morality like “human extinction would be really really bad”, and utilitarianism in the true philosophical sense is weird and messy and has pathological edge cases and isn’t something that I fully trust in extreme situations
Ah, gotcha. Yes, I agree that if your expected reduction in p(doom) is less than around 1% per year of pause, and you assign zero value to future lives, then pausing is bad on utilitarian grounds
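To spell out the arithmetic behind that ~1%/year threshold: if you only count people alive today, a year of pause costs roughly the lives that aligned AI could otherwise have saved that year. Here's a minimal sketch; the crude death rate is a real-world figure (~0.9%/year), but the fraction of deaths AI could avert and the 1%/year risk reduction are purely illustrative assumptions, not numbers from the post:

```python
# Minimal sketch of the pause tradeoff when only currently-living people count.
# All numbers are illustrative assumptions, not claims from the original post.
crude_death_rate = 0.009          # ~0.9% of people alive today die each year
fraction_agi_could_avert = 0.7    # assumed share of those deaths aligned AI prevents
cost_per_year_of_pause = crude_death_rate * fraction_agi_could_avert

p_doom_reduction_per_year = 0.01  # the ~1%/year figure under discussion

print(f"Cost of one year of pause:    ~{cost_per_year_of_pause:.2%} of current lives")
print(f"Benefit of one year of pause: ~{p_doom_reduction_per_year:.2%} of current lives")
print("Pause net-positive for current lives:", p_doom_reduction_per_year > cost_per_year_of_pause)
```

Under assumptions like these the break-even point sits around the annual death rate times the share of it AI could avert, which is presumably roughly where the ~1%/year figure comes from.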
Note that my post was not about my actual numerical beliefs, but about a lower bound that I considered highly defensible—I personally expect notably higher than 1%/year reduction and was taking that as given, but on reflection I at least agree that that’s a more controversial belief (I also think that a true pause is nigh impossible)
I expect there are better solutions that achieve many of the benefits of pausing while still enabling substantially better biotech research, but that’s nitpicking
I’m not super sure what you mean by individualistic. I was modelling this as utilitarian but assigning literally zero value to future people. From a purely selfish perspective, I’m in my mid-20s and my chances of dying from natural causes in the next, say, 20 years are pretty damn low, and this means that, given my background beliefs about doom and timelines, slowing down AI is a great deal from my perspective; whereas if I expected to die of old age in the next 5 years I would be a lot more opposed.
Ah! Thanks for clarifying—if I understand correctly, you think that it’s reasonable to assert that sentience and preferences are what makes an entity morally meaningful, but that anything more specific is not? I personally just disagree with that premise, but I can see where you’re coming from
But in that case, it’s highly non-obvious to me that AIs will have sentience or preferences in ways that I consider meaningful—this seems like an open philosophical question. Actually defining what they are also seems like an open question to me—does a thermostat have preferences? Does a plant that grows towards the light? Meanwhile, I do feel fairly confident that humans are morally meaningful. Is your argument that even if there’s a good chance they’re not morally meaningful, the expected amount of moral significance is comparable to humans?
Because there is a much higher correlation between the values of the current generation of humans and the next one than there is between the values of humans and arbitrary AI entities.
For your broader point of impartiality, I feel like you are continuing to assume some bizarre form of moral realism, and I don’t understand the case. Otherwise, why do you not consider rocks to be morally meaningful? Why is a plant not valuable? I can come up with reasons, but these assume specific things about what is and is not morally valuable, in exactly the same way that when I say arbitrary AI beings are on average substantially less valuable, I’m relying on my specific preferences and values over what matters. I do not understand the philosophical position you are taking here—it feels like you’re saying that the standard position is speciesist and arbitrary, and then drawing an arbitrary distinction slightly further out?
If AI can accelerate technologies that save and improve the lives of people who exist right now, then slowing it down would cost lives in the near term.
Huh? This argument only goes through if you have a sufficiently low probability of existential risk or an extremely low change in your probability of existential risk, conditioned on things moving slower. I disagree with both of these assumptions. Which part of your post are you referring to?
Are you assuming some kind of moral realism here? That there’s some deep moral truth, humans may or may not have insight into it, so any other intelligent entity is equally likely to?
If so, idk, I just reject your premise. I value what I choose to value, which is obviously related to human values, and an arbitrarily sampled entity is not likely to be better on that front.
Fascinating, I’ve never heard of this before, thanks! If anyone’s curious, I had Deep Research [take a stab at writing this](https://chatgpt.com/share/67ac150e-ac90-800a-9f49-f02489dee8d0), which I found pretty interesting (but have totally not fact-checked for accuracy).
I think you’re using the word utilitarian in a very non-standard way here. “AI civilization has comparable moral value to human civilization” is a very strong claim that you don’t provide evidence for. You can’t just call this speciesism and shift the burden of proof! At the very least, we should have wide error bars over the ratio of moral value between AIs and humans, and I would argue also over whether AIs have moral value at all.
I personally am happy to bite the bullet and say that I morally value human civilization continuing over an AI civilization that killed all of humanity, and that this is a significant term in my utility function.
Note that the UI is atrocious. You’re not using o1/o3-mini/o1-pro etc. It’s all the same model, a variant of o3, and the model in the bar at the top is completely irrelevant once you click the deep research button. I am very confused why they did it like this https://openai.com/index/introducing-deep-research/
I guess my issue is that this all seems strictly worse than “pledge to give 10% for the first 1-2 years after graduation, and then decide whether to commit for life”, or even “you commit for life, but with the option to withdraw 1-2 years after graduation”, i.e. with the default being to continue. Your arguments about not getting used to a full salary apply just as well to those, imo.
More broadly, I think it’s bad to justify getting young people without much life experience to make a lifetime pledge, based on a controversial belief (that it should be normal to give 10%), by saying that you personally believe that belief is true. In this specific case I agree with your belief! I took the pledge (shortly after graduating I think). But there are all kinds of beliefs I disagree with that I do not want people using here. Lots of young people make choices that they regret later—I’m not saying they should be stopped from making these choices, but it’s bad to encourage them. I agree with Buck, at least to the extent of saying that undergrads who’ve been in EA for less than a year should not be encouraged to sign a lifetime pledge.
(On a meta level, the pledge can obviously be broken if someone really regrets it, it’s not legally binding. But I think arguments shouldn’t rely on the pledge being breakable)
MATS Applications + Research Directions I’m Currently Excited About
I personally think it’s quite bad to try to get people to sign a lifetime giving pledge before they’ve ever had a real job, and think this is overemphasized in EA.
I think it’s much better to e.g. make a pledge for the next 1-5 years, or the first year of your career, or something, and re-evaluate at the end of that, which I think mitigates some of your concerns.
Member of Technical Staff is often a catchall term for “we don’t want to pigeonhole you into a specific role, you do useful stuff in whatever way seems to add the most value”; I wouldn’t read much into it.
Speaking as an IMO medalist who partially got into AI safety because of reading HPMOR 10 years ago, I think this plan is extremely reasonable
I disagree. I think it’s an important principle of EA that it’s socially acceptable to explore the implications of weird ideas, even if they feel uncomfortable, and to try to understand the perspective of those you disagree with. I want this forum to be a place where posts like this can exist.
The EA community still donates far more to global health causes than animal welfare—I think the discourse makes the meat-eater problem seem like a much bigger deal in the community than it actually is. I personally think it’s all kinda silly and significantly prioritise saving human lives.
This got me curious, so I had Deep Research make me a report on my probability of dying from different causes. It estimates that in the next 20 years I have maybe a 1.5-3% chance of death, of which 0.5-1% is chronic illness, where AI would probably help a lot. Infectious disease is less than 0.1%, so it doesn’t really matter. Accidents are 0.5-1%; AI probably helps, but it’s kind of unclear. Another 0.5-1% is other causes, mostly suicide; plausibly AI also leads to substantially improved mental health treatments, which helps there? So yeah, I buy that having AGI today vs in twenty years has small but non-trivial costs to my chances of being alive when it happens.
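For what it's worth, here's that estimate as a quick back-of-the-envelope calculation, using the midpoints of the quoted ranges; the fraction of each risk that earlier AGI might avert is a made-up illustrative assumption, not from the report:

```python
# Back-of-the-envelope version of the estimate above, using midpoints of the
# quoted ranges; the "averted" fractions are made-up illustrative assumptions.
baseline_risk_20y = {            # P(death in next 20 years) by cause
    "chronic_illness": 0.0075,   # 0.5-1%: AI probably helps a lot
    "infectious":      0.001,    # <0.1%: doesn't really matter
    "accidents":       0.0075,   # 0.5-1%: AI helps, but unclear
    "other":           0.0075,   # 0.5-1%: mostly suicide
}
averted_fraction = {             # hypothetical share of each risk AGI removes
    "chronic_illness": 0.5,
    "infectious":      0.5,
    "accidents":       0.2,
    "other":           0.2,
}

reduction = sum(baseline_risk_20y[c] * averted_fraction[c] for c in baseline_risk_20y)
print(f"Total 20-year risk (midpoints): {sum(baseline_risk_20y.values()):.1%}")
print(f"Illustrative reduction from AGI now vs in 20 years: {reduction:.1%}")
```

With these made-up fractions the reduction comes out under 1% over 20 years, i.e. small but non-trivial, consistent with the gut estimate above.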