I lead the DeepMind mechanistic interpretability team
Neel Nanda
Seems pretty clear that he meant your second statement
There is publicly available and sufficient evidence indicating Charity X did not provide 10K meals to homeless individuals.
Yeah, this. In particular, any time you criticise an organisation and they are only able to respond a few weeks later, many readers will see your criticism but will not see the organisation’s response. This inherently gives a misleading impression, so you must be incredibly confident that there is no mitigating context the organisation could give, lest you do undue damage to their reputation. And I think it is obviously the case empirically that when an organisation gets critiqued, including in your two previous examples, there is valuable additional context they’re able to provide when given notice.
I think the typical member of the EA community has more than enough technical skill to understand evidence that a web page has been edited to be different from an archived page, if pointed to both copies from a reliable source
Quoting your post:
It is not acceptable for charities to make public and important claims (such as claims intended to convince people to donate), but not provide sufficient and publicly stated evidence that justifies their important claims.
If a charity has done this, they should not be given the benefit of the doubt, because it is their own fault that there is not sufficient publicly stated evidence to justify their important claims; they had the opportunity to state this evidence but did not. Additionally, giving a charity the benefit of the doubt in this situation incentivizes not publicly stating evidence in situations where sufficient evidence does not exist, since the charity will simply be given the benefit of the doubt.
Note that I interpret this standard as “provide sufficient evidence to support their claims in the eyes of any skeptical outside observer”, as that’s the role you’re placing yourself in here
I pretty strongly disagree with the case that you make here, and think that you should obviously give charities a heads up; it is bad for everyone’s interests if you don’t. The crucial reason is that it is very easy to misunderstand things or be missing key context, and you want to give them a chance to clarify the situation.
Regarding your concerns, just use archive.org or archive.is to make archives of relevant web pages. These are standard and widely used third-party services, and if the charity changes things secretly but the archived evidence remains, they look bad.
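If you want to script this, here’s a minimal sketch using the Wayback Machine’s public endpoints (the charity URL is hypothetical, and this assumes the `requests` package):

```python
import requests

PAGE = "https://example-charity.org/impact-report"  # hypothetical page to preserve

# Ask the Wayback Machine to capture the page now ("Save Page Now" endpoint).
requests.get(f"https://web.archive.org/save/{PAGE}", timeout=60)

# Later, look up the closest archived snapshot via the availability API.
resp = requests.get(
    "https://archive.org/wayback/available", params={"url": PAGE}, timeout=30
)
closest = resp.json().get("archived_snapshots", {}).get("closest", {})
print(closest.get("url"), closest.get("timestamp"))
```

The archived copy is timestamped and hosted by a third party, so any later edits by the charity are easy to demonstrate.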
Regarding unconscious biases: I don’t actually think this is a big deal, but if you are concerned it is, then you can just email the charity and say: here is the document, we will publish it, but you have a chance to comment in this Google Doc to point out factual inaccuracies and provide us evidence, which we will attempt to take into account. And we will give you the opportunity to prepare a written response, either as a comment posted immediately after this goes live or included at the end of our piece. Readers can make up their own minds.
Regarding charities being held accountable for making mistakes: I don’t think there’s a big difference between a charity being publicly called out for a mistake and then quickly fixing it, versus the charity fixing it just before it was publicly called out. They still made the mistake, and it’s still obvious it was fixed because of you.
Re your point that charities should be incentivised to provide sufficient public evidence: I think this is an extremely unreasonably high standard. Charities should try to provide as much evidence as they can, but each person will have different objections, confusions, and edge cases, and it is just completely impractical to provide enough evidence to address every possible case an adversarial reviewer might ask about. You can criticise charities for providing significantly insufficient rather than just mildly insufficient evidence, but ultimately there is always going to be the potential for misunderstanding, and for details that feel important to you that the charity did not predict would be important.
Calendar syncing is so helpful
This got me curious, so I had Deep Research make me a report on my probability of dying from different causes. It estimates that in the next 20 years I have maybe a 1.5 to 3% chance of death, of which 0.5 to 1% is chronic illness, where it’ll probably help a lot. Infectious disease is less than 0.1%, so it doesn’t really matter. Accidents are 0.5 to 1%; AI probably helps, but it’s kind of unclear. 0.5 to 1% is other, mostly suicide. Plausibly AI also leads to substantially improved mental health treatments, which helps there? So yeah, I buy that having AGI today vs in twenty years has small but non-trivial costs to my chances of being alive when it happens.
Thanks for clarifying. In that case I think that we broadly agree
I think that even the association between functional agency and preferences in a morally valuable sense is an open philosophical question that I am not happy taking as a given.
Regardless, it seems like our underlying crux is that we assign utility to different things. I somewhat object to you saying that your version of this is utilitarianism, while notions of assigning utility that privilege things humans value are not.
4% is higher than I thought! Presumably much of that is people who had pre-existing conditions, which I don’t, or people who got into eg a car accident, which AI probably somewhat reduces, but this seems a lot more complicated and indirect to me.
But this isn’t really engaging with my cruxes. It seems pretty unlikely to me that we will pause until we have pretty capable and impressive AIs, and to me many of the non-doom scenarios come from uncertainty about when we will get powerful AI and how capable it will be. And I expect this to be much clearer the closer we get to these systems, or at the very least the empirical uncertainty about whether it’ll happen will be a lot clearer. I would be very surprised if there was the political will to do anything about this before we got a fair bit closer to the really scary systems.
And yep, I totally put more than 4% chance that I get killed by AI in the next 20 years. But I can see this is a more controversial belief, and one that requires higher standards of evidence to argue for. If I imagine a hypothetical world where I know that in 2 years we could have aligned superintelligent AI with 98% probability, and it would kill us all with 2% probability, or we could pause for 20 years and that would get it from 98% to 99%, then I guess from a selfish perspective I can kind of see your point. But I know I do value humanity not going extinct a fair amount, even if I think that total utilitarianism is silly. But I observe that I’m finding this debate kind of slippery, and I’m afraid that I’m maybe moving the goalposts here, because I disagree on many counts, so it’s not clear what exactly my cruxes are, or where I’m just attacking points in what you say that seem off.
I do think that the title of your post is broadly reasonable though. I’m an advocate for making AI x-risk cases that are premised on common sense morality like “human extinction would be really really bad”, and utilitarianism in the true philosophical sense is weird and messy and has pathological edge cases and isn’t something that I fully trust in extreme situations
Ah, gotcha. Yes, I agree that if your expected reduction in p(doom) is less than around 1% per year of pause, and you assign zero value to future lives, then pausing is bad on utilitarian grounds
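For concreteness, the rough break-even arithmetic I have in mind (my own back-of-the-envelope framing with illustrative numbers, not the post’s exact model) is something like:

```python
# Illustrative sketch: roughly 1% of currently alive people die each year, so if
# aligned AGI would otherwise prevent most of those deaths, each year of pause
# costs on the order of 1% of current lives. Ignoring future lives, pausing only
# wins if it buys more than that in doom reduction per year.
annual_death_rate = 0.01          # approximate crude death rate, as a fraction
doom_reduction_per_year = 0.005   # hypothetical: 0.5 percentage points per year
pause_years = 20

cost = pause_years * annual_death_rate           # expected fraction of current lives lost to delay
benefit = pause_years * doom_reduction_per_year  # reduction in p(everyone dies)
print("pause net-positive on current lives only:", benefit > cost)  # False with these numbers
```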
Note that my post was not about my actual numerical beliefs, but about a lower bound that I considered highly defensible—I personally expect notably higher than 1%/year reduction and was taking that as given, but on reflection I at least agree that that’s a more controversial belief (I also think that a true pause is nigh impossible)
I expect there are better solutions that achieve many of the benefits of pausing while still enabling substantially better biotech research, but that’s nitpicking
I’m not super sure what you mean by individualistic. I was modelling this as utilitarian but assigning literally zero value to future people. From a purely selfish perspective, I’m in my mid-20s and my chances of dying from natural causes in the next, say, 20 years are pretty damn low, and this means that, given my background beliefs about doom and timelines, slowing down AI is a great deal from my perspective. Whereas if I expected to die from old age in the next 5 years, I would be a lot more opposed.
Ah! Thanks for clarifying—if I understand correctly, you think that it’s reasonable to assert that sentience and preferences are what makes an entity morally meaningful, but that anything more specific is not? I personally just disagree with that premise, but I can see where you’re coming from
But in that case, it’s highly non-obvious to me that AIs will have sentience or preferences in ways that I consider meaningful; this seems like an open philosophical question. Actually defining what they are also seems like an open question to me: does a thermostat have preferences? Does a plant that grows towards the light? Meanwhile, I do feel fairly confident humans are morally meaningful. Is your argument that even if there’s a good chance they’re not morally meaningful, the expected amount of moral significance is comparable to humans?
Because there is a much higher correlation between the values of the current generation of humans and the next one than there is between the values of humans and arbitrary AI entities.
For your broader point about impartiality, I feel like you are continuing to assume some bizarre form of moral realism, and I don’t understand the case. Otherwise, why do you not consider rocks to be morally meaningful? Why is a plant not valuable? I can come up with reasons, but these assume specific things about what is and is not morally valuable, in exactly the same way as when I say arbitrary AI beings are on average substantially less valuable, because I have specific preferences and values over what matters. I do not understand the philosophical position you are taking here: it feels like you’re saying that the standard position is speciesist and arbitrary, and then drawing an arbitrary distinction slightly further out?
If AI can accelerate technologies that save and improve the lives of people who exist right now, then slowing it down would cost lives in the near term.
Huh? This argument only goes through if you have a sufficiently low probability of existential risk or an extremely low change in your probability of existential risk, conditioned on things moving slower. I disagree with both of these assumptions. Which part of your post are you referring to?
Are you assuming some kind of moral realism here? That there’s some deep moral truth, humans may or may not have insight into it, so any other intelligent entity is equally likely to?
If so, idk, I just reject your premise. I value what I choose to value, which is obviously related to human values, and an arbitrarily sampled entity is not likely to be better on that front.
Fascinating, I’ve never heard of this before, thanks! If anyone’s curious, I had Deep Research [take a stab at writing this](https://chatgpt.com/share/67ac150e-ac90-800a-9f49-f02489dee8d0), which I found pretty interesting (but have totally not fact-checked for accuracy).
I think you’re using the word utilitarian in a very non-standard way here. “AI civilization has comparable moral value to human civilization” is a very strong claim that you don’t provide evidence for. You can’t just call this speciesism and shift the burden of proof! At the very least, we should have wide error bars over the ratio of moral value between AIs and humans, and I would argue also over whether AIs have moral value at all.
I personally am happy to bite the bullet and say that I morally value human civilization continuing over an AI civilization that killed all of humanity, and that this is a significant term in my utility function.
Note that the UI is atrocious. You’re not using o1/o3-mini/o1-pro etc. It’s all the same model, a variant of o3, and the model selected in the bar at the top is completely irrelevant once you click the Deep Research button. I am very confused why they did it like this: https://openai.com/index/introducing-deep-research/
Human rights abuses seem much worse in China—this alone is basically sufficient for me