To be honest, I think this is a reasonable way for the community to reflexively react to things. The question I’m trying to answer when I see a post with an argument that seems worth engaging with is: what’s the probability that I’ll learn something new or change my mind as a result of engaging with it?
When there’s a disagreement about a foundational assumption, it’s quite difficult to have productive conversations. The conversation kind of needs to be about that assumption itself, which is a fairly specific kind of discussion. E.g. if someone hasn’t really thought about AI alignment much, thinks it’s not an issue, and isn’t familiar with the reasons I believe it matters, then I put a much lower (though still non-zero) probability on making useful updates from talking to them, because I have a bunch of standard arguments for the most obvious objections people raise and don’t learn much from restating them. And I think there’s a lot of value in having high-context discussion spaces where people broadly agree on these foundational claims.
These foundational claims are pretty difficult to establish consensus on if people have different priors, and discussing them doesn’t really tend to move people either way. I get a lot of value from discussing technical details of what working on AI safety is like with people, much more so than I get from the average “does AI safety matter at all?” conversation.
Obviously, if someone could convince me that AI safety doesn’t matter, that would be a big deal. But I’d guess it’s only really worth the effort if I’m reasonably sure the person understands why I believe it does matter and disagrees anyway, in a way that doesn’t stem from some intractable foundational disagreement in worldviews.
I want to clarify that I don’t think ideas like the Orthogonality Thesis or Instrumental Convergence are wrong. They’re strong predictive hypotheses that follow logically from very reasonable assumptions, and even the possibility that they’re correct is more than enough justification to treat AI safety work as critical.
I was more just pointing out some examples of ideas that are very strongly held by the community, that happen to have been named and popularized by people like Bostrom and Yudkowsky, both of whom might be considered elites among us.
P.S. I’m always a bit surprised that the Neel Nanda of Google DeepMind has the time and desire to post so much on the EA Forums (and also Less Wrong). That probably says very good things about us, and also gives me some more hope that the folks at Google are actually serious about alignment. I really like your work, so it’s an honour to be able to engage with you here (hope I’m not fanboying too much).