Nora Belrose comments on AI Pause Will Likely Backfire

Nora Belrose 17 Sep 2023 20:07 UTC
10 points
6 ∶ 2
You need to have some motivation for thinking that a fundamentally new kind of danger will emerge in future systems, in such a way that we won’t be able to handle it as it arises. Otherwise anyone can come up with any nonsense they like.
If you’re talking about e.g. Evan Hubinger’s arguments for deceptive alignment, I think those arguments are very bad, in light of 1) the white box argument I give in this post, 2) the incoherence of Evan’s notion of “mechanistic optimization,” and 3) his reliance on “counting arguments” where you’re supposed to assume that the “inner goals” of the AI are sampled “uniformly at random” from some uninformative prior over goals (I don’t think the LLM / deep learning prior is uninformative in this sense at all).
- Davidmanheim 18 Sep 2023 16:30 UTC
  10 points
  5 ∶ 2
  > You need to have some motivation for thinking that a fundamentally new kind of danger will emerge in future systems, in such a way that we won’t be able to handle it as it arises.
  That was what everyone in AI safety was discussing for a decade or more, until around 2018. You seem to ignore these arguments about why AI will be dangerous, as well as all of the arguments that alignment will be hard. Are you familiar with all of that work?