Ambitious value learning and CEV are not a particularly large share of what AGI safety researchers are working on day to day, AFAICT. And insofar as researchers are thinking about those things, a lot of that work is trying to figure out whether those things are good ideas in the first place, e.g. whether they would lead to religious hell.
Sure, but people are still researching narrow alignment/corrigibility as a prerequisite for ambitious value learning/CEV. If you buy the argument that safety with respect to s-risks is non-monotonic in proximity to “human values” and control, then marginal progress on narrow alignment can still be net-negative w.r.t. s-risks, by increasing the probability that we end up with something close to ambitious alignment but without a Long Reflection, technical measures against s-risks, etc. At least, that holds if we’re in the regime where severe misalignment is the most likely outcome conditional on no further narrow alignment work, which I think is a pretty popular longtermist take. (I don’t currently think most alignment work clearly increases s-risks, but I’m pretty close to 50/50 due to considerations like this.)