But the difficulty of alignment doesn’t seem to imply much about whether slowing is good or bad, or about its priority relative to other goals.
At the extremes, if alignment-to-"good"-values by default were 100% likely, I presume slowing down would be net-negative and racing ahead would look great. It's unclear to me where the tipping point is, i.e. what distribution over alignment difficulty levels one would need to hold to tip from wanting to speed up to wanting to slow down AI progress.
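To make the tipping-point question concrete, here's a minimal toy sketch (entirely my own illustrative numbers, not anything established): assume slowing down buys a boost to the probability of a good outcome, that this boost shrinks as alignment-by-default becomes more likely, and that slowing carries a fixed expected cost. The tipping point is then wherever the boost stops covering the cost. The parameters `V`, `max_delta`, and `cost` are hypothetical placeholders.

```python
# Toy model: where does "slow down" stop beating "speed up" as
# P(alignment-to-good-values by default) rises? All numbers are illustrative.

def ev_speed_up(p_default: float, V: float = 1.0) -> float:
    """Expected value of racing ahead: good outcome with probability p_default."""
    return p_default * V

def ev_slow_down(p_default: float, V: float = 1.0,
                 max_delta: float = 0.2, cost: float = 0.05) -> float:
    """Expected value of slowing down: the probability boost shrinks as the
    default probability approaches 1, and a fixed expected cost is paid
    (totalitarianism risk, never-ending pause, worse-than-default first AGI)."""
    delta = max_delta * (1.0 - p_default)
    return (p_default + delta) * V - cost

if __name__ == "__main__":
    for p in [0.1, 0.3, 0.5, 0.7, 0.9, 0.99]:
        fast, slow = ev_speed_up(p), ev_slow_down(p)
        verdict = "slow" if slow > fast else "speed"
        print(f"P(aligned by default)={p:.2f}  speed={fast:.3f}  slow={slow:.3f}  -> {verdict}")
```

Under these made-up numbers the crossover sits around P(aligned by default) = 0.75; the point is only that the answer depends on how fast the slowdown's benefit decays relative to its costs, not on alignment difficulty alone.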
It seems to me that the more longtermist one is, the better slowing down looks, even when one is very optimistic about alignment. Then again, some considerations push against this: the risk of totalitarianism, the risk of a pause that never ends, and the risk that value-agnostic alignment gets solved and the first AGI ends up aligned to "worse" values than the default outcome.
(I realize I'm using two different notions of alignment in this comment, alignment-to-"good"-values and value-agnostic alignment; I'd like to know if there's standardized terminology to distinguish between them.)