The main reason I don’t think about it more[1] is that I don’t see any realistic scenario where an AI would be motivated to produce suffering. And I don’t think it’s likely to incidentally produce lots of suffering either, since I believe that too is a narrow target.[2] The accidental creation of something like suffering subroutines strikes me as unlikely.
That said, I think it’s likely on the default trajectory that human sadists are going to expose AIs (most likely human uploads) to extreme torture just for fun. And that could be many times worse than factory farming overall because victims can be run extremely fast and in parallel, so it’s a serious s-risk.
[1] It’s still my second highest cause-wise priority, and a non-trivial portion of what I work on is upstream of solving s-risks as well. I’m not a monster.
[2] Admittedly, I also think “maximise eudaimonia” and “maximise suffering” are very close to each other in goal-design space (cf. the Waluigi Effect), so many incremental alignment strategies for the former could simultaneously make the latter more likely.
I can think of plenty of scenarios that are “realistic” by AI safety standards… scenarios inspired by things terrorists do all the time when fighting powerful governments, so there are lots of precedents in history, and whose realism only suffers a bit because they would not be technically possible for humans with today’s technology.
You mean threats? I’m not sure what you’re pointing towards with the terrorist thing.
I have meta-uncertainty here. I think I could come up with realistic-ish scenarios if I gave it enough thought. (Though I’d have to discount the probability in proportion to how much effort I spend searching for them.) Tbh, I just haven’t given it enough thought. Do you have any recs for quick write-ups of some scenarios?
A bunch of scenarios are collected in the s-risk sub wiki.