> I disagree with the implied theses in statements like “I’m not very sympathetic to pausing or slowing down AI as a policy proposal.”
This overlooks my arguments in section 3, which were absolutely critical to forming my opinion here. My argument can be summarized as follows:
- The utilitarian arguments for technical alignment research seem weak, because AIs are likely to be conscious like us and to share human moral concepts.
- By contrast, technical alignment research seems clearly valuable if you care about humans who currently exist, since AIs will presumably be directly aligned to them.
- However, pausing AI for alignment reasons seems pretty bad for humans who currently exist (under plausible models of the tradeoff).
- I am sympathetic to both utilitarianism and the view that current humans matter. The weak considerations favoring pausing AI on the utilitarian side don’t outweigh the much stronger and clearer arguments against pausing for currently existing humans.
The last bullet point is a statement about my values; it is not a thesis independent of my values. I feel this was pretty explicit in the post.
> If you wrote a post that just said “look, we’re super uncertain about things, here’s your reminder that there are worlds in which alignment work is negative”, I’d be on board with it. But it feels like a motte-and-bailey to write a post that is clearly trying to cause the reader to feel a particular way about some policy, and then retreat to “well my main thesis was very weak and unobjectionable”.
I’m not just saying “there are worlds in which alignment work is negative”. I’m saying that it’s fairly plausible: I’d put it at greater than 30% probability, maybe higher than 40%. That seems perfectly sufficient to establish my claim, which I argued explicitly, that the alternative position is “fairly weak”.
It would be different if I were saying “look out, there’s a 10% chance you could be wrong”. I’d agree that claim would be way less interesting.
I don’t think what I said resembles a motte-and-bailey, and I suspect you just misunderstood me.
[ETA:
> Well, I can believe it’s weak in some absolute sense. My claim is that it’s much stronger than all of the arguments you make put together.
Part of me feels like this statement is an acknowledgement that you fundamentally agree with me. You think the argument in favor of unaligned AIs being less utilitarian than humans is weak? Wasn’t that my thesis? If you started at a prior of 50%, moved to 65% because of a weak argument, and then moved back to 60% because of my argument, isn’t that completely consistent with essentially everything I said? OK, you felt I was saying the probability is more like 50%. But 60% really isn’t far off, and it’s consistent with what I wrote (I mentioned “weak reasons” in the post). Perhaps 80% of the reason you disagree here is that you think my thesis was something else.
More generally, I get the sense that you keep misinterpreting me as saying things that are different from, or stronger than, what I intended. That’s reasonable given that this is a complicated and extremely nuanced topic. I’ve tried to express areas of agreement when possible, both in the post and in my replies to you. But maybe you have background reasons to expect me to argue a very strong thesis about utilitarianism. As a personal request, I’d encourage you to read me as saying something closer to the literal meaning of my words, rather than trying to infer what I actually believe underneath the surface.]
I have lots of other disagreements with the rest of what you wrote, although I probably won’t get around to addressing them. I mostly think we just disagree on some basic intuitions about how alien-like default unaligned AIs will actually be in the relevant senses. I also disagree with your reversal tests, because I don’t think they’re actually symmetric, and I think you’re omitting the best arguments for why they’re asymmetric.
This, in addition to the comment I previously wrote, will have to suffice as my reply.