Thanks Dawn, taking these in turn:
1: “Robust alignment” is a deliberately vague term; it’s meant to incorporate your views about how hard alignment is (e.g. UDT vs. well-intentioned).
4: It’s a hard question. Our perspective is that the backfire -> cluelessness -> don’t act chain can be thought of as low tractability.
5: By “stable under reflection” we meant the AI reflecting on its own values (while interacting with the world), where agreement means it wouldn’t change its values much (as a stylized example: an AI that shares 70% of our values in 2030 has those same values in 3030). But you’re right that how AIs interact (beyond competition, which is handled in the last question) is important.
7: S-risks do break the scale, and we couldn’t find a good, simple way to deal with that (though we’ll run other polls that address it more directly later). The intent of “will” was to map a 100% expected probability to “100% agree” on the scale.
Thanks! Then I don’t think I need to update my answers. I’m looking forward to your next batch of questions!