For much of the article, you talk about post-AGI catastrophe. But when you first introduce the idea in section 2.1, you say:
the period from now until we reach robust existential security (say, stable aligned superintelligence plus reasonably good global governance)
It seems to me like this is a much higher bar than reaching AGI—and one for which the arguments that we could still be exposed to subsequent catastrophes seem much weaker. Did you mean to just say AGI here?
Thanks, that’s a good catch. Really, in the simple model the relevant point in time for the first run should be when the alignment challenge has been solved, even for superintelligence. But that’s before “reasonably good global governance”.
Of course, there’s an issue that this is trying to model alignment as a binary thing for simplicity, even though really, if a catastrophe came when half of the alignment challenge had been solved, that would still be a really big deal, for reasons similar to those in the paper.
One additional comment: this sort of “concepts moving around” issue is one of the things I’ve found most annoying with AI, and it happens quite a lot. You need to try to uproot these issues from the text, and this was a case of me missing one.
Why do you think alignment gets solved before reasonably good global governance? It feels to me pretty up in the air which target we should be aiming to hit first. (Hitting either would help us with the other. I do think that we likely want to get important use out of AI systems before we establish good global governance; but we might then want to do the governance thing to establish enough slack to take the potentially harder parts of the alignment challenge slowly.)