“P(misalignment x-risk|AGI)”: Conditional on AGI being developed by 2070, humanity will go extinct or drastically curtail its future potential due to loss of control of AGI.
I’m guessing this definition is meant to separate misalignment from misuse, but I’m curious whether you are including either or both of these two cases as misalignment x-risk:
1. AGI is deployed and we get locked into an outcome that is great by today’s standards, but a world with <=1% of the value of “humanity’s potential”. So we sort of have an existential catastrophe without a discrete catastrophic event.
2. The AGI is aligned with someone’s values, and we still get a future with zero or negative value, but this is due to a “mistake”, e.g. the person’s values gave no weight to animal suffering and that got locked in.
(See also this post, which brings up a similar question.)
1 - counts for purposes of this question
2 - doesn’t count for purposes of this question (but would be a really big deal!)