Thanks for the reply. I think the talk of 20 years is a red herring, as we might only have 2 years (or less). Re your example of “A Conjunctive Case for the Disjunctive Case for Doom”, I don’t find the argument convincing because you use 20 years. Can you make the same arguments with s/20/2/?
And what I’m arguing is not that we are doomed by default, but the conditional probability of doom given AGI: P(doom|AGI). I’m actually reasonably optimistic that we can just stop building AGI and therefore won’t be doomed! And that’s what I’m working toward (yes, it’s going to be a lot of work; I’d appreciate more help).
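To spell out why those two claims are compatible (the standard total-probability decomposition, in my notation):

$$P(\mathrm{doom}) = P(\mathrm{doom}\mid \mathrm{AGI})\,P(\mathrm{AGI}) + P(\mathrm{doom}\mid \neg\mathrm{AGI})\,\bigl(1 - P(\mathrm{AGI})\bigr)$$

so a high P(doom|AGI) is perfectly consistent with a low overall P(doom), provided we keep P(AGI) low by not building it.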
On my way of viewing things, an argument for a disjunctive framing shows that “failure on intent alignment (with success in the other areas) leads to a high P(Doom | AGI), failure on outer alignment (with success in the other areas) leads to a high P(Doom | AGI), etc.” I think that you have not shown this for any of the disjuncts.
Isn’t it obvious that none of {outer alignment, inner alignment, misuse risk, multipolar coordination} have come anywhere close to being solved? Do I really need to summarise progress to date and show why it isn’t a solution, when no one is even claiming to have a viable, scalable solution to any of them!? Isn’t it obvious that current models are only safe because they are weak? Will Claude-3 spontaneously decide not to make napalm in response to the “Grandma’s bedtime story” napalm-recipe jailbreak once it’s powerful enough to do so and hooked up to a chemical factory?
So far, I’ve discussed just one disjunct, but I can imagine outlining similar assumptions for the other disjuncts.
Ok, but you really need to defeat all of them given that they are disjuncts!
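To make the disjunctive arithmetic concrete, here’s a minimal sketch with made-up per-disjunct numbers (purely illustrative, not figures from the post): if surviving AGI requires solving every one of the disjuncts, then under a rough independence assumption the individual chances of success compound.

```python
# Minimal sketch of the disjunctive structure (illustrative numbers only).
# Assumption: surviving AGI requires solving *every* disjunct, and the
# per-disjunct success probabilities are roughly independent.
from math import prod

# Hypothetical "chance we solve this in time" for each disjunct.
p_solve = {
    "outer alignment": 0.7,
    "inner alignment": 0.7,
    "misuse risk": 0.7,
    "multipolar coordination": 0.7,
}

p_all_solved = prod(p_solve.values())   # ~0.24
p_doom_given_agi = 1 - p_all_solved     # ~0.76

print(f"P(all disjuncts solved) ~ {p_all_solved:.2f}")
print(f"P(doom | AGI)           ~ {p_doom_given_agi:.2f}")
```

Even granting a fairly generous 70% chance of solving each problem on its own, the conjunction of all four successes comes out under 25%, which is why defeating only one disjunct isn’t enough.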
I don’t think instrumental convergence alone gets you to ‘doom with >50%’.
Can you elaborate on this? Is it because you expect AGIs to spontaneously be aligned enough not to doom us?
I’m unclear what, exactly, your arguments are meant to be. Also, I would personally find it much easier to engage with arguments in premise-conclusion format.
Judging by the overall response to this post, I do think it needs a rewrite.