If you apply a security mindset (Murphy’s Law) to the problem of AI alignment, it should quickly become apparent that the problem is very difficult.
FYI I disagree with this. I think that the difficulty of alignment is a complicated and open question, not something that is quickly apparent. In particular, security mindset is about beating adversaries, and it’s plausible that we will train AIs in ways that mostly prevent them from treating us as adversaries.
Interesting perspective, although I’m not sure how much we actually disagree. “Complicated and open”, to me, reads as “difficult” (i.e. the fact that it is still open means it has remained unsolved, for ~20 years now).
And re “adversaries”, I feel like this is not really what I’m thinking of when I apply security mindset to transformative AI (for the most part; see the next paragraph). “Adversary” seems to put too much (malicious) intent into the actions of the AI. Another way of thinking about misaligned transformative AI is as a super-powered computer virus that is in some ways an automatic process, and kills us (manslaughters us?) as collateral damage. It seeps through every hole that isn’t patched. So eventually, in the limit of superintelligence, all the doom flows through the tiniest crack in otherwise perfect alignment (the tiniest crack in our “defences”).
However, having said that, the term “adversaries” is entirely appropriate when thinking of human actors who might maliciously use transformative AI to cause doom (misuse risk, as referred to in the OP). Any viable alignment solution needs to prevent this from happening too! (Because we now know there will be no shortage of such threats.)
Interesting perspective, although I’m not sure how much we actually disagree. “Complicated and open”, to me reads as “difficult”
Is there a rephrasing of the initial statement you would endorse that makes this clearer? I’d suggest “If you apply a security mindset (Murphy’s Law) to the problem of AI alignment, it should quickly become apparent that we do not currently possess the means to ensure that any given AI is safe.”
Yes, I would endorse that phrasing (maybe s/”safe”/”100% safe”). Overall I think I need to rewrite and extend the post to spell things out in more detail, and change the title to something less provocative[1], because I get the feeling that people are knee-jerk downvoting without even reading it, judging by some of the comments (i.e. I’m having to repeat things I already refer to in the OP).
[1] Perhaps “Why the most likely outcome of AGI is doom”?