Compiling resources comparing AI misuse, misalignment, and incompetence risk and tractability
Status: haven’t spent long thinking about this
I think it would be useful to have a rough idea of what experts think about the relative AI x-risk from misalignment (AI doesn’t try to do what we want), misuse (AI does what someone, say a hostile or incompetent actor, wants), and incompetence (AI tries to do what we want but fails, e.g. because it’s unreliable or doesn’t understand humans well), and also of how tractable work on each of these would be at reducing x-risk. These categories are from Paul Christiano’s ‘Current work in AI alignment’ talk. Obviously, these are extremely difficult questions and we won’t get highly robust estimates, but I think something is better than nothing here.
I think (but not with high confidence) that incompetence seems much less risky than misalignment or misuse:
A sufficiently incompetent AI will be obviously useless, and so won’t be used.
AI capabilities research directly targets competence, and is far less constrained and far more financially incentivised than alignment or governance research.
Experts like Christiano are less worried about incompetence; in Christiano’s case, because he thinks/hopes AI won’t need a deep understanding of humans to avoid causing doom.
But I find it hard to know where to start when comparing misalignment and misuse. My first attempt was to find existing work. I found:
some detailed work on different potential sources of AI risk, but nothing comparing across misalignment and misuse scenarios,
expert predictions on different sources of AI x-risk (top comment here); their Misuse category excludes cases of misuse involving war. I was also granted access to the private survey writeup, which gives more context. But, unless I’ve missed something, neither explains why the experts have the views they do.
some experts’ opinions about how well we seem to be doing at alignment / how hard alignment is, but nothing similar on misuse, or comparing the tractability of the two.
It’s quite possible I’m missing things, so this is a call for resources, which could eventually be compiled into a document.
I’m not sure this is the most important set of resources to compile. But besides being of interest to me, I think it could be useful to anyone choosing between work in AI governance and work in alignment, for whom personal fit considerations don’t dominate.