Why is it useful to think of AI-influenced coordination failures as a major threat model in the alignment landscape? My intuition would be to think of it as falling under capabilities, since the worry (if I understand it correctly) is that even if AI systems are aligned with their users, bad things will still happen because coordination is hard.
This may be a disagreement about semantics. As I see it, my goal as an alignment researcher is to do whatever I can to reduce x-risk from powerful AI, and given my skillset, I mostly focus on doing this through technical research. If there are ways to shape the technical development of AI that lead to better cooperation, and this reduces x-risk, I count that as part of the alignment landscape.
Another take is Critch’s framing of extending alignment to groups of systems and agents, which gives the multi-multi alignment problem: ensuring alignment between groups of humans and groups of AIs who all need to coordinate. I discuss this a bit more in the next post.
You’re right, this seems like mostly semantics. I’d guess it’s clearest and most useful to use “alignment” a little more narrowly, reserving it for concepts that actually involve aligning things (i.e., roughly consistent with non-AI-specific uses of the word “alignment”). But the Critch(/Dafoe?) take you bring up seems like a good argument for why AI-influenced coordination failures fall under that.