> A large reason to focus on opaque components of larger systems is that difficult-to-handle and existentially risky misalignment concerns are most likely to occur within opaque components rather than emerge from human-built software.
Yep, this sounds positive to me. I imagine it’s difficult to do this well, but to the extent it can be done, I expect such work to generalize more than a lot of LLM-specific work.
> I don’t see any plausible x-risk threat models that emerge directly from AI software written by humans?
I don’t feel like that’s my disagreement. I’m expecting humans to create either [dangerous system that’s basically one black-box LLM] or [something very different that’s also dangerous, like a complex composite system]. I expect AIs can also make either system.