I agree that there are significant concerns here! FWIW I’m more concerned about the adversarially-manipulated layer (at least as something needing attention now). I think that a lot of these applications could work with systems that aren’t much stronger than what we have today; but that getting effective misaligned scheming would require a significant step up in capabilities. (You might have weaker forms of misalignment, but I think that those are pretty similar to “the systems just aren’t really good enough yet”.)
I thought that part of the core thesis was that as we go through the intelligence explosion, coordination tech becomes increasingly valuable (maybe critical). Are you saying that it’s plausible that we’ll get “good enough” coordination tech out of agents that are much less powerful than the frontier during the IE? E.g. coordination tech generally uses Opus 4.7, even in the Opus 6-8 era, where coordination tech seems most (?) valuable, but we also have much more legitimate concerns about scheming capabilities?
Plausible, yes. For one thing you can run versions of the coordination tech in parallel with old cheap models, and flag and dig into discrepancies. This could make it harder for misalignment to strongly bite.
Of course if there are big misalignment issues and we’re not seriously tracking that there could be big misalignment issues, that’s gonna be a problem.
A separate cluster of threat models that is worth disentangling is creating more surface area for anti-human-user coordination within the economy, particularly if it’s much easier for smart, misaligned AI systems to coordinate with relatively stupid, corrigible AI systems (e.g., Opus 4.7). The arguments for AI <> AI coordination advantage (over AI <> human) are quite intuitive to me, but I don’t think you actually need an asymmetry here to put society in a more vulnerable state than the current one. I don’t have a great sense of how this washes out, but it feels like a crux for evaluating the net benefit of coordination tech.
This is similar to how moving from traditional → digital banking probably creates more surface area for exploitation by computer hackers; likewise, it’s probably very good to have primitive computers touching nukes rather than more modern ones.