Thanks for the post! This seems like a clearly important and currently quite neglected area and I’d love to see more work on it.
My current hot take is that it seems viable to make AGI research labs a sufficiently hardened target that most actors cannot exploit them. But I don’t really see a path to preventing the most well-resourced state actors from at least exfiltrating source code. There are just so many paths to this: getting insiders to defect, supply chain attacks, etc. Because of this, I suspect it’ll be necessary to get major state actors to play ball by other mechanisms (e.g. international treaties, mutually assured destruction). I’m curious if you agree or are more optimistic on this point?
I also want to note that espionage can reduce x-risk in some cases: e.g. actors may be less tempted to cut corners on safety if they have intelligence that their competitors are still far away from transformative AI. Similarly, it could be used as an (admittedly imperfect) mechanism for monitoring compliance with treaties or more informal agreements. I do still expect better infosec to be net-positive, though.
I think it’s an open question right now. I expect it’s possible with the right resources and environment, but I might be wrong. I think it’s worth treating as an untested hypothesis (that we can secure X kind of system for Y application of resources) and trying to get more information to test it. If AGI development is impossible to secure, that cuts off a lot of potential alignment strategies. So it seems really worth trying to find out if it’s possible.