Madhav Malhotra comments on Preventing AI Misuse: State of the Art Research and its Flaws

Madhav Malhotra Apr 23, 2023, 5:46 PM
Thank you for your thoughtful questions!

RE: “I guess the goal is to be able to run models on devices controlled by untrusted users, without allowing the user direct access to the weights?”
You’re correct that these techniques are useful for preventing models from being used in unintended ways when they run on untrusted devices! However, I think of the goal a bit more broadly: to add another layer of defence behind a cybersecure API (or another trusted execution environment) to prevent a model from being stolen and used in unintended ways.
These methods can be applied when model parameters are distributed to different devices (ex: a self-driving car that downloads model parameters for low-latency inference). But they can also be applied when a model is deployed behind an API hosted on a trusted server (ex: to reduce the damage caused by a breach).
RE: “without allowing the user direct access to the weights? Because if the user had access to the weights, they could take them as a starting point for fine tuning?”
The four papers I presented don’t focus on allowing authorised parties to use AI models without accessing their weights. However, (Shevlane, 2022) recommends exactly this: implementing secure APIs instead of directly distributing model parameters whenever possible.
Instead, the papers I presented focused on preventing unauthorised parties from being able to use AI models that they illegitimately acquired. The content about fine-tuning was referring to tests to see if unauthorised parties could fine-tune stolen models back to original performance if they also stole some of the original data used to train the model.
RE: “As far as I can tell, the key problem with all of the methods you cover is that, at some point you have have to have the decrypted weights in the memory of an untrusted device.” and “The DeepLock paper gestures at the possibility of putting the keys in a TPM. I don’t understand their scheduling solution or TPMs well enough to know if that’s feasible, but I’m intuitively suspicious”
You’re correct in your technical hypotheses about when a model’s unencrypted parameters are stored in memory. I agree that the authors generally give vague explanations of how to keep the models’ keys secure.
Personally, I saw the presented techniques as mainly reducing the easiest opportunities for misuse (ex: a sufficiently well-funded actor like a state or large company could plausibly bypass these techniques, whereas a rogue hacker group may lack the knowledge or resources to do so). This is a useful (but not complete) start, since it means that regulation of AI use only has to cover fewer parties with more predictable incentives. This is preferable to the difficulty of regulating the use of a model like LLaMA (or a more advanced one) after it is publicly leaked.
RE: “Given this, I don’t really understand how any of these papers improve over the solution of ‘just encrypt the weights when not in use’? I feel like there must be something I’m missing here.”
You can think of the DeepLock paper as “just encrypt the weights when not in use.” Then, the AdvParams paper becomes: “be intelligent about which parameters you encrypt so that you don’t have to encrypt/decrypt every single parameter out of millions to billions.”
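To make the "only encrypt some of the parameters" idea concrete, here is a toy sketch. Everything in it is my own assumption for illustration: the helper names `lock` and `unlock`, the rule for choosing which weights to seal (largest magnitude), and the SHA-256-derived XOR pad. It is not the selection or perturbation method from the AdvParams paper, and the pad is a stand-in rather than production-grade cryptography (a real system would use a vetted authenticated cipher).

```python
import hashlib
import struct


def _pad(key: bytes, index: int) -> bytes:
    """Toy 8-byte pad derived from the key and the weight's index (assumption)."""
    return hashlib.sha256(key + index.to_bytes(8, "big")).digest()[:8]


def lock(weights, key: bytes, fraction: float = 0.1):
    """Seal only a fraction of the weights; the rest stay in plaintext.

    The selection rule (largest magnitude) is a placeholder, not the paper's.
    """
    n_locked = max(1, int(len(weights) * fraction))
    chosen = sorted(range(len(weights)),
                    key=lambda i: abs(weights[i]), reverse=True)[:n_locked]
    public = list(weights)
    sealed = {}
    for i in chosen:
        plain = struct.pack(">d", weights[i])
        sealed[i] = bytes(a ^ b for a, b in zip(plain, _pad(key, i)))
        public[i] = 0.0  # the distributed model is missing these values
    return public, sealed


def unlock(public, sealed, key: bytes):
    """Restore the sealed weights; only correct with the right key."""
    restored = list(public)
    for i, cipher in sealed.items():
        plain = bytes(a ^ b for a, b in zip(cipher, _pad(key, i)))
        restored[i] = struct.unpack(">d", plain)[0]
    return restored
```

The point of the sketch is only the cost trade-off: `unlock` touches `fraction * len(weights)` values instead of all of them. Note that it still leaves the decrypted weights in memory afterwards, which is exactly the limitation raised in the question.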
In contrast, the preprocessed input paper has nothing to do with encrypting weights. Its aim is to make the possession of the parameters useless (whether encrypted or not), unless you can preprocess your input in the right way with the secret key.
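As a rough illustration of the "preprocess the input with a secret key" idea, here is a minimal sketch. The key-seeded permutation, the function names, and the tiny linear "model" are all hypothetical choices of mine, not the transformation used in the paper. Permuting the weights stands in for "the model was trained on key-transformed inputs".

```python
import random


def keyed_permutation(key: int, n: int):
    """Derive a fixed permutation of n positions from the secret key (assumption)."""
    perm = list(range(n))
    random.Random(key).shuffle(perm)
    return perm


def preprocess(x, key: int):
    """Reorder the input features with the key-derived permutation."""
    perm = keyed_permutation(key, len(x))
    return [x[p] for p in perm]


def linear_model(weights, x):
    """Stand-in for a network: a single dot product."""
    return sum(w * v for w, v in zip(weights, x))


if __name__ == "__main__":
    key = 1234
    w = [1, 2, 3, 4, 5, 6, 7, 8]
    x = [8, 1, 5, 2, 7, 3, 6, 4]

    # Stand-in for training on permuted inputs: the shipped weights only
    # line up with inputs that went through the same keyed permutation.
    protected_w = preprocess(w, key)

    print("with key:   ", linear_model(protected_w, preprocess(x, key)))
    print("without key:", linear_model(protected_w, x))
```

With the key, the protected model reproduces the original dot product; without it, the features generally meet the wrong weights. No parameter is encrypted at any point, which is the contrast with the two papers above.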
The hardware accelerated retraining paper is similar in that the model’s parameters are intended to be useless (encrypted or not) without the secret key and the hardware scheduling algorithm that determines which neurons get associated with which key. Here, the key is needed to flip the signs of the right weighted inputs at inference time.
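A small sketch of the sign-flipping idea, with the caveat that it is my simplification: one key bit per neuron, a plain ReLU layer, and a hypothetical `locked_layer` function. The paper's hardware scheduling (which decides which neuron gets which key bit) is not modelled here, and negating the stored rows is only a stand-in for key-dependent training.

```python
def relu(z: float) -> float:
    return max(0.0, z)


def locked_layer(stored_rows, x, key_bits):
    """Dense ReLU layer where a key bit of 1 flips the sign of that
    neuron's weighted input before the activation (assumed scheme)."""
    out = []
    for row, bit in zip(stored_rows, key_bits):
        z = sum(w * v for w, v in zip(row, x))
        if bit:
            z = -z
        out.append(relu(z))
    return out


if __name__ == "__main__":
    true_rows = [[1.0, -2.0], [0.5, 0.5], [-1.0, 1.0]]
    key = [1, 0, 1]

    # Stand-in for key-dependent training: the stored weights are only
    # "right" once the keyed sign flips are applied at inference time.
    stored = [[-w for w in row] if bit else row
              for row, bit in zip(true_rows, key)]

    x = [3.0, 1.0]
    print(locked_layer(stored, x, key))        # [1.0, 2.0, 0.0]
    print(locked_layer(stored, x, [0, 0, 0]))  # [0.0, 2.0, 2.0]
```

With the correct key the layer behaves like the intended one; with a wrong key, some neurons see negated pre-activations and the outputs diverge, even though the attacker holds every stored parameter.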
RE: Trusted Multiparty Computing
Yes, your analogy is insightful: thinking of the model weights as data contributed by the developer and the inference data as being contributed by the end user. I certainly agree with (Shevlane, 2022) that we should aim for these kinds of trusted execution environments whenever possible.
However, this may not be possible for all use cases. (I’ve just been listing the one example of a self-driving car that doesn’t have local trusted computing hardware for cost-efficiency purposes, but cannot rely on servers that do have such hardware for latency reasons. There are lots of other examples in the real world, however.) The other thing to note is that different solutions can be used in combination as “layers of defence.” (Ex: encrypt parameter snapshots from training that aren’t actively being used, while deploying the most updated parameter snapshot with trusted hardware, assuming this is possible for the use case being considered.)
RE: Model Stealing and Side-Channel Attacks
Yes, the current techniques have important limitations that still need to be fixed (including these attacks and just basic fine-tuning, as I showed with some of the techniques above). There’s a long way to go in deploying AI algorithms securely :-) In some ways, we’re solving this problem at an unprecedented scale now that generalised models like ChatGPT have become useful to many actors without the need for any fine-tuning. That said, (Shevlane, 2022) argues that the Google Cloud Computer Vision platform faced a similar problem previously.