This was a great overview, thanks!
I guess I was left a bit confused as to what the goal and threat/trust model is here. I guess the goal is to be able to run models on devices controlled by untrusted users, without allowing the user direct access to the weights? Because if the user had access to the weights, they could take them as a starting point for fine tuning? (I guess this doesn’t do anything to prevent misuse that can be achieved with the unmodified weights? You might want to make this clearer.)
As far as I can tell, the key problem with all of the methods you cover is that, at some point, you have to have the decrypted weights in the memory of an untrusted device. Additionally, you have to have the decryption keys there as well (even in the case of the data preprocessing solution, you have to “encrypt” new data on the device for inference, right?). By default the user should be able to just read them out of there? (The DeepLock paper gestures at the possibility of putting the keys in a TPM. I don’t understand their scheduling solution or TPMs well enough to know if that’s feasible, but I’m intuitively suspicious. Still, it doesn’t solve the issue that the decrypted weights still need to be in memory at some point.) Given this, I don’t really understand how any of these papers improve over the solution of “just encrypt the weights when not in use”? I feel like there must be something I’m missing here.
On the other hand, I think you could reasonably solve all of this just by running the model inside a “trusted execution environment” (TEE). The model would only be decrypted in the TEE, where it can’t be accessed, even by the OS. For example, the H100 supports “confidential computing”, which is supposed to enable secure multiparty computation. And I think this problem can be thought of as a special case. The classic case of secure multiparty computation is data pooling: Multiple parties can collaborate to train a model on all of their data, without the different parties having access to each other’s data. (See page 10 here) In this case, the model developer contributes the “data” of what the model weights are, and the user contributes the data on which inference is to be run, right?
But TEEs are only decently secure: If you’re worried about genuinely sophisticated actors, e.g. nation states, you should probably not count on them.
Anyway, I’m looking forward to seeing your future work on this!
PS: Something related to consider may be model extraction attacks: If the user can just train an equivalent model by training against the “encrypted” model (and maybe leveraging some side channels), the encryption won’t be very useful. I’m not sure how feasible this is in practice, but this is certainly a key consideration for whether this kind of “encryption” approach adds much value.
Thank you for your thoughtful questions!
RE: “I guess the goal is to be able to run models on devices controlled by untrusted users, without allowing the user direct access to the weights?”
You’re correct that these techniques are useful for preventing models running on untrusted devices from being used in unintended ways! However, I think of the goal a bit more broadly: the goal is to add another layer of defence behind a cybersecure API (or another trusted execution environment) to prevent a model from being stolen and used in unintended ways.
These methods can be applied when model parameters are distributed to different devices (ex: a self-driving car that downloads model parameters for low-latency inference). But they can also be applied when a model is deployed behind an API hosted on a trusted server (ex: to reduce the damage caused by a breach).
RE: “without allowing the user direct access to the weights? Because if the user had access to the weights, they could take them as a starting point for fine tuning?”
The four papers I presented don’t focus on allowing authorised parties to use AI models without accessing their weights. However, (Shevlane, 2022) does recommend implementing secure APIs, rather than directly distributing model parameters, whenever possible.
Instead, the papers I presented focused on preventing unauthorised parties from being able to use AI models that they illegitimately acquired. The content about fine-tuning was referring to tests to see if unauthorised parties could fine-tune stolen models back to original performance if they also stole some of the original data used to train the model.
RE: “As far as I can tell, the key problem with all of the methods you cover is that, at some point, you have to have the decrypted weights in the memory of an untrusted device.” and “The DeepLock paper gestures at the possibility of putting the keys in a TPM. I don’t understand their scheduling solution or TPMs well enough to know if that’s feasible, but I’m intuitively suspicious”
You’re correct in your technical hypotheses about when models’ unencrypted parameters are stored in memory. I also agree that the authors generally give vague explanations of how to keep the models’ keys secure.
Personally, I saw the presented techniques as mainly removing the easiest opportunities for misuse (ex: a sufficiently well-funded actor like a state or large company could plausibly bypass these techniques, whereas a rogue hacker group may lack the knowledge or resources to do so). This is a useful (but not complete) start, since it means that a smaller set of parties, with more predictable incentives, can be regulated regarding their use of AI. That is preferable to the difficulty of regulating the use of a model like LLaMA (or something more advanced) after it is publicly leaked.
RE: Given this, I don’t really understand how any of these papers improve over the solution of “just encrypt the weights when not in use”? I feel like there must be something I’m missing here.
You can think of the DeepLock paper as “just encrypt the weights when not in use.” Then, the AdvParams paper becomes: “be intelligent about which parameters you encrypt so that you don’t have to encrypt/decrypt every single one of millions or billions of parameters.”
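To make that contrast concrete, here’s a minimal NumPy sketch of the AdvParams-style idea: only a small, key-selected subset of parameters is perturbed, so “unlocking” the model is cheap compared to decrypting every weight. (The actual paper chooses the perturbations adversarially to maximise accuracy loss; the random keyed perturbation and the function names below are my own simplification.)

```python
import numpy as np

def lock_parameters(weights, key, n_locked=20, scale=10.0):
    """Perturb a small, key-selected subset of weights so the model is
    unusable until the perturbation is reversed (illustrative only)."""
    rng = np.random.default_rng(key)                # the key seeds the selection
    idx = rng.choice(weights.size, n_locked, replace=False)
    delta = rng.normal(scale=scale, size=n_locked)  # large, accuracy-destroying offsets
    locked = weights.copy()
    locked.flat[idx] += delta
    return locked

def unlock_parameters(locked, key, n_locked=20, scale=10.0):
    """Re-derive the same indices and offsets from the key and undo them."""
    rng = np.random.default_rng(key)
    idx = rng.choice(locked.size, n_locked, replace=False)
    delta = rng.normal(scale=scale, size=n_locked)
    restored = locked.copy()
    restored.flat[idx] -= delta
    return restored

# Only n_locked values are touched, so "unlocking" stays cheap even for huge models.
w = np.random.randn(1000, 1000)
assert np.allclose(w, unlock_parameters(lock_parameters(w, key=42), key=42))
```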
In contrast, the preprocessed input paper has nothing to do with encrypting weights. Its aim is to make the possession of the parameters useless (whether encrypted or not), unless you can preprocess your input in the right way with the secret key.
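The general shape of that idea (with an assumed transform, since the paper’s exact preprocessing may differ) is a fixed, key-derived permutation of the input features: the model is trained on permuted inputs, so raw inputs produce near-useless outputs even with the genuine weights.

```python
import numpy as np

def keyed_preprocess(x, key):
    """Apply a secret, key-derived permutation to the input features.
    A model trained on permuted inputs performs poorly on raw inputs,
    so holding the weights alone is not enough to use the model."""
    rng = np.random.default_rng(key)
    perm = rng.permutation(x.shape[-1])   # same permutation every time for the same key
    return x[..., perm]

# Training and authorised inference both call keyed_preprocess with the secret key;
# an attacker who steals the weights but not the key ends up feeding unpermuted inputs.
batch = np.random.randn(32, 784)          # e.g. a batch of flattened images
authorised_input = keyed_preprocess(batch, key=123456)
```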
The hardware-accelerated retraining paper is similar in that the model’s parameters are intended to be useless (encrypted or not) without the secret key and the hardware scheduling algorithm that determines which neurons get associated with which key. Here, the key is needed to flip the signs of the right weighted inputs at inference time.
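A rough software-only sketch of that mechanism (the paper does this with hardware scheduling and per-neuron key assignment; the single shared key and function names below are my simplification): the stored weights have the signs of a secret subset of neurons negated, and the key holder negates them back before the nonlinearity.

```python
import numpy as np

def lock_layer(W, b, key):
    """Store a layer with the signs of a key-selected set of output neurons flipped."""
    rng = np.random.default_rng(key)
    flip = np.where(rng.random(W.shape[1]) < 0.5, -1.0, 1.0)  # secret +/-1 mask per neuron
    return W * flip, b * flip                                  # flips baked into the stored layer

def run_locked_layer(x, W_locked, b_locked, key):
    """Only the key holder can undo the flips on the weighted inputs at inference time."""
    rng = np.random.default_rng(key)
    flip = np.where(rng.random(W_locked.shape[1]) < 0.5, -1.0, 1.0)
    z = (x @ W_locked + b_locked) * flip    # flip**2 == 1, so this recovers x @ W + b
    return np.maximum(z, 0.0)               # ReLU; wrong signs here corrupt the activations
```

Without the key, running the layer as stored applies the ReLU to sign-flipped weighted inputs for roughly half the neurons, which wrecks everything downstream.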
RE: TEEs and Secure Multiparty Computation
Yes, your analogy is insightful: the model weights are the “data” contributed by the developer, and the inference inputs are the data contributed by the end user. I certainly agree with (Shevlane, 2022) that we should aim for these kinds of trusted execution environments whenever possible.
However, this may not be possible for all use cases. (I keep returning to the one example of a self-driving car that lacks local trusted-computing hardware for cost-efficiency reasons, but also cannot offload inference to servers with such hardware for latency reasons. There are lots of other examples in the real world, however.) The other thing to note is that different solutions can be used in combination as “layers of defence”: for example, encrypt parameter snapshots from training that aren’t actively being used, while deploying the most updated parameter snapshot with trusted hardware (assuming this is possible for the use case being considered).
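As a small illustration of the first layer in that example (the file names are hypothetical, and in practice the key would live in a KMS or TPM rather than next to the snapshot), encrypting a checkpoint at rest with a standard symmetric scheme could look like:

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()      # in practice: kept in a KMS/HSM/TPM, not beside the file
fernet = Fernet(key)

# Encrypt a training snapshot that is not actively being served.
with open("checkpoint_step_10000.pt", "rb") as f:        # hypothetical snapshot file
    ciphertext = fernet.encrypt(f.read())
with open("checkpoint_step_10000.pt.enc", "wb") as f:
    f.write(ciphertext)

# Decrypt only inside the trusted environment, right before deployment.
with open("checkpoint_step_10000.pt.enc", "rb") as f:
    weights_bytes = fernet.decrypt(f.read())
```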
RE: Model Stealing and Side-Channel Attacks
Yes, the current techniques have important limitations that still need to be fixed (including these attacks, and just basic fine-tuning, as I showed with some of the techniques above). There’s a long way to go in deploying AI algorithms securely :-) In some ways, we’re tackling this problem at an unprecedented scale now that generalised models like ChatGPT have become useful to many actors without the need for any fine-tuning. Though (Shevlane, 2022) argues that the Google Cloud Computer Vision platform previously faced a similar problem.