What would a compute monitoring plan look like? [Linkpost]

Yonadav Shavit (CS PhD student at Harvard) recently released a paper titled What does it take to catch a Chinchilla? Verifying Rules on Large-Scale Neural Network Training via Compute Monitoring.

The paper describes a compute monitoring regime that could allow governments to monitor training runs and detect deviations from training run regulations.

I think it’s one of the most detailed public write-ups about compute governance, and I recommend AI governance folks read (or skim) it. A few highlights below (bolding mine).

Abstract:

As advanced machine learning systems’ capabilities begin to play a significant role in geopolitics and societal order, it may become imperative that (1) governments be able to enforce rules on the development of advanced ML systems within their borders, and (2) countries be able to verify each other’s compliance with potential future international agreements on advanced ML development. This work analyzes one mechanism to achieve this, by monitoring the computing hardware used for large-scale NN training. The framework’s primary goal is to provide governments high confidence that no actor uses large quantities of specialized ML chips to execute a training run in violation of agreed rules. At the same time, the system does not curtail the use of consumer computing devices, and maintains the privacy and confidentiality of ML practitioners’ models, data, and hyperparameters. The system consists of interventions at three stages: (1) using on-chip firmware to occasionally save snapshots of the the neural network weights stored in device memory, in a form that an inspector could later retrieve; (2) saving sufficient information about each training run to prove to inspectors the details of the training run that had resulted in the snapshotted weights; and (3) monitoring the chip supply chain to ensure that no actor can avoid discovery by amassing a large quantity of un-tracked chips. The proposed design decomposes the ML training rule verification problem into a series of narrow technical challenges, including a new variant of the Proof-of-Learning problem [Jia et al. ’21].

Solution overview:

In this section, we outline a high-level technical plan, illustrated in Figure 1, for Verifiers to monitor Provers’ ML chips for evidence that a large rule-violating training occurred. The framework revolves around chip inspections: the Verifier will inspect a sufficient random sample of the Prover’s chips (Section 3.2), and confirm that none of these chips contributed to a rule-violating training run. For the Verifier to ascertain compliance from simply inspecting a chip, we will need interventions at three stages: on the chip, at the Prover’s data-center, and in the supply chain.

  • On the chip (Section 4): When the Verifier gets access to a Prover’s chip, they need to be able to confirm whether or not that chip was involved in a rule-violating training run. Given that rule violation depends only 5 Verifying Rules on Large-Scale NN Training via Compute Monitoring on the code that was run, our solution will necessitate that ML chips logging infrequent traces of their activity, with logging done via hardware-backed firmware. We suggest that ML chips’ firmware occasionally log a copy of the current state of the chip’s high-bandwidth memory to long-term storage, and in particular, that it logs the shard of the NN’s weights stored in memory. These weight-snapshots can serve as a fingerprint of the NN training that took place on each chip.

  • At the data-center (Section 5): The Verifier needs a way to interpret the chips’ logs, and determine whether or not they are evidence for a rule-violating training run. To that end, the Prover, who is training the model, will be required to store a transcript of the training process — including training data, hyperparameters, and intermediate weight checkpoints — for each model they train. Using protocols similar to “Proof-of-Learning” [23], these training transcripts may serve as provenance for the logged weight-snapshots, which are themselves the result of the same training process. In practice, for each (hash of a) weight-snapshot logged by a chip, the Prover provides the Verifier (the hashed version of) the matching training transcript. Then the Prover and Verifier jointly and securely verify that, with high probability, retraining using the training transcript would have indeed resulted in the logged weight-snapshot (and that no other valid training transcript could have resulted in that snapshot). Finally, now that the Verifier knows an approximate training transcript of the training run that had been executed on that chip at that time, they can examine properties of the training transcript to confirm that the Prover has complied with the agreed upon rules.

  • At the supply chain (Section 6): The Verifier needs to know which ML chips the Prover owns, so that the Verifier can randomly inspect a representative sample of those chips, to confirm their ownership and that their logging mechanisms are reporting correctly. Without this chip-ownership verification step, a Prover might covertly acquire a large quantity of chips and use them for training without ever notifying the Verifier. Keeping track of chip-ownership is viable because the cutting-edge data-center chip supply chain is highly concentrated, meaning that chips originate from a few initial chokepoints and can be monitored thereafter.

These steps, put together, enable a chain of guarantees:

  • When any organization wants to train a large rule-violating ML model, they must do so using chips that the Verifier is aware of.

  • These chips will occasionally log weight-snapshots. Each time a chip logs a weight-snapshot, the Prover must report the log to the Verifier, along with (hashes of) training transcripts that establish the provenance of that weight-snapshot.

  • If the Prover tries to hide the logs from every chip involved in the training run, the Verifier will eventually find out, because it will sample and physically inspect at least one of those chips with high probability.

  • Also, the hashed training transcripts that the Prover provides along with the logged weight-snapshot need to be authentic, reflecting the training run that was run on the chip. If they are not authentic, they will be caught by the Verifier’s transcript verification procedure.

  • If the training transcript is authentic, and the run violated the rules, then the Verifier can tell, and will catch the Prover.

Thus, so long as the Prover complies with the Verifier’s steps, the Verifier will detect the Prover’s rule-violation with high probability. Just as in financial audits, a Prover’s refusal to comply with the verification steps would itself represent an indication of guilt.

Crossposted from LessWrong (157 points, 9 comments)