Table 1 shows the techniques used: the teams that were allowed to use SAEs (sparse autoencoders, an interpretability technique) used them, while the one that was prohibited from using them searched the data instead.
Also note that “training data” does not mean “instructions”. Section 3 describes their training process.