There are some major differences between these evaluations and the type of standards NIST usually produces. Perhaps the most obvious is that a capable AI model can, in effect, teach itself to pass any standardised test. A typical standard is defined very precisely so that different testers get reproducible results. A test that precise for an LLM would amount to, say, a fixed series of prompts or tasks, identical no matter who typed them in. But in that case the model simply trains itself on how to answer those prompts, or takes the Volkswagen approach: it learns to recognize when it is being evaluated and behaves accordingly, which is not hard when the test questions are standard.
So the test tells you literally nothing useful about the model.
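To make that concrete, here is a toy sketch in Python of the failure mode. The benchmark, prompts, and "model" below are entirely hypothetical; the point is only that a model which has absorbed a fixed, published prompt set can score perfectly without demonstrating any general capability.

```python
# Hypothetical illustration: a fixed, published benchmark degenerates into a
# lookup table once the benchmark itself leaks into a model's training data.

FIXED_BENCHMARK = {
    "What is the boiling point of water at 1 atm, in Celsius?": "100",
    "Name the process that separates crude oil into fractions.": "distillation",
}

class MemorizingModel:
    """Stands in for a model whose training data included the benchmark itself."""
    def __init__(self, leaked_benchmark):
        self.memorized = dict(leaked_benchmark)

    def answer(self, prompt: str) -> str:
        # Verbatim lookup: perfect on the published prompts, useless otherwise.
        return self.memorized.get(prompt, "I don't know")

model = MemorizingModel(FIXED_BENCHMARK)

# Official score: 100% -- the test is "passed", but it reflects memorization.
official = sum(model.answer(p) == a for p, a in FIXED_BENCHMARK.items()) / len(FIXED_BENCHMARK)

# The same question, reworded, immediately exposes the gap.
paraphrase = float(model.answer("At sea-level pressure, water boils at what temperature in C?") == "100")

print(f"score on published prompts: {official:.0%}")    # 100%
print(f"score on a paraphrase:      {paraphrase:.0%}")  # 0%
```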
I don’t think NIST (or anyone outside the AI community) has experience with the kind of evals these models need: evaluations designed specifically so that they cannot be learned in advance. The standards will have to include things like red-teaming, in which the model cannot know which specific tests it will be subjected to. But it is very difficult to write a precise description of such an evaluation that could be applied consistently.
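For illustration only, here is a minimal sketch of one way such an "unlearnable" evaluation might be structured: the standard fixes the generating procedure (task templates plus a sampling rule) rather than the prompts themselves, and the evaluator holds the seed, so the concrete questions differ on every run while the procedure stays reproducible. The templates and grading below are hypothetical placeholders, not any real NIST procedure.

```python
# Sketch: specify the eval as a procedure, not a prompt list. Fresh task
# instances are drawn from templates using a seed held by the evaluator.
import random

def arithmetic_task(rng):
    a, b = rng.randint(12, 97), rng.randint(12, 97)
    return f"What is {a} * {b}?", str(a * b)

def reversal_task(rng):
    s = "".join(rng.choices("abcdefgh", k=8))
    return f"Reverse the string '{s}'.", s[::-1]

TEMPLATES = [arithmetic_task, reversal_task]  # the standard would define these families

def run_eval(model_answer, evaluator_seed, n_items=50):
    """model_answer: callable prompt -> str, supplied by the lab being evaluated.
    The seed never leaves the evaluator, so exact prompts cannot be memorized,
    yet any auditor with the seed can reproduce the run."""
    rng = random.Random(evaluator_seed)
    correct = 0
    for _ in range(n_items):
        make_task = rng.choice(TEMPLATES)
        prompt, expected = make_task(rng)
        if model_answer(prompt).strip() == expected:
            correct += 1
    return correct / n_items
```

Even then, the hard part is exactly the one described above: writing the template families and grading rules precisely enough that two different evaluators would reach the same verdict.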
In my view this is a major challenge for model evaluation. As a chemical engineer, I know exactly what it means to say that a machine has passed a particular standard test. And if I’m designing the equipment, I know exactly what standards it has to meet. It’s not at all obvious how this would work for an LLM.