If you want independent, criteria-based judgements, having an LLM make them could realistically be a good option: you'd get the classification instantly, and as a bonus you could publish the prompt used, so the judgements would be easier for people to audit.
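For concreteness, here's a minimal sketch of the kind of thing I have in mind, assuming an OpenAI-style chat completions API. The model name, tag list, and prompt text are all placeholders I've made up for illustration, not anything the Forum actually uses:

```python
# Minimal sketch of criteria-based tagging with a fixed, publishable prompt.
# Assumptions: the OpenAI Python SDK, a placeholder model name, and a
# hypothetical tag list -- none of these come from the actual Forum setup.
from openai import OpenAI

client = OpenAI()

TAGS = ["AI safety", "Global health", "Animal welfare"]  # hypothetical tag list

# Publishing this prompt verbatim is what makes the judgements auditable:
# anyone can re-run the same classification and check the criteria applied.
TAGGING_PROMPT = (
    "You are tagging a forum post. Apply a tag only if the post's main "
    "topic clearly matches it. Available tags: {tags}. "
    "Respond with a comma-separated list of applicable tags, or 'none'."
)

def tag_post(post_text: str) -> list[str]:
    """Return the list of tags the model judges applicable to the post."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": TAGGING_PROMPT.format(tags=", ".join(TAGS))},
            {"role": "user", "content": post_text},
        ],
        temperature=0,  # more deterministic output aids reproducibility
    )
    answer = response.choices[0].message.content.strip()
    if answer.lower() == "none":
        return []
    return [tag.strip() for tag in answer.split(",")]
```

Since the prompt is fixed and published, anyone with the post text could re-run the same classification and check the outcome, which is where the auditability benefit comes from.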
FYI, the Forum team has experimented with LLMs for tagging posts (and for automating some other tasks, like reviewing new users), but so far none of those experiments have been accurate enough to rely on. Nonetheless, I appreciate your comment, since we weren't really tracking the transparency/auditing upside of using LLMs.
That makes sense!
(I’m curious how much you’ve invested in giving them detailed prompts about what information to assess in applying particular tags, or even more structured workflows, vs just taking smart models and seeing if they can one-shot it; but I don’t really need to know any of this.)