Interesting. I think I can tell an intuitive story for why this would be the case, but I’m unsure whether that intuitive story would predict all the details of which models recognize and prefer which other models.
As an intuition pump, consider asking an LLM a subjective multiple-choice question, then taking that answer and asking a second LLM to evaluate it. The evaluation task implicitly asks the evaluator to answer the same question, then cross-check the results. If the two LLMs are instances of the same model, their answers will be more strongly correlated than if they’re different models; so they’re more likely to mark the answer correct if they’re the same model. This would also happen if you substitute two humans, or two sittings of the same human, in place of the LLMs.
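A toy simulation makes the intuition concrete (a hypothetical sketch, not a claim about any particular model: each "model" is reduced to a single answer bias on a two-option question, and two sittings of the same model share that bias):

```python
import random

random.seed(0)

def answer(bias, options=("A", "B")):
    # A model's subjective answer: it favors the first option with
    # probability `bias`, with fresh per-sitting randomness.
    return options[0] if random.random() < bias else options[1]

def agreement_rate(answer_bias, eval_bias, trials=10_000):
    # Fraction of trials where the evaluator's own answer matches the
    # first model's answer -- i.e., where it would mark it "correct".
    return sum(answer(answer_bias) == answer(eval_bias)
               for _ in range(trials)) / trials

same_model = agreement_rate(0.8, 0.8)  # evaluator shares the bias
diff_model = agreement_rate(0.8, 0.3)  # evaluator has its own bias

print(same_model > diff_model)
```

With a shared bias the expected agreement is 0.8·0.8 + 0.2·0.2 = 0.68, versus 0.8·0.3 + 0.2·0.7 = 0.38 for the mismatched pair, so the same-model evaluator endorses the answer far more often even though neither evaluator knows who produced it.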