Thanks for sharing! Have you considered comparing the performance of random humans with LLMs?
Sorry, we did not see this comment. That is definitely on our minds. Of course, humans would quickly recognize that it is an animal welfare assessment. I have previously given LLMs the assessment rubric and confirmed that they can score near 100% when they know what they will be marked on.
Good to know you are considering it! You could try to mitigate social desirability bias by asking the humans to answer as if the questions were about real-life situations.