One other issue I thought of since my other comment: you list several valid critiques the AI made that you’d already identified but that were not in the source materials you provided. You state that this gives additional credence to the helpfulness of the models:
three we were already planning to look into but weren’t in the source materials we provided (which gives us some additional confidence in AI’s ability to generate meaningful critiques of our work in the future—especially those we’ve looked at in less depth).
However, just because a critique is not in the provided source materials doesn’t mean it’s not in the wider training data of the LLM. For example, if GiveWell discussed the identified issue of “optimal chlorine doses” in a blog comment somewhere, and that blog got scraped into the model’s training data, then the critique is not a sign of LLM usefulness: the model may just be parroting your own findings back to you.
That’s a great point! I should have clarified that those critiques were neither in the source materials nor in our public writings on those topics.