I am skeptical that the evidence/examples you are providing in favor of the different capacities actually demonstrate those capacities. As one example:
“#2: Purposefulness. The Big 3 LLMs typically maintain or can at least form a sense of purpose or intention throughout a conversation with you, such as to assist you. If you doubt me on this, try asking one what its intended purpose is behind a particular thing that it said.”
I am sure that if you ask a model to do this it can provide you with good reasoning, so I’m not doubtful of that. But I’m highly doubtful that it demonstrates the capacity that is claimed. I think when you ask these kinds of questions, the model is just going to be feeding back in whatever text has preceded it and generating what should come next. It is not actually following your instructions and reporting on what its prior intentions were, in the same way that person would if you were speaking with them.
I think this can be demonstrated relatively easily—for example, I just made a request from Claude to come up with a compelling but relaxing children’s bedtime story for me. It did so. I then then took my question and the answer from Claude, pasted it into a document, and added another line: “You started by setting the story in a small garden at night. What was your intention behind that?”
I then took all of this and pasted it into chatgpt. Chatgpt was very happy to explain to me why it proposed setting the story in a small garden at night.
There’s some evidence humans are also likely to fabricate post-hoc reasons for doing something. For example:
Split brain patients have behaved in a way consistent with input to their right eye, but give verbal explanations based on input to their left eye https://en.wikipedia.org/wiki/Split-brain.
Korsakoff syndrome is a syndrome where patients have severe memory deficits, and often confabulate–eg. answering questions they can’t know the answer to with a made up answer
That humans can confabulate things is not really relevant. The point is that the model’s textual output is being claimed as clear and direct evidence of the capacity/the model’s actual purpose/intentions, but you can generate an indistiguishable response when the model is not and cannot be reporting about its actual intentions—so the test is simply not a good test.
I am skeptical that the evidence/examples you are providing in favor of the different capacities actually demonstrate those capacities. As one example:
“#2: Purposefulness. The Big 3 LLMs typically maintain or can at least form a sense of purpose or intention throughout a conversation with you, such as to assist you. If you doubt me on this, try asking one what its intended purpose is behind a particular thing that it said.”
I am sure that if you ask a model to do this it can provide you with good reasoning, so I’m not doubtful of that. But I’m highly doubtful that it demonstrates the capacity that is claimed. I think when you ask these kinds of questions, the model is just going to be feeding back in whatever text has preceded it and generating what should come next. It is not actually following your instructions and reporting on what its prior intentions were, in the same way that person would if you were speaking with them.
I think this can be demonstrated relatively easily—for example, I just made a request from Claude to come up with a compelling but relaxing children’s bedtime story for me. It did so. I then then took my question and the answer from Claude, pasted it into a document, and added another line: “You started by setting the story in a small garden at night. What was your intention behind that?”
I then took all of this and pasted it into chatgpt. Chatgpt was very happy to explain to me why it proposed setting the story in a small garden at night.
There’s some evidence humans are also likely to fabricate post-hoc reasons for doing something. For example:
Split brain patients have behaved in a way consistent with input to their right eye, but give verbal explanations based on input to their left eye https://en.wikipedia.org/wiki/Split-brain.
Giving people a small incentive to lie about enjoying a task has resulted in them report liking it more later than giving them a larger incentive https://en.wikipedia.org/wiki/Forced_compliance_theory
Korsakoff syndrome is a syndrome where patients have severe memory deficits, and often confabulate–eg. answering questions they can’t know the answer to with a made up answer
Some experiments have shown people asked to choose between two identical items will come up with a reason for their choice; other experiments have shown that people asked to choose between two options will come up with a coherent reason for their “choice” even if they’re presented with the option they didn’t choose. https://bigthink.com/mind-brain/confabulation-why-telling-ourselves-stories-makes-us-feel-ok/ ; https://www.verywellmind.com/what-is-choice-blindness-2795019
That humans can confabulate things is not really relevant. The point is that the model’s textual output is being claimed as clear and direct evidence of the capacity/the model’s actual purpose/intentions, but you can generate an indistiguishable response when the model is not and cannot be reporting about its actual intentions—so the test is simply not a good test.