EDIT: I noticed that in my examples I primed Claude a little, and when unprimed, Claude does not reliably (or usually) get to the answer. However, Claude 4.xs are still notable for how little handholding they need with this class of conceptual errors: Geminis often take like 5 hints where Claude usually gets it with one. My impression was that Claude 3.xs were kinda hopeless (they often don’t get it even with short explanations from me, and when they do, I’m not confident they actually understood rather than just wanting to agree).