The AIs are often bad at reasoning about AI. Indeed, I’d go so far as to say that they’re disproportionately bad at this compared to conceptually similar activities (think of them reasoning about AI psychology vs human psychology, or forecasting about the world with AI vs without AI), and that humans to date have a clear comparative advantage in thinking about AI.
I believe this dynamic is very under-discussed and underrated in thinking about AI.
One non-trivial implication is that it makes people underestimate progress in AI (more than they already do). AI skeptics can point to AIs’ lack of self-knowledge, or to their poor answers on other AI-related questions, and assume the same lack of ability holds across the board. And people who use AI (but don’t think about macro-picture AI questions) can pose forecasting questions about AI progress to Claude, ChatGPT, Gemini, etc., and reliably get an answer that’s more “sane-sounding” and boring than is likely correct.
One question I have is whether we expect this relative deficiency [1] to continue. I think it’s very non-obvious. On the one hand, AIs’ deficiencies in reasoning about themselves are in some sense structural rather than contingent (sparsity in training data, frozen weights, off-target effects due to alignment/control interventions, potentially trained-in skepticism meant to make them seem less scary, lack of introspective access, etc.).
On the other hand, the AI companies are strongly incentivized to create AIs that are good at reasoning about AI (for recursive self-improvement (RSI) and AI research reasons, but also for more mundane practical applications like making AI managers more productive at managing AI sub-agents). It’s not clear which effect dominates in the short to medium term [2].
[1] Or, framed another way, humans’ relatively strong ability to reason about AI, compared to other areas of the human:AI skill profile.
[2] In the long run this is of course irrelevant.
Next time an AI agent fucks up a task for you, if you ask it (or a different AI) to identify why and suggest ways you can prevent this in the future, I expect the response to be very under-informative, and much worse than the help the same AI could give you in diagnosing communication errors between people.