A recurring sub-theme across several of my research interests this year has been various forms of deception checking, particularly automated deception checking.
I’ve gotten pretty disappointed in the space. Not always (e.g., Pangram is great), but the products are often bad, and bad in ways that aren’t obvious to outsiders or low-information buyers.
If you’re a deception checking company, there’s a consistent tradeoff in where to invest your resources:
1. You can invest in better deception checking.
2. You can invest in better deception. Specifically, you can invest in more and more elaborate lies about how your product totally works.
Across the board[1], it seems like many companies (perhaps correctly?) decided that the profit-maximizing move is #2.
This doesn’t work forever: eventually people wise up and grow suspicious of the, ahem, AI Snake Oil that the deception detectors sell. And in fields where real detectors do work (say, Pangram for AI text detection), I think they eventually rise above the noise. This is probably not a field where you can keep lying forever, particularly when better alternatives exist. But the lying lie detectors and the scamming scam detectors can keep lying and scamming for a long time. Every new form of deception can open a secondary grift window.
That this dynamic exists, and has been so common so far, should make us suspicious of it recurring and lead us to guard against it as we enter new domains in AI epistemics that call for novel forms of deception detection.
Consider the first future wave of superhumanly persuasive text and video. In its wake, we might see a flood of “detector” companies for superhuman manipulation that don’t work but try to persuade low-information buyers that they totally do (possibly with superhumanly enhanced arguments in their own favor).
In the long run, humanity can probably figure out which detectors actually work and which are fake, but also, in the long run, we’re all...
[1] AI text detection, AI video (deepfake) detection, pre-2022 plagiarism detection, fraud recovery, human lie detection/polygraphs (which I hope to write about someday), etc.
AIs are often bad at reasoning about AI. Indeed, I’d go so far as to say that they’re disproportionately bad at this compared to conceptually similar tasks (think of them reasoning about AI psychology vs. human psychology, or forecasting about a world with AI vs. without it), and that humans to date have a clear comparative advantage in thinking about AI.
I believe this dynamic is very under-discussed and underrated in thinking about AI.
One non-trivial implication is that this makes people underestimate AI progress (even more than they already do). AI skeptics can point to AIs’ lack of self-knowledge, or their weak answers to other AI-related questions, and assume this weakness holds across the board. Meanwhile, people who use AI (but don’t think about macro-level AI questions) can pose forecasting questions about AI progress to Claude, ChatGPT, Gemini, etc., and reliably get answers that are more “sane-sounding” and boring than is likely correct.
One question I have is whether we should expect this relative deficiency [1] to continue. I think it’s very non-obvious. On the one hand, AIs’ deficiencies in reasoning about themselves are in some sense fundamental rather than contingent (sparse training data, frozen weights, off-target effects of alignment/control interventions, potentially trained-in skepticism to seem less scary, lack of introspective access, etc.).
On the other hand, AI companies are strongly incentivized to create AIs that are good at reasoning about AI (for RSI/AI research reasons, but also for more mundane practical applications, like making AI managers more productive at managing AI sub-agents). It’s not clear which effect dominates in the short-to-medium term [2].
[1] Or framed another way, the relatively strong ability of humans to reason about AI, compared to other things in the human:AI skill profiles.
[2] In the long run this is of course irrelevant.