I think the title overstates the strength of the conclusion.
This seems like an isolated demand for rigor to me. I think it’s fine to say something is “no evidence” when, speaking pedantically, it’s only a negligible amount of evidence.
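In Bayesian terms (our gloss, not language from the exchange), an observation E is only negligible evidence for a hypothesis H when the likelihood ratio is close to one, in which case the posterior odds barely move from the prior odds:

\[
\frac{P(H \mid E)}{P(\neg H \mid E)} \;=\; \frac{P(E \mid H)}{P(E \mid \neg H)} \cdot \frac{P(H)}{P(\neg H)} \;\approx\; \frac{P(H)}{P(\neg H)} \quad \text{when} \quad \frac{P(E \mid H)}{P(E \mid \neg H)} \approx 1.
\]

Calling such an E “no evidence” rounds a negligible update down to zero, which is the usage being defended here.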
Ultimately, I think you’ve only rebutted one argument for scheming: the counting argument.
I mean, we do in fact discuss the simplicity argument, although we don’t go into as much depth.
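For readers who want the shape of that argument, one standard Solomonoff-style gloss (ours, not necessarily the post’s exact formulation) is that a simplicity prior weights each function by its description length:

\[
P(f) \;\propto\; 2^{-K(f)},
\]

where \(K(f)\) is the length of the shortest description of \(f\), here under the network’s parameter-function map. The dispute is then whether scheming or honest policies are simpler in that sense.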
The way we train AIs, including the data we train them on, could reward AIs that scheme over AIs that are honest and don’t scheme.
Without a concrete proposal about what that might look like, I don’t feel the need to address this possibility.
If future AIs are “as aligned as humans”, then they will probably scheme frequently.
I think future AIs will be much more aligned than humans, because we will have dramatically more control over them than over humans.
I don’t think you need to believe in any strong version of goal realism in order to accept the intuitive claim that AIs will have “goals” that they robustly attempt to pursue.
We did not intend to deny that some AIs will be well-described as having goals.
The goal realism section was an argument in the alternative. If you just agree with us that the indifference principle is invalid, then the counting argument fails, and it doesn’t matter what you think about goal realism.
If you think that some form of indifference reasoning still works, in a way that saves the counting argument for scheming, then the most plausible view on which that’s true is goal realism combined with Huemer’s restricted indifference principle. We attack goal realism to try to close off that line of reasoning.
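As a minimal illustration of why unrestricted indifference reasoning is invalid (van Fraassen’s cube factory, a standard example we are supplying, not one from the post): suppose a factory produces cubes with side length \(s\) somewhere in \([0,1]\). Indifference over side length and indifference over volume \(v = s^3\) assign different probabilities to the same question:

\[
P\!\left(s \le \tfrac{1}{2}\right) = \tfrac{1}{2}
\qquad \text{but} \qquad
P\!\left(v \le \tfrac{1}{8}\right) = \tfrac{1}{8},
\]

even though \(s \le \tfrac{1}{2}\) and \(v \le \tfrac{1}{8}\) are the same event. Counting arguments inherit this parameterization-dependence: how likely scheming looks depends on how you carve up the space of goals or models.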