I think the title overstates the strength of the conclusion
This seems like an isolated demand for rigor to me. I think it’s fine to say something is “no evidence” when, speaking pedantically, it’s only a negligible amount of evidence.
Ultimately I think you’ve only rebutted one argument for scheming—the counting argument
I mean, we do in fact discuss the simplicity argument, although we don’t go in as much depth.
the way we train AIs—including the data we train them on—could reward AIs that scheme over AIs that are honest and don’t scheme
Without a concrete proposal about what that might look like, I don’t feel the need to address this possibility.
If future AIs are “as aligned as humans”, then AIs will probably scheme frequently
I think future AIs will be much more aligned than humans, because we will have dramatically more control over them than over humans.
I don’t think you need to believe in any strong version of goal realism in order to accept the claim that AIs will intuitively have “goals” that they robustly attempt to pursue.
We did not intend to deny that some AIs will be well-described as having goals.
This seems like an isolated demand for rigor to me. I think it’s fine to say something is “no evidence” when, speaking pedantically, it’s only a negligible amount of evidence.
Minor, but: searching on the EA Forum, your post and Quintin Pope’s post are the only posts with the exact phrase “no evidence” (EDIT: in the title, which weakens my point significantly, but it still holds). The closest other match on the first page is There is little (good) evidence that aid systematically harms political institutions, which to my eyes seems substantially more caveated. Over on LessWrong, the phrase is more common, but the top hits are multiple posts that specifically argue against the phrase in the abstract. So overall I would not consider it an isolated demand for rigor if someone were to argue against the phrase “no evidence” on either forum.
I think that’s fair, but I’m still admittedly annoyed at this usage of language. I don’t think it’s an isolated demand for rigor because I have personally criticized many other similar uses of “no evidence” in the past.
I think future AIs will be much more aligned than humans, because we will have dramatically more control over them than over humans.
That’s plausible to me, but I’m perhaps not as optimistic as you are. I think AIs might easily end up becoming roughly as misaligned with humans as humans are with each other, at least eventually.
We did not intend to deny that some AIs will be well-described as having goals.
If you agree that AIs will intuitively have goals that they robustly pursue, I guess I’m just not sure why you thought it was important to rebut goal realism? You wrote,
The goal realist perspective relies on a trick of language. By pointing to a thing inside an AI system and calling it an “objective”, it invites the reader to project a generalized notion of “wanting” onto the system’s imagined internal ponderings, thereby making notions such as scheming seem more plausible.
But I think even on a reductionist view, it can make sense to talk about AIs “wanting” things, just like it makes sense to talk about humans wanting things. I’m not sure why you think this distinction makes much of a difference.
The goal realism section was an argument in the alternative. If you just agree with us that the indifference principle is invalid, then the counting argument fails, and it doesn’t matter what you think about goal realism.
If you think that some form of indifference reasoning still works—in a way that saves the counting argument for scheming—the most plausible view on which that’s true is goal realism combined with Huemer’s restricted indifference principle. We attack goal realism to try to close off that line of reasoning.
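To make concrete what is at stake in the preceding two paragraphs, here is a minimal sketch of the counting argument as it is usually run; the symbols $N_s$ and $N_a$ and the numbers below are purely illustrative and do not come from the post or the comments. Suppose training could in principle produce $N_s$ distinct goal-configurations that scheme and $N_a$ that do not, with $N_s \gg N_a$. An unrestricted indifference principle assigns every configuration equal probability, so

$$P(\text{scheming}) = \frac{N_s}{N_s + N_a} \approx 1 \quad \text{when } N_s \gg N_a.$$

For instance, $N_s = 10^{6}$ and $N_a = 10^{3}$ would give $P(\text{scheming}) \approx 0.999$. The inference runs entirely through the uniform assignment: if that assignment is rejected, or if configurations are instead weighted by how likely training is to actually produce them, the bare count $N_s \gg N_a$ licenses no conclusion. This is why the disagreement above turns on whether any restricted indifference principle, such as Huemer’s applied under a goal-realist picture, can be salvaged.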