I think the OP’s argument depends on the idea that “Nobody is going to debug the geopolitical abilities of an AI designed to build paperclips. So the fact that debugging occurs in one domain is no guarantee of success in any other.” Suppose AIs have human-level or above capacities in the domains relevant to forming an initial plan to take over the world and beginning that plan, but have subhuman capacities/bugs in the later stages of that plan. Then, assuming at least human-level capacities are needed in those later domains in order to succeed, the threshold could be pretty large: AIs could keep getting smarter at domains related to the initial stages of the plan, which are presumably closer to the distributions they have been trained on (e.g. social manipulation/text outputting to escape a box), while failing to make as much progress in the more OOD domains.
Part of my second point is that smart people figure out for themselves what they need to know in new domains, and on my definition of “general intelligence” there is little reason to think an AGI will be different. The analogies to ANI with domain-specific knowledge that doesn’t generalize well seem to ignore this, though I agree it’s a reason to be slightly less worried that ANI systems could scale in ways that pose risks without developing generalized intelligence first.
I mostly agree with you that if we get AGI rather than ANI, the AGI will be able to learn the skills relevant to taking over the world. However, I think that due to inductive biases and quasi-innate intuitions, different generally intelligent systems differ in how easily they can learn different domains. For example, it is very difficult for autistic people (particularly severely autistic people) to learn social skills. Similarly, high-quality philosophical thinking seems to be basically impossible for most humans. Applying this to AGI, it might be very hard for an AGI to learn how to make long-term plans or to learn social skills.