What are the risks of an oracle AI?
One of the arguments for the existential risk of AGI rests on the orthogonality of a system's final goals and the instrumental goals it would develop regardless of what those final goals are. So isn't the real problem having 'goals' at all, i.e. specific objectives that the system optimizes for by taking actions? I'm wondering how this line of argument would apply to a mere 'oracle' AGI, e.g. one that emerged from scaling up a foundation model on multimodal data, such that we can query it and it will output a prediction. It isn't running unconstrained with the goal of optimizing some objective; it has just been trained to output the most likely continuation of its input. How could such a system 'go rogue'?