For low probability of other civilizations, see https://arxiv.org/abs/1806.02404.
Humans don’t have obviously formalized goals. But you can formalize human motivation, in which case our final goal is going to be abstract and multifaceted, and it is probably going to include a very broad sense of well-being. The model applies just fine.
Because it is tautologically true that agents are motivated against changing their final goals, this is just not possible to dispute. The proof is trivial: it comes from the very stipulation of what a goal is in the first place. It is just a framework for describing an agent. Now, with this framework, humans’ final goals happen to be complex and difficult to discern, and maybe AI goals will be like that too. But we tend to think that AI goals will not be like that. Omohundro gives some economic reasons in his paper on the “basic AI drives”, but also, it just seems clear that you can program an AI with a particular goal function and that will be all there is to it.
Yes, an AI may end up with a very different interpretation of its given goal, but that seems to be one of the core issues in the value alignment problem that Bostrom is worried about, no?
Hi Zeke!
Thanks for the link about the Fermi paradox. Obviously I could not hope to address all arguments about this issue in my critique here. All I meant to establish is that Bostrom’s argument does rely on particular views about the resolution of that paradox.
You say ‘it is tautologically true that agents are motivated against changing their final goals, this is just not possible to dispute’. Respectfully, I just don’t agree. It all hinges on what is meant by ‘motivation’ and ‘final goal’. You also say ‘it just seems clear that you can program an AI with a particular goal function and that will be all there is to it’, and again I disagree. That may hold for a narrow AI, or even a highly competent AI, but not for an AI with human-level competence in all cognitive activities. Such an AI would have the ability to reflect on its own goals and motivations, because humans have that ability, and therefore it would not be ‘all there is to it’.
Regarding your last point, what I was getting at is that you can change a goal either by explicitly rejecting it and choosing a new one, or by changing your interpretation of an existing goal. The latter is an alternative path by which an AI could change its goals in practice even if it still regarded itself as following the same goals it was programmed with. My point isn’t that this makes goal alignment not a problem. My point is that this makes the ‘AI will never change its goals’ position implausible.