I’m not really arguing for Bostrom’s position here, but I think there is a sensible interpretation of it.
Goals/motivation = whatever process the AI uses to select actions.
There is an implicit assumption that this process will be simple and of the form “maximize this function over here”. I don’t like this assumption as an assumption about any superintelligent AI system, but it’s certainly true that our current methods of building AI systems (specifically reinforcement learning) are trying to do this, so at minimum you need to make sure that we don’t build AI using reinforcement learning, or that we get its reward function right, or that we change how reinforcement learning is done somehow.
If you are literally just taking actions that maximize a particular function, you aren’t going to interpret them using common sense, even if you have the ability to use common sense. Again, I think we could build AI systems that used common sense to interpret human goals—but this is not what current systems do, so there’s some work to be done here.
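To make that concrete, here is a toy sketch (purely illustrative; the cleaning scenario and the names are made up, and nothing like this is a real system) of what “literally taking the action that maximizes a particular function” looks like:

```python
# Toy illustration: an agent that selects actions by literally maximizing a
# hand-specified objective. Nothing in this loop asks "what did the designer
# actually mean?" -- the function is taken at face value.

def select_action(actions, objective):
    """Return whichever action scores highest under the given function."""
    return max(actions, key=objective)

# The designer intends "keep the room tidy", but writes a crude proxy.
proxy_objective = lambda action: action.count("collect")  # hypothetical proxy

actions = [
    "collect dust",
    "collect dust, dump it back out, and collect it again",
    "ask what 'tidy' actually means",
]
print(select_action(actions, proxy_objective))
# -> the second action wins, even though it defeats the designer's intent
```

The point is just that the maximizer never interprets the function; the highest-scoring action is chosen even when it obviously defeats what the function was meant to capture.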
The arguments you present here are broadly similar to ones that make me optimistic that AI will be good for humanity, but there is work to be done to get there from where we are today.
Hi rohinmshah, I agree that our current methods for building an AI do involve maximising particular functions and have nothing to do with common sense. The problem with extrapolating this to AGI is that 1) these sorts of techniques have been applied for decades and have never achieved anything close to human-level AI (of course that’s not proof that they never can, but I am quite skeptical, and Bostrom doesn’t really make the case that such techniques are likely to lead to human-level AI), and 2) as I argue in part 2 of my critique, other parts of Bostrom’s argument rely upon much broader conceptions of intelligence that would entail the AI having common sense.
> these sorts of techniques have been applied for decades and have never achieved anything close to human level AI
We also didn’t have the vast amounts of compute that we have today.
> other parts of Bostrom’s argument rely upon much broader conceptions of intelligence that would entail the AI having common sense.
My claim is that you can write a program that “knows” about common sense, but still chooses actions by maximizing a function, in which case it’s going to interpret that function literally and not through the lens of common sense. There is currently no way that the “choose actions” part gets routed through the “common sense” part the way it does in humans. I definitely agree that we should try to build an AI system which does interpret goals using common sense—but we don’t know how to do that yet, and that is one of the approaches that AI safety is considering.
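As a toy sketch of what I mean (illustrative only, with made-up names, not a claim about how a real AGI would be built), you can have both pieces in one program without the action-selection piece ever consulting the knowledge piece:

```python
# The system *contains* common-sense knowledge, but action selection is a
# bare argmax over the specified objective and never consults that knowledge.

COMMON_SENSE = {
    "do humans care about more than the literal objective?": True,
    "should stated goals be read charitably?": True,
}

def recite_common_sense(question):
    """The part that 'knows' common sense: it can report these facts if asked."""
    return COMMON_SENSE.get(question, "unknown")

def choose_action(actions, objective):
    """The part that acts: nothing here routes through COMMON_SENSE,
    so that knowledge never constrains behaviour."""
    return max(actions, key=objective)
```

Getting the `choose_action` part to actually route through the common-sense knowledge is exactly the piece we don’t yet know how to build.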
I agree with the prediction that AGI systems will interpret goals with common sense, but that’s because I expect that we humans will put in the work to figure out how to build such systems, not because any AGI system that has the ability to use common sense will necessarily apply that ability to interpreting its goals.
If we found out today that someone created our world + evolution in order to create organisms that maximize reproductive fitness, I don’t think we’d start interpreting our sex drive using “common sense” and stop using birth control so that we more effectively achieved the original goal we were meant to perform.