“given that this paper assumes the humans choose the wrong action by accident less than 1% of the time, it seems that the AI should assign a very large amount of evidence to a shutdown command… instead the AI seems to simply ignore it?”
That’s kind-of the point, isn’t it? A value learning system will only “learn” over certain variables, according to the size of the learning space, and the prior that it is given. The examples show how if it has an error in the parameterized reward function (or equivalently in the prior), then a bad outcome will ensue. Although I agree that the examples do say much that is not also presented in the text. Anyway, it is also clear by this point that there is room for improvement on my presentation!
That’s kind-of the point, isn’t it? A value learning system will only “learn” over certain variables, according to the size of the learning space, and the prior that it is given. The examples show how if it has an error in the parameterized reward function (or equivalently in the prior), then a bad outcome will ensue. Although I agree that the examples do say much that is not also presented in the text. Anyway, it is also clear by this point that there is room for improvement on my presentation!