Only glanced at one or two sections, but the "goal realism is anti-Darwinian" section seems possibly irrelevant to the argument to me. When you first introduce "goal realism," it seems to be the view that goals are actual internal things somehow "written down" in the brain/neural net/other physical mind, so that you could modify the bit of the system where the goal is written down and get different behaviour, rather than there being nothing at all that is the representation of the AI's goals because "goals" are just behavioral dispositions. But the view you're criticizing in the "goal realism is anti-Darwinian" section is the view that there is always a precise fact of the matter about what exactly is being represented at a particular point in time, rather than several different equally good candidates for what is represented. But I can think of representations as physically real vehicles (say, that some combination of neuron firings is the representation of flies/black dots that causes frogs to snap at them) without thinking it is completely determinate what (flies or black dots) is represented by those neuron firings. Determinacy of what a representation represents is not guaranteed just by the fact that a representation exists.
EDIT: Also, is Olah-style interpretability work presuming "representation realism"? Does it provide evidence for it? Evidence for realism about goals specifically? If not, why not?