Essentially, the more the setup factors out valence-relevant computation (e.g. by separating out a module, or by accessing an oracle as in your example), the less likely it is that valenced processing happens within the agent.
I think the analogy to humans suggests otherwise. Suppose a human feels pain in their hand due to touching something hot. We can regard all the relevant mechanisms in their body outside the brain—those that cause the brain to receive the relevant signal—as mechanisms that have been “factored out from the brain”. And yet those mechanisms are involved in morally relevant pain. In contrast, suppose a human touches a radioactive material until they realize it’s dangerous. Here there are no relevant mechanisms that have been “factored out from the brain” (the brain needs to use ~general reasoning); and there is no morally relevant pain in this scenario.
Though generally, if “factoring out stuff” means that smaller/less-capable neural networks are used, then maybe it can reduce the risk of morally relevant valence.
Good clarification. Determining which kinds of factoring reduce valence is more subtle than I had thought. I agree with you that the DeepMind set-up seems more analogous to neural nociception (e.g. high heat detection). My proposed set-up (Figure 5) seems significantly different from the DM/nociception case, because it factors out the step where nociceptive signals affect decision making and motivation. I’ll edit my post to clarify.
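To make the two kinds of factoring concrete, here is a minimal, hypothetical sketch (the class names and numbers are mine, not from the post or the DeepMind paper): a separate module does the harm detection, and the signal’s effect on the agent’s behaviour is hard-coded rather than learned within the agent.

```python
# Hypothetical sketch of the contrast discussed above; all names are
# illustrative, not from the post or the DeepMind paper.


class HarmOracle:
    """Factored-out detector, analogous to peripheral nociception
    (e.g. a simple high-heat threshold outside the agent's network)."""

    def __init__(self, heat_threshold: float = 60.0):
        self.heat_threshold = heat_threshold

    def harm_signal(self, observation: dict) -> float:
        # Pure detection: returns 1.0 for harmful states, 0.0 otherwise.
        return 1.0 if observation["temperature"] > self.heat_threshold else 0.0


class Agent:
    """The agent never computes what counts as harmful; it only receives
    the signal. In the further-factored set-up discussed above, even the
    signal's effect on decision making is fixed by the designer."""

    def act(self, observation: dict, harm: float) -> str:
        if harm > 0.0:
            # Hard-coded withdrawal: the signal's influence on motivation
            # is not learned or represented within the agent.
            return "withdraw"
        return "continue"


# Usage: the oracle, not the agent, decides what counts as harmful.
oracle = HarmOracle()
agent = Agent()
obs = {"temperature": 85.0}
print(agent.act(obs, oracle.harm_signal(obs)))  # -> withdraw
```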