I agree that HRAD might be useful. I have read some of the material. I think we need a mix of theory and practice, and only when we have a community where the two can feed into each other will we actually get somewhere. When an AI safety theory paper says, “Here is an experiment we can do to disprove this theory,” I will pay more attention than I currently do.
The “ignored physical aspect of computation” is less a direction to follow than an argument about the type of systems that are likely to be effective, and so an argument about which systems we should study. There is no point studying how to make ineffective systems safe if the lessons don’t carry over to effective ones.
You don’t want a system that puts the same computational resources into deciding which brand of oil is best for its bearings as it puts into deciding whether something is or is not a human. But once you decide how much computational resource to allocate to each class of decision, you are in meta-decision territory: you also have to decide how much of your pool to spend on making that meta-decision, since whatever it consumes is taken away from your other decisions.
I am thinking about a possible system that can allocate resources among decision-making programs, which could be used to align those programs (at least somewhat). It cannot align a superintelligent malign program; work needs to be done on the initial population of programs in the system to make sure such programs do not appear. Or we need a different way of allocating resources entirely.
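To make the resource split concrete, here is a minimal toy sketch in Python. Everything in it (the task names, the stakes estimates, the fixed 5% meta-overhead) is my own illustrative assumption, not part of any proposed system; it just shows a compute pool being divided across decision classes in proportion to their estimated stakes, with a slice paid up front for the meta-decision of choosing the split.

```python
# Toy sketch of budgeted decision-making: a fixed compute pool is split
# across decision classes by estimated stakes, with a slice reserved for
# the meta-decision (deciding the split) itself. All names and numbers
# are illustrative assumptions, not from any existing system.

TOTAL_BUDGET = 1000          # abstract compute units per cycle
META_FRACTION = 0.05         # fraction spent on deciding the allocation itself

# (decision class, rough estimate of how costly a wrong answer is)
decision_classes = [
    ("choose_bearing_oil_brand", 1.0),
    ("classify_entity_as_human", 500.0),
    ("plan_todays_route", 20.0),
]

def allocate(budget, classes, meta_fraction):
    """Split `budget` across `classes` in proportion to estimated stakes,
    after paying the meta-decision overhead off the top."""
    meta_cost = budget * meta_fraction
    remaining = budget - meta_cost
    total_stakes = sum(stakes for _, stakes in classes)
    allocation = {
        name: remaining * stakes / total_stakes
        for name, stakes in classes
    }
    allocation["meta_decision"] = meta_cost
    return allocation

for task, units in allocate(TOTAL_BUDGET, decision_classes, META_FRACTION).items():
    print(f"{task}: {units:.1f} units")
```

Note how this dodges rather than solves the regress above: META_FRACTION is itself a meta-meta-decision, fixed here by fiat rather than computed.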
I don’t pick this path because it is an easy path to safety; I pick it because I think it is the only path that leads anywhere interesting/dangerous, and so we need to think about how to make it safe.
Will—I think “meta-reasoning” might capture what you mean by “meta-decision theory”. Are you familiar with this research (e.g. Nick Hay did a thesis w/Stuart Russell on this topic recently)?
I agree that bounded rationality is likely to loom large, but I don’t think this means MIRI is barking up the wrong tree… just that other trees also contain parts of the squirrel.
Fixed, thanks.