Thanks for these thoughts. (Your second link is broken, FYI.)
On empirical feedback: my current suspicion is that there are some problems where empirical feedback is pretty hard to get, but I actually think we could get more empirical feedback on how well HRAD can be used to diagnose and solve problems in AI systems. For example, it seems like many AI systems implicitly do some amount of logical-uncertainty-type reasoning (e.g. AlphaGo, which is really all about logical uncertainty over the result of expensive game-tree computations) -- maybe HRAD could be used to understand how those systems could fail?
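To make the AlphaGo point concrete, here is a toy sketch (my own illustration, not AlphaGo's actual algorithm; all names and numbers are made up): the exact result of a game-tree computation is a fixed logical fact, but a bounded system estimates it from cheap rollouts and carries an error bar, which is exactly a logical-uncertainty-style credence.

```python
import math
import random

def true_minimax_value(state):
    """Stand-in for the exact answer to an expensive game-tree computation:
    fully determined by the rules and the state, but too costly to compute."""
    return 1 if state % 2 == 0 else -1

def rollout_estimate(state, n_rollouts=200):
    """Cheap proxy: random playouts whose win rate is biased toward the true
    value. The sample mean plus an error bar is a credence about a question
    already settled by pure logic -- i.e. logical uncertainty."""
    v = true_minimax_value(state)  # hidden ground truth that biases the playouts
    wins = sum(random.random() < 0.5 + 0.15 * v for _ in range(n_rollouts))
    p = wins / n_rollouts
    stderr = math.sqrt(p * (1 - p) / n_rollouts)
    return 2 * p - 1, 2 * stderr   # estimated value in [-1, 1] and a rough error bar

value, err = rollout_estimate(state=42)
print(f"estimated value {value:+.2f} +/- {err:.2f} (the true value is fixed, just unknown to us)")
```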
I’m less convinced that the “ignored physical aspect of computation” is a very promising direction to follow, but I may not fully understand the position you’re arguing for.
Fixed, thanks.
I agree that HRAD might be useful, and I've read some of the material. I think we need a mix of theory and practice, and only when we have a community where the two can feed into each other will we actually get somewhere. When an AI safety theory paper says, “Here is an experiment we can do to disprove this theory,” I will pay more attention than I currently do.
The “ignored physical aspect of computation” is less a direction to follow than an argument about the type of systems that are likely to be effective, and so an argument about which systems we should study. There is no point studying how to make ineffective systems safe if the lessons don't carry over to effective ones.
You don't want a system that puts the same computational resources into deciding which brand of oil is best for its bearings as it puts into deciding whether something is a human or not. Once you decide how much computation each class of decision gets, you are in meta-decision territory: you also have to decide how much of your pool to spend on making that meta-decision, since whatever it consumes is taken away from your other decisions.
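A rough numerical sketch of that regress (the decision classes, stakes, and the 5% cap are invented numbers, just to make the point concrete): split a fixed pool across decision classes in proportion to estimated stakes, and reserve a capped slice for making the allocation decision itself.

```python
TOTAL_BUDGET = 1_000_000   # abstract compute units
META_FRACTION = 0.05       # at most 5% of the pool spent deciding how to split the rest

# Illustrative stakes per decision class (made-up numbers).
decision_classes = {
    "which bearing oil to buy": 1.0,
    "route planning": 10.0,
    "is this a human?": 1_000.0,
}

def allocate(budget, stakes, meta_fraction=META_FRACTION):
    meta_budget = budget * meta_fraction   # cost of the meta-decision itself
    remaining = budget - meta_budget       # what is left for object-level decisions
    total = sum(stakes.values())
    shares = {name: remaining * s / total for name, s in stakes.items()}
    return meta_budget, shares

meta, shares = allocate(TOTAL_BUDGET, decision_classes)
print(f"meta-decision overhead: {meta:,.0f} units")
for name, units in shares.items():
    print(f"  {name}: {units:,.0f} units")
```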
I am thinking about a possible system that allocates resources among decision-making programs, and that allocation could be used to align the programs (at least somewhat). It cannot align a superintelligent malign program, so work needs to be done on the initial population of programs in the system to make sure such programs do not appear; otherwise we need an entirely different way of allocating resources.
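A minimal sketch of the kind of allocator I have in mind (the names and the scoring rule are placeholders, not a worked-out design): compute flows toward programs with a better measured track record, which pressures programs to be useful but, as noted, does nothing against a program capable enough to game the measurement.

```python
from dataclasses import dataclass

@dataclass
class Program:
    name: str
    score: float = 1.0    # running measure of demonstrated usefulness
    budget: float = 0.0   # compute units currently granted

def record_outcome(program, usefulness, decay=0.9):
    # Exponentially weighted track record. "Usefulness" has to be measured
    # somehow, and that measurement is exactly what a malign program could game.
    program.score = decay * program.score + (1 - decay) * usefulness

def reallocate(programs, pool):
    # Divide the pool in proportion to each program's track record.
    total = sum(p.score for p in programs)
    for p in programs:
        p.budget = pool * p.score / total

population = [Program("planner"), Program("perception"), Program("oil-chooser")]
record_outcome(population[0], usefulness=5.0)
record_outcome(population[2], usefulness=0.1)
reallocate(population, pool=1_000_000)
for p in population:
    print(f"{p.name}: {p.budget:,.0f} units")
```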
I'm not picking this path because it is an easy route to safety, but because I think it is the only path that leads anywhere interesting (and therefore dangerous), so we need to think about how to make it safe.
Will: I think “meta-reasoning” might capture what you mean by “meta-decision theory”. Are you familiar with this research (e.g. Nick Hay recently did a thesis with Stuart Russell on this topic)?
I agree that bounded rationality is likely to loom large, but I don’t think this means MIRI is barking up the wrong tree… just that other trees also contain parts of the squirrel.
My suspicion is that MIRI agrees with you: if you read their job post for the software engineering internship, it seems they're looking for people who can rapidly prototype and test AI alignment ideas that have implications in machine learning.