I hear this all the time, but I also notice that the people saying it have not investigated the fundamental limits to controllability that you would encounter with any control system.
As a philosopher, would you not want to have a more generalisable and robust argument that this is actually going to work out?
Of course I’d prefer to have something more robust. But I don’t think the lack of that means it’s necessarily useless.
I don’t think control is likely to scale to arbitrarily powerful systems. But it may not need to. I think the next phase of the problem is like “keep things safe for long enough that we can get important work out of AI systems”, where the important work has to be enough that it can be leveraged to something which sets us up well for the following phases.
I don’t think control is likely to scale to arbitrarily powerful systems. But it may not need to… which sets us up well for the following phases.
Under the concept of ‘control’, I am including the capacity of the AI system to control its own components’ effects.
I am talking about the fundamental workings of control, i.e. control theory and cybernetics. That is, results general enough to apply to any following phases as well.
Anders Sandberg has been digging lately into fundamental controllability limits. Could be interesting to talk with Anders.