First off, I really appreciate the straight-shooting conclusion that ‘QC is unlikely to be helpful to address current bottlenecks in AI alignment’, even after you both spent many hours looking into it.
Second, I’m curious to hear any thoughts on the amateur speculation I threw at Pablo in a chat at the last AI Safety Camp:
Would quantum computing afford the mechanisms for improved prediction of the actions that correlated agents would decide on?
As a toy model, I’m imagining hundreds of almost-homogeneous reinforcement learning agents within a narrow distribution of slightly divergent maps of the state space, probability weightings/policies, and environmental inputs. Would current quantum computing techniques, assuming the hardware to run them on is available, be able to derive more quickly/precisely what percentage of those agents at, say, State1 would take Action1, Action2, or Action3?
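To make the toy model concrete, here is a minimal classical sketch of what I have in mind (every name and parameter below, e.g. NUM_AGENTS and the noise scale, is invented for illustration, not taken from any existing system):

```
# Hundreds of near-identical agents whose policies at State1 differ only
# by small perturbations; count what fraction takes each action.
import numpy as np

rng = np.random.default_rng(0)

NUM_AGENTS = 500      # "hundreds of almost-homogeneous agents"
NUM_ACTIONS = 3       # Action1, Action2, Action3

# Shared base policy at State1, expressed as action logits.
base_logits = np.array([1.0, 0.5, -0.5])

# Each agent's policy is the base policy plus small Gaussian noise,
# modelling the narrow distribution of slightly divergent policies.
agent_logits = base_logits + rng.normal(scale=0.2, size=(NUM_AGENTS, NUM_ACTIONS))
agent_probs = np.exp(agent_logits)
agent_probs /= agent_probs.sum(axis=1, keepdims=True)

# Each agent samples one action at State1.
actions = np.array([rng.choice(NUM_ACTIONS, p=p) for p in agent_probs])

# The quantity in question: what % of agents take each action.
fractions = np.bincount(actions, minlength=NUM_ACTIONS) / NUM_AGENTS
print({f"Action{i+1}": f"{f:.1%}" for i, f in enumerate(fractions)})
```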
I have a broad, vague sense that if that set-up works out, you could leverage it to create a ‘regulator agent’ for monitoring a ‘multi-agent system’ composed of quasi-homogeneous autonomous ‘selfish agents’ (e.g. each negotiating on behalf of its respective human interest group) that has a meaningful influence on our physical environment. This regulator would interface directly with a few of the selfish agents. If that subset of selfish agents is about to select Action1, the regulator would predict what % of the other, slightly divergent algorithms would also decide on Action1. If it forecasts that an excessive number of Action1s will be taken, reducing the rewards to or robustness of the collective (e.g. a Tragedy of the Commons case of over-utilisation of local resources), it would override that decision by commanding a compensating number of the agents to select the collectively conservative Action2 instead.
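And a hypothetical sketch of that regulator loop; the capacity threshold, subset size, and override rule are all made-up placeholders, not a worked-out mechanism:

```
# Hypothetical regulator sketch; CAPACITY, SUBSET_SIZE, and the override
# rule are invented for illustration only.
import numpy as np

rng = np.random.default_rng(1)
NUM_AGENTS = 500
CAPACITY = 0.40      # max tolerable fraction of Action1 before over-utilisation
SUBSET_SIZE = 50     # selfish agents the regulator interfaces with directly

# Stand-in for the agents' chosen actions at State1
# (0 = Action1, 1 = Action2, 2 = Action3).
actions = rng.choice(3, size=NUM_AGENTS, p=[0.55, 0.25, 0.20])

# The regulator extrapolates the population-wide Action1 rate
# from the subset it observes directly.
predicted_frac = np.mean(actions[:SUBSET_SIZE] == 0)

if predicted_frac > CAPACITY:
    # Predicted over-utilisation (the Tragedy of the Commons case): command a
    # compensating number of agents to take the conservative Action2 instead.
    excess = int(round((predicted_frac - CAPACITY) * NUM_AGENTS))
    overridden = np.flatnonzero(actions == 0)[:excess]
    actions[overridden] = 1
    print(f"Regulator overrode {overridden.size} agents to Action2")
```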
That’s a lot of jargon, half of which I feel I have little clue about… but I’m curious to read any arguments you have on how this would (or would not) work.
> Would current quantum computing techniques, assuming the hardware to run them on is available, be able to derive more quickly/precisely what percentage of those agents at, say, State1 would take Action1, Action2, or Action3?
I think so! But I also think you can do it easily with a bunch of GPUs. Let me explain: the idea is to parallelize the simulation of the agents and then just sample from them. You could do that using “quantum parallelism”, but I suspect it would be simpler to just use GPUs.
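As a sketch of what I mean (not benchmarked, just illustrating the shape of the computation): simulating the agents is embarrassingly parallel, so it reduces to one batched array operation, which is exactly the style of code GPU libraries such as CuPy or JAX accelerate:

```
# Minimal sketch: near-identical policies, one vectorised sampling step via
# the Gumbel-max trick, no Python-level loop over agents. The same batched
# array code maps naturally onto GPU array libraries such as CuPy.
import numpy as np

rng = np.random.default_rng(1)
NUM_AGENTS, NUM_ACTIONS = 500, 3

# Shared base logits at State1 plus small per-agent noise.
logits = np.array([1.0, 0.5, -0.5]) + rng.normal(scale=0.2, size=(NUM_AGENTS, NUM_ACTIONS))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Gumbel-max trick: argmax(log p + Gumbel noise) draws one categorical
# sample per agent in a single batched operation.
gumbel = -np.log(-np.log(rng.uniform(size=probs.shape)))
actions = np.argmax(np.log(probs) + gumbel, axis=1)

# Fraction of agents taking each of Action1..Action3.
print(np.bincount(actions, minlength=NUM_ACTIONS) / NUM_AGENTS)
```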
I believe you might be able to get some (polynomial, probably quadratic) speedup in the precision of the estimate using quantum resources, essentially the advantage amplitude estimation offers over classical Monte Carlo sampling, although I am not sure how useful that is.
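For intuition on the quadratic part: estimating a fraction p to additive error ε classically takes on the order of 1/ε² samples (the standard error is √(p(1−p)/n)), whereas quantum amplitude estimation needs only on the order of 1/ε oracle queries. A quick numerical check of the classical scaling:

```
# Rough illustration (classical side only): the Monte Carlo error of an
# estimated fraction shrinks like 1/sqrt(n) in the number of samples n;
# amplitude estimation would reach the same error with ~sqrt(n) queries.
import numpy as np

rng = np.random.default_rng(2)
p = 0.3   # true fraction of agents taking Action1 (made-up value)

for n in [100, 10_000, 1_000_000]:
    est = rng.binomial(n, p) / n
    print(f"n={n:>9}: estimate={est:.4f}, expected error ~{np.sqrt(p * (1 - p) / n):.4f}")
```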