How dependent is current AGI safety work on deep RL? Recently there has been a lot of emphasis on advances in deep RL (and ML more generally), so it would be interesting to know what the implications would be if it turns out that this particular paradigm cannot lead to AGI.
There is some AGI safety work that specifically targets deep RL, under the assumption that deep RL might scale to AGI. But there is also a lot of other work, both on failure modes and on solutions, that is largely independent of the method used to create the AGI.
I do not have percentages on how the work breaks down, and things are in flux. Many of the new technical alignment startups seem to be working mostly in a deep RL context. But a significant part of the more theoretical work, and even some of the experimental work, involves reasoning about a very broad class of hypothetical future AGI systems, not just those that might be produced by deep RL.