One plausible way to draw a line between RL and core DL is that, post-AlphaGo, a lot of people were very bullish on deep networks plus reinforcement learning specifically. Part of the idea was that supervised learning required inordinately costly human labeling, whereas RL could learn from cheap simulations and even improve itself online in the real world. OpenAI was originally almost 100% RL-focused. That thread of research is far from dead, but it has certainly not panned out the way people hoped at the time (e.g. OpenAI has shifted heavily away from RL).
Meanwhile, non-RL deep learning methods, especially generative models that largely sidestep the labeling issue, have seen spectacular success.