Possibly a newbie question: I noticed I was confused about the paragraph around deep learning vs. reinforcement learning.
“One example of obviously suboptimal resource allocation is that the AI safety community spent a very large fraction of its resources on reinforcement learning until relatively recently. While reinforcement learning might have seemed like the most promising area for progress towards AGI to a few of the initial safety researchers, this strategy meant that not many were working on deep learning.”
I thought that reinforcement learning was a type of deep learning. My own understanding is that deep learning is any form of ML using multilayered neural networks, and that reinforcement learning today uses multilayered neural networks, and thus could be called “deep reinforcement learning”, but is generally just RL for short. If that were true that would mean RL research was also DL research.
Am I misunderstanding some of the terminology?
The terminology around AI (AI, ML, DL, RL) is a bit confused sometimes. You’re correct that deep reinforcement learning does indeed use deep neural nets, so it could be considered a part of deep learning. However, colloquially deep learning is often taken to mean the parts that aren’t RL (so supervised, unsupervised, and self-supervised deep learning). RL is pretty qualitatively different from those in the way it is trained, so it makes sense that there would be a different term, but it can create confusion.
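The "qualitatively different in the way it is trained" point can be made concrete with a toy sketch. This is purely illustrative (the function names and the tiny regression/bandit setup are my own, not from any library): supervised learning fits against explicit human-provided labels, while RL only ever sees a scalar reward for actions it tries.

```python
import random

def train_supervised(data, steps=500, lr=0.01):
    """Supervised: each example is an (x, label) pair; we minimize
    prediction error against the label directly."""
    w = 0.0
    for _ in range(steps):
        x, y = random.choice(data)
        pred = w * x
        # gradient step on squared error (pred - y)^2 with respect to w
        w -= lr * 2 * (pred - y) * x
    return w

def train_rl(env_reward, actions, steps=500, eps=0.1):
    """RL: no labels; the learner tries actions and sees only a scalar
    reward, estimating each action's value from its own experience."""
    values = {a: 0.0 for a in actions}
    counts = {a: 0 for a in actions}
    for _ in range(steps):
        if random.random() < eps:
            a = random.choice(actions)                 # explore
        else:
            a = max(actions, key=lambda a: values[a])  # exploit
        r = env_reward(a)        # scalar feedback, no "correct answer"
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]       # running mean
    return values
```

The structural difference is visible in the signatures: the supervised learner consumes a labeled dataset, while the RL learner consumes an environment it must interact with, which is also why the labeling-cost argument below only applies to the former.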
One plausible place to draw the line between RL and "core" DL: post-AlphaGo, a lot of people were very bullish on specifically deep networks + reinforcement learning. Part of the idea was that supervised learning required inordinately costly human labeling, whereas RL would be able to learn from cheap simulations and even improve itself online in the world. OpenAI was originally almost 100% RL-focused. That thread of research is far from dead, but it has certainly not panned out the way people hoped at the time (e.g. OpenAI has shifted heavily away from RL).
Meanwhile non-RL deep learning methods, especially generative models that kind of sidestep the labeling issue, have seen spectacular success.