We’re tackling the problem “you tried out a long sequence of actions, and only at the end could you tell whether the outcomes were good or not, and now you have to figure out which actions deserve the credit (or the blame)”.
Some approaches to this that don’t involve “long-term credit assignment” as normally understood by RL practitioners:
- Have humans / other AI systems tell you which of the actions were useful. (One specific way this could be achieved is to use humans / AI systems to provide a dense reward, kinda like in summarizing books from human feedback; a minimal sketch of this appears after the list.)
- Supervise the AI system’s reasoning process rather than the outcomes it gets (e.g. like chain-of-thought prompting, but with explicit supervision on each reasoning step; see the second sketch below).
- Just don’t even bother: do regular old self-supervised learning on a hard task. To get good performance, maybe the model has to develop “general intelligence” (i.e. something akin to the algorithms humans use for long-term planning; after all, our long-term planning doesn’t work via trial and error). The third sketch below shows what this looks like, for contrast.
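To make the first approach concrete, here is a minimal sketch in PyTorch. Everything in it is illustrative: `StepRewardModel` is a hypothetical stand-in for a model trained on human / AI judgments of individual steps, and the names and shapes are my assumptions, not anyone’s actual implementation. The point is just that once every step carries its own reward, you no longer have to propagate a single end-of-episode signal back through the whole trajectory.

```python
import torch
import torch.nn as nn

class StepRewardModel(nn.Module):
    """Hypothetical reward model scoring how useful each step was,
    trained separately on human / AI feedback (training not shown)."""

    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # One scalar "usefulness" score per timestep.
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def dense_returns(reward_model: StepRewardModel,
                  obs_seq: torch.Tensor,   # [T, obs_dim]
                  act_seq: torch.Tensor,   # [T, act_dim]
                  gamma: float = 0.99) -> torch.Tensor:
    """Replace the single terminal reward with per-step rewards from
    the model, then compute discounted returns for each timestep."""
    with torch.no_grad():
        rewards = reward_model(obs_seq, act_seq)  # [T]
    returns = torch.zeros_like(rewards)
    running = torch.tensor(0.0)
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```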
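The second approach, supervising the reasoning process itself, might look like the following sketch: an ordinary cross-entropy loss, but computed only over human- (or AI-) written reasoning steps rather than over final answers, so the gradient lands directly on each step of the reasoning and there is no long-range credit assignment to do. I’m assuming a Hugging Face-style causal LM whose forward pass returns `.logits`; the function and variable names are mine, not any particular codebase’s.

```python
import torch
import torch.nn.functional as F

def process_supervision_loss(model, prompt_ids, gold_step_ids):
    """Cross-entropy on the labeled reasoning steps, not the outcome.

    prompt_ids:    [B, T_prompt] token ids for the task prompt.
    gold_step_ids: [B, T_steps] token ids of the gold chain of thought.
    Assumes `model(ids).logits` returns [B, T, vocab] (HF-style).
    """
    ids = torch.cat([prompt_ids, gold_step_ids], dim=-1)
    logits = model(ids).logits
    start = prompt_ids.shape[-1]
    # Next-token prediction restricted to the reasoning-step tokens:
    # logits at position t predict the token at position t + 1.
    step_logits = logits[:, start - 1 : -1, :]
    step_labels = ids[:, start:]
    return F.cross_entropy(
        step_logits.reshape(-1, step_logits.size(-1)),
        step_labels.reshape(-1),
    )
```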
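And for contrast, the third option is just the ordinary self-supervised objective, with no rewards and no credit assignment at all; the hope is that long-term planning ability falls out of predicting hard data. Same HF-style assumption as above:

```python
import torch.nn.functional as F

def self_supervised_step(model, optimizer, token_ids):
    """Plain next-token prediction on (hard) data: maximize likelihood
    and hope the capabilities you want emerge. token_ids: [B, T]."""
    logits = model(token_ids).logits  # assumed [B, T, vocab], HF-style
    loss = F.cross_entropy(
        logits[:, :-1, :].reshape(-1, logits.size(-1)),
        token_ids[:, 1:].reshape(-1),
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```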
I think it’s also plausible that (depending on your definitions) long-term reasoning isn’t needed for powerful AI.