Some basic knowledge of (relatively) old-school probabilistic graphical models, along with a basic understanding of variational inference. Not that graphical models are used directly in any SOTA models anymore, but the mathematical formalism is still very useful.
For example, understanding how inference on a graphical model works motivates the control-as-inference perspective on reinforcement learning. This is useful for understanding things like decision transformers, or this post on how to interpret RLHF on language models.
It would also be essential background to understand the causal incentives research agenda.
So the same tools come up in two very different places, which I think makes a case for their usefulness.
This material is math-heavy in places, and some of the concepts are pretty dense, but the prerequisites are modest. You need basic probability (how expected values and log likelihoods work, comfort moving between E and ∫ notation), basic calculus (e.g. "set the derivative to 0 to maximize"), and facility with algebraically manipulating sums and products.
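As a taste of the kind of manipulation involved, here is the standard evidence lower bound (ELBO) derivation at the heart of variational inference. It uses exactly the tools listed above: rewriting an integral as an expectation, working with log likelihoods, and one application of Jensen's inequality (using the concavity of log):

```latex
\log p(x) = \log \int p(x, z)\, dz
          = \log \mathbb{E}_{q(z)}\!\left[\frac{p(x, z)}{q(z)}\right]
          \ge \mathbb{E}_{q(z)}\!\left[\log p(x, z) - \log q(z)\right]
          =: \mathrm{ELBO}(q)
```

The gap between the two sides is the KL divergence between q(z) and the true posterior p(z|x), so maximizing the ELBO over a family of distributions q pushes q toward the posterior; this is the basic move behind variational inference.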