jh comments on We can do better than argmax

jh 20 Oct 2022 0:44 UTC
Yes, and is there a proof of this that someone has put together? Or at least a more formal justification?
- anonymous6 20 Oct 2022 13:02 UTC
  Here’s one set of lecture notes (I don’t claim they’re necessarily the best; they’re just the first I found): https://lucatrevisan.github.io/40391/lecture12.pdf
  Useful keywords for finding other sources are “multiplicative weight updates”, “follow the leader”, and “follow the regularized leader”.
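To make the argmax point concrete, here is a minimal sketch in the experts setting, assuming losses in [0, 1] and full feedback each round. It compares "follow the leader" (always play the expert with the lowest past loss, i.e. argmax on history) with multiplicative weights in its Hedge form, which plays a probability mix instead of a single choice. The function names, the step size eta = sqrt(8 ln n / T), and the adversarial loss sequence are my own illustrative choices, not taken from the linked notes.

```python
import math

def follow_the_leader(losses):
    """Play the expert with the lowest cumulative loss so far (argmax on
    history); ties go to the lowest index. Returns the total loss incurred."""
    n = len(losses[0])
    cumulative = [0.0] * n
    total = 0.0
    for round_losses in losses:
        choice = min(range(n), key=lambda i: cumulative[i])
        total += round_losses[choice]
        for i in range(n):
            cumulative[i] += round_losses[i]
    return total

def hedge(losses, eta):
    """Multiplicative weights (Hedge): play the normalized weights as a mixed
    strategy, then shrink each expert's weight by exp(-eta * loss).
    Returns the total expected loss."""
    n = len(losses[0])
    weights = [1.0] * n
    total = 0.0
    for round_losses in losses:
        z = sum(weights)
        probs = [w / z for w in weights]
        total += sum(p * l for p, l in zip(probs, round_losses))
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, round_losses)]
    return total

def best_expert_loss(losses):
    """Total loss of the single best expert in hindsight."""
    return min(sum(column) for column in zip(*losses))

# Illustrative adversarial sequence with two experts: after a half-loss
# opening round the experts take turns losing, and FTL always switches
# to the expert that is about to lose.
T = 1001
losses = [(0.5, 0.0)] + [(0.0, 1.0) if t % 2 == 0 else (1.0, 0.0) for t in range(T - 1)]

eta = math.sqrt(8 * math.log(2) / T)
ftl_total = follow_the_leader(losses)
hedge_total = hedge(losses, eta)
best = best_expert_loss(losses)
print(f"best expert: {best}, FTL: {ftl_total}, Hedge: {hedge_total:.1f}")
```

On this sequence FTL pays 1000.5 against the best expert's 500.0, so its regret grows linearly in T. Hedge's regret is guaranteed to stay below sqrt(T ln n / 2) for losses in [0, 1] with this eta, because randomizing keeps it from being exploited by the alternation.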
  Note that this applies to what’s sometimes called the “experts” setting, where you get full feedback, including on the counterfactual actions you didn’t take. The same approach works, with slight modifications, in the “bandit” setting, where you only see the result of the action you actually took.
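For the bandit case, one standard form of the "slight modification" is Exp3: only the pulled arm's loss is observed, so it is divided by the probability of pulling that arm to get an unbiased estimate for every arm (zero for the arms not pulled), and the same multiplicative update is then applied. The sketch below is the loss-estimate variant without an explicit exploration term. The Bernoulli arms, their means, the seed, and the tuning eta = sqrt(2 ln k / (T k)) are assumptions for illustration only.

```python
import math
import random

def exp3(arm_means, horizon, seed=0):
    """Exp3 sketch with Bernoulli losses (hypothetical arms).

    Each round, sample an arm from the exponential-weights distribution,
    observe only that arm's loss, and add loss / prob to that arm's
    estimated cumulative loss (an importance-weighted, unbiased estimate).
    Returns (total loss incurred, final arm probabilities).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    eta = math.sqrt(2 * math.log(k) / (horizon * k))  # assumed tuning

    est_cum_loss = [0.0] * k

    def probabilities():
        # Shift by the minimum so the best arm has weight exp(0) = 1,
        # which avoids all weights underflowing to zero.
        m = min(est_cum_loss)
        weights = [math.exp(-eta * (L - m)) for L in est_cum_loss]
        z = sum(weights)
        return [w / z for w in weights]

    total_loss = 0.0
    for _ in range(horizon):
        probs = probabilities()
        arm = rng.choices(range(k), weights=probs)[0]
        # Bandit feedback: we only learn the loss of the arm we pulled.
        loss = 1.0 if rng.random() < arm_means[arm] else 0.0
        total_loss += loss
        est_cum_loss[arm] += loss / probs[arm]
    return total_loss, probabilities()

T = 5000
total, final_probs = exp3([0.7, 0.5, 0.2], T, seed=1)
print(f"Exp3 total loss: {total:.0f} (best arm's expected loss: {0.2 * T:.0f})")
print("final probabilities:", [round(p, 3) for p in final_probs])
```

The importance weighting is what lets the full-feedback analysis carry over. The price is higher variance in the loss estimates, which is why bandit regret bounds scale with the number of arms k rather than only with ln k.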