FWIW, I didn't get the impression there's a very principled justification for softmax in this post, if that's what you intended by "highly principled". That it might work better than naive argmax in practice on some counts isn't really enough, and there wasn't really much comparison to enlightened argmax, which is optimal in theory.
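To pin down the two rules being contrasted, here is a minimal sketch. It assumes a bandit-style setting where each action has a current value estimate. "Naive argmax" always picks the highest estimate. Softmax (Boltzmann) selection samples each action with probability proportional to exp(value / temperature). The function names and the `temperature` parameter are illustrative choices, not taken from the post, and nothing here implements "enlightened argmax".

```python
import math
import random


def softmax_probs(values, temperature=1.0):
    """Softmax (Boltzmann) distribution over actions given estimated values."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    m = max(values)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp((v - m) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def naive_argmax(values):
    """Greedy choice: always the action with the highest current estimate."""
    return max(range(len(values)), key=lambda i: values[i])


def softmax_choice(values, temperature=1.0, rng=random):
    """Sample one action index from the softmax distribution."""
    probs = softmax_probs(values, temperature)
    return rng.choices(range(len(values)), weights=probs, k=1)[0]


estimates = [0.50, 0.45, 0.10]  # hypothetical value estimates
print(naive_argmax(estimates))
print(softmax_probs(estimates, temperature=0.1))
print(softmax_choice(estimates, temperature=0.1))
```

Lowering `temperature` pushes softmax toward naive argmax, and raising it pushes it toward uniform random choice. How to set that knob is exactly what a principled justification would have to answer.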
I'd probably require being provably (approximately) optimal for a principled justification. From a quick look at the Wikipedia pages on bandits and the Gittins index, bandits are a general class of problems, and the Gittins index is just the value of the aggregate reward. I guess you could say "maximize the Gittins index" (use the Gittins index policy), but that's, imo, just a formal characterization of what enlightened argmax should be under certain problem assumptions, and it doesn't provide much useful guidance on its own. What procedure should we follow to maximize the Gittins index? Is it just "calculate really hard"?
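To give a sense of what "calculate really hard" looks like in the simplest case, here is a minimal sketch under several assumptions. Each arm is an independent Bernoulli arm with a Beta(a, b) posterior, and rewards are discounted by `gamma`. The index is approximated by the calibration (retirement-reward) method: binary-search the constant per-step reward at which playing the arm, with the option to retire later, is exactly as good as retiring now. The dynamic program is truncated after `horizon` further pulls. The Bernoulli/Beta model, the truncation, and the function name are all illustrative assumptions. The index policy then simply plays whichever arm currently has the highest index.

```python
from functools import lru_cache


def gittins_index_bernoulli(a, b, gamma=0.9, horizon=100, tol=1e-6):
    """Approximate Gittins index of a Bernoulli arm with a Beta(a, b) posterior."""

    def continue_minus_retire(lam):
        retire = lam / (1.0 - gamma)  # discounted value of retiring on reward lam forever

        @lru_cache(maxsize=None)
        def value(s, f):
            # Optimal value after s extra successes and f extra failures,
            # when retiring on lam per step is available at any time.
            aa, bb = a + s, b + f
            p = aa / (aa + bb)  # posterior mean success probability
            if s + f >= horizon:
                # Truncation: treat the posterior mean as frozen from here on.
                return max(retire, p / (1.0 - gamma))
            cont = p + gamma * (p * value(s + 1, f) + (1.0 - p) * value(s, f + 1))
            return max(retire, cont)

        p0 = a / (a + b)
        cont0 = p0 + gamma * (p0 * value(1, 0) + (1.0 - p0) * value(0, 1))
        return cont0 - retire

    lo, hi = 0.0, 1.0  # Bernoulli rewards lie in [0, 1], so the index does too
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if continue_minus_retire(mid) > 0:
            lo = mid  # playing still beats retiring, so the index is above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(gittins_index_bernoulli(1, 1, gamma=0.9))
```

The index comes out above the posterior mean, which is the exploration bonus. The whole computation is one small DP per arm, and that decomposition is what relies on the arms being independent.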
Also, according to the Wikipedia page, the Gittins index policy is optimal if the projects are independent, but not necessarily if they aren't, and the problem is NP-hard in general if they can be dependent.
Not in this post, we just link to this one. By "principled" I just mean "not arbitrary, has a nice short derivation starting with something fundamental (like the entropy)".
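For concreteness, here is one standard short derivation of this kind. It is an illustration and not necessarily the one the linked post gives. Choose a distribution $\pi$ over actions with estimated values $v_i$ to maximize expected value plus a temperature $\tau$ times the entropy. Setting the derivative of the Lagrangian to zero gives softmax, and letting $\tau \to 0$ recovers argmax.

```latex
\pi^\ast = \arg\max_{\pi \in \Delta} \; \sum_i \pi_i v_i + \tau H(\pi),
\qquad H(\pi) = -\sum_i \pi_i \log \pi_i

\mathcal{L}(\pi, \mu) = \sum_i \pi_i v_i - \tau \sum_i \pi_i \log \pi_i
  - \mu \Big( \sum_i \pi_i - 1 \Big)

\frac{\partial \mathcal{L}}{\partial \pi_i} = v_i - \tau (\log \pi_i + 1) - \mu = 0
\;\Longrightarrow\; \pi_i \propto e^{v_i / \tau}

\pi^\ast_i = \frac{e^{v_i/\tau}}{\sum_j e^{v_j/\tau}},
\qquad \tau \to 0 \;\Rightarrow\; \pi^\ast \to \operatorname{argmax}_i v_i
```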
Yeah, the Gittins stuff would be pitched at a similar level of handwaving.