Do we have reason to believe softmax is a better approximation to “Enlightened argmax” than just directly trying to approximate Enlightened argmax or its outputs?