Amanda is talking about the philosophical principle, whereas I’m talking about the algorithm that roughly satisfies it. The principle is that a non-myopic Bayesian will take into account not just the immediate payoff of an action, but also its information value. The algorithm, upper confidence bound (UCB), efficiently approximates this behaviour. The fact that UCB is optimistic (about its impact) suggests that we might want to behave similarly, in order to capture the information value. (“Information value of an action” and “exploration value” are synonymous here.)
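To make the "optimism" concrete, here is a minimal sketch of a UCB1-style rule in Python. The names `ucb1` and `pull`, the exploration constant `c`, and the Bernoulli arm probabilities are illustrative assumptions of mine, not anything specified in the discussion. Each arm is scored by its estimated payoff plus a bonus that shrinks as the arm is tried more often, so the bonus plays the role of a rough stand-in for exploration value.

```python
import math
import random


def ucb1(pull, n_arms, horizon, c=2.0):
    """UCB1-style bandit loop (a sketch, not a tuned implementation).

    pull(arm) is a hypothetical callback returning a reward in [0, 1].
    Returns (counts, means): pulls per arm and the running mean reward per arm.
    """
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Try every arm once so each confidence bonus is well defined.
            arm = t - 1
        else:
            # Optimism: estimated payoff plus a bonus that is large for
            # rarely tried arms and shrinks as counts[a] grows.
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(c * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        # Incremental update of the running mean for the chosen arm.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


# Usage: three Bernoulli arms with made-up success probabilities.
rng = random.Random(0)
probs = [0.2, 0.5, 0.8]  # assumed values, for illustration only
counts, means = ucb1(lambda a: 1.0 if rng.random() < probs[a] else 0.0, 3, 5000)
print(counts, [round(m, 3) for m in means])
```

The bonus term is what makes the rule optimistic: an arm that looks mediocre but has barely been tried still gets chosen occasionally, which is how the algorithm pays for information without computing the full Bayesian value of information.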
Thanks!