RyanCarey comments on We should expect to worry more about speculative risks

RyanCarey 30 May 2022 12:44 UTC
4 points
0 ∶ 0
Amanda is talking about the philosophical principle, whereas I’m talking about the algorithm that roughly satisfies it. The principle is that a non-myopic Bayesian will take into account not just the immediate payoff of an action, but also its information value. The algorithm, upper confidence bound (UCB), efficiently approximates this behaviour. The fact that UCB is optimistic (about its impact) suggests that we might want to behave similarly, in order to capture the information value. (“Information value of an action” and “exploration value” are synonymous here.)
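To make the "optimism" concrete, here is a minimal sketch of the standard UCB1 rule on a toy bandit. It is only an illustration, not anything specific to the post: the names `ucb1`, `pull` and `demo`, the constant `c`, and the three payoff probabilities are all assumptions picked for the example. Each action is scored by its observed mean plus a bonus that shrinks the more often it has been tried, so rarely tried actions get the benefit of the doubt.

```python
import math
import random


def ucb1(pull, n_arms, rounds, c=2.0):
    """Run UCB1 for `rounds` steps; `pull(arm)` returns a reward in [0, 1].

    Returns how many times each arm was chosen.
    """
    counts = [0] * n_arms    # times each arm has been tried
    totals = [0.0] * n_arms  # summed reward per arm
    for t in range(1, rounds + 1):
        if t <= n_arms:
            arm = t - 1      # try every arm once before scoring
        else:
            # Optimistic score: observed mean plus an uncertainty bonus
            # that shrinks as the arm is sampled more often.
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(c * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        totals[arm] += reward
    return counts


def demo(seed=0, rounds=2000):
    """Toy run with hypothetical Bernoulli payoff probabilities."""
    rng = random.Random(seed)
    true_means = [0.2, 0.5, 0.8]  # assumed values, for illustration only
    return ucb1(
        lambda a: 1.0 if rng.random() < true_means[a] else 0.0,
        len(true_means),
        rounds,
    )


if __name__ == "__main__":
    print(demo())
```

The bonus term plays the role of the information value: an action with few observations can be chosen even when its observed mean is lower, because trying it is how you find out whether it is actually good.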
- Stefan_Schubert 30 May 2022 15:21 UTC
  4 points
  0 ∶ 0
  Thanks!