Steven Byrnes comments on A mesa-optimization perspective on AI valence and moral patienthood

Steven Byrnes 14 Sep 2021 0:07 UTC
7 points
AlphaGo has a human-created optimizer, namely MCTS. Normally people don’t use the term “mesa-optimizer” for human-created optimizers.

Then maybe you’ll say “OK, there’s a human-created search-based consequentialist planner, but the inner loop of that planner is a trained ResNet, and how do you know that there isn’t also a search-based consequentialist planner inside each single forward pass through the ResNet?”

Admittedly, I can’t prove that there isn’t. I suspect that there isn’t, because there seems to be no incentive for that (there’s already a search-based consequentialist planner!), and also because I don’t think ResNets are up to such a complicated task.
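
To make the structure under discussion concrete, here is a minimal sketch of a hand-written tree search whose leaf evaluator is a single call to a value function. Everything in it is an illustrative assumption rather than AlphaGo's actual design: the game is a toy version of Nim (take 1 to 3 stones, whoever takes the last stone wins), `toy_value_net` is a hand-written stand-in for the trained ResNet, and the search is a stripped-down UCB-based MCTS with no policy network or rollouts.

```python
import math


def legal_moves(stones):
    """Toy Nim: a player may take 1, 2, or 3 stones."""
    return [k for k in (1, 2, 3) if k <= stones]


def toy_value_net(stones):
    """Hypothetical stand-in for the trained ResNet.

    Returns a value in [-1, 1] for the player to move. It knows only the
    terminal case and is uninformative everywhere else.
    """
    if stones == 0:
        return -1.0  # the opponent took the last stone, so the mover has lost
    return 0.0


class Node:
    def __init__(self, stones):
        self.stones = stones
        self.children = {}    # move -> Node
        self.visits = 0
        self.value_sum = 0.0  # summed values for the player to move here


def simulate(node, value_fn, c=1.4):
    """Run one simulation; return the value for the player to move at node."""
    if node.stones == 0 or node.visits == 0:
        # Leaf: one call to the evaluator. This is the "inner loop".
        value = value_fn(node.stones)
    else:
        for move in legal_moves(node.stones):
            node.children.setdefault(move, Node(node.stones - move))

        def ucb(child):
            if child.visits == 0:
                return math.inf
            # A child's value is from the opponent's side, so negate it.
            mean = -child.value_sum / child.visits
            return mean + c * math.sqrt(math.log(node.visits) / child.visits)

        move = max(node.children, key=lambda m: ucb(node.children[m]))
        value = -simulate(node.children[move], value_fn, c)
    node.visits += 1
    node.value_sum += value
    return value


def plan(stones, value_fn, n_sim=2000):
    """The human-written outer planner: pick the most-visited root move."""
    root = Node(stones)
    for _ in range(n_sim):
        simulate(root, value_fn)
    return max(root.children, key=lambda m: root.children[m].visits)
```

In this sketch all of the explicit search lives in `simulate` and `plan`, which a human wrote. The question raised above is whether something search-like could also be happening inside the single `value_fn(...)` call once that function is a large trained network instead of a two-line stub.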
- Ofer 16 Sep 2021 9:27 UTC
  3 points
  (I don’t know/remember the details of AlphaGo, but if the setup involves a value network that is trained to predict the outcome of MCTS-guided gameplay, that seems to make it more likely that the value network is doing some sort of search during inference.)
  - Steven Byrnes 16 Sep 2021 18:08 UTC
    2 points
    Hmm, yeah, I guess you’re right about that.