However, imposing a bounded utility function on any decision involving lives saved or happy lives instantiated seems unpalatable, as it implies that additional lives diminish in value. Thus, in decisions involving human lives and other unbounded sources of value, it seems that an instrumentally rational agent will maximize expected utility and reach a fanatical verdict. Therefore, if an agent is instrumentally rational, she will reach fanatical verdicts by maximizing expected utility.
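The mechanism here can be sketched with a toy calculation of my own (the numbers are illustrative, not from the paper): with a utility function linear in lives saved, a one-in-a-million gamble on an astronomical payoff dominates a certain modest payoff, while a bounded utility function reverses the verdict.

```python
# Toy numbers of my own (not from the paper), sketching why an unbounded
# expected-utility maximizer reaches the fanatical verdict and a bounded one does not.

def expected_utility(lottery, utility):
    """lottery: list of (probability, lives_saved) pairs."""
    return sum(p * utility(n) for p, n in lottery)

sure_thing = [(1.0, 10)]                    # save 10 lives for certain
long_shot = [(1e-6, 10**9), (1 - 1e-6, 0)]  # one-in-a-million chance of saving a billion

def unbounded(n):
    return n              # value linear in lives saved

def bounded(n):
    return n / (n + 1)    # asymptotes to 1: extra lives add ever less value

# Unbounded utility: the long shot dominates (~1000 vs 10), the fanatical verdict.
assert expected_utility(long_shot, unbounded) > expected_utility(sure_thing, unbounded)
# Bounded utility: the sure thing wins (~0.91 vs ~1e-6), but only because
# the bound makes the billionth life worth almost nothing at the margin.
assert expected_utility(long_shot, bounded) < expected_utility(sure_thing, bounded)
```

This is exactly the trade-off the passage describes: the bounded function avoids fanaticism only by devaluing additional lives.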
I’ve only skimmed it so maybe this is answered in the paper somewhere, but: I think this is the part I’d disagree with. I don’t think bounded utility functions are that bad, compared to the alternatives (such as fanaticism! And worse, paralysis! See my sequence.)
More importantly though, if we are trying to predict how superintelligent AIs will behave, we can’t assume that they’ll share our intuitions about the unpalatability of unbounded utility functions! I feel like the conclusion should be: Probably superintelligent AIs will either have bounded utility functions or be fanatical.
Thanks for the comment!

I do briefly discuss bounded utility functions as an objection to the argument for fanatical Superintelligences. I generally take the view that imposing bounded utility functions is difficult to do in a way that doesn't seem arbitrary. In practice this might be less of an issue, since one might be able to observe the agent and impose bounded functions when necessary (I think this may raise other questions, but it does seem very possible in practice).
I don’t think bounded utility functions are bad intrinsically, but I do think the problems created by denying fanaticism (a denial which can result from imposing overly restrictive bounded utility functions) are potentially worse than fanaticism. By these problems I’m referring back to those raised in Wilkinson’s paper.
More importantly though, if we are trying to predict how superintelligent AIs will behave, we can’t assume that they’ll share our intuitions about the unpalatability of unbounded utility functions!
I think this is a very good point, and I agree that we could end up with Superintelligences either operating under a bounded utility function or being fanatical. I’m somewhat intuitively inclined to think they would be fanatical more often than not in this case, but that isn’t really a substantiated view on my part. Either way, we still end up with fanatical verdicts being reached, and the concerns that entails.
Nice work!