Greg_Colbourn comments on Discussion with Eliezer Yudkowsky on AGI interventions

Greg_Colbourn 11 Nov 2021 18:43 UTC
3 points
Yes, the concern is optimisation during training. My intuition is along the lines of “sufficiently large pile of linear algebra with reward function -> basic AI drives to maximise reward -> reverse engineers [human behaviour / protein folding / etc] and manipulates the world so as to maximise its reward -> [foom / doom]”.

I wouldn’t say “personality” comes into it. In the above scenario the giant pile of linear algebra is completely unconscious and lacks self-awareness; it’s more akin to a force of nature, a blind optimisation process.
- Brian_Tomasik 12 Nov 2021 0:48 UTC
  3 points
  Thanks. :) Regarding the AGI’s “personality”, what I meant was what the AGI itself wants to do, if we imagine it to be like a person, rather than what the training that produced it was optimizing for. Suppose we think of the gradient descent that trains the AGI as being like evolution, and of the AGI at some step of training as being like a particular human in humanity’s evolution. Then, while evolution itself is optimizing something, the individual human is just an adaptation executor and doesn’t directly care about his inclusive fitness. He just responds to his environment as he was programmed to do. Likewise, the GPT-X agent may not really care about trying to reduce training errors by modifying its network weights; it just responds to its inputs in human-ish ways.