It seems that the likely contenders for AGI (the AI labs, if not their models) are already 'out of the box', given all the API activity going on now.
Regarding the adversarial(?) training you were describing: interesting idea, though I have no clue what to think :) but thanks for sharing!
Thanks, that link definitely touches on many of the same points!
Where my proposal is more concrete is that models should learn morals via evolutionary pressures / RL rewards designed, using game theory, to push towards cooperation and tit-for-tat.
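To make the game-theory angle concrete, here's a minimal sketch (my own illustration, not part of the original proposal): a tiny iterated prisoner's dilemma round-robin in which tit-for-tat outscores pure defection once enough "nice" strategies are in the population. The payoff matrix and strategy names are the standard textbook ones, and the idea would be that reward signals shaped this way exert the kind of evolutionary pressure toward cooperation the proposal describes.

```python
# Illustrative sketch: iterated prisoner's dilemma round-robin.
# Standard payoffs: (my_points, their_points); C = cooperate, D = defect.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def always_defect(my_hist, their_hist):
    return "D"

def always_cooperate(my_hist, their_hist):
    return "C"

def tit_for_tat(my_hist, their_hist):
    # Cooperate first, then mirror the opponent's last move.
    return their_hist[-1] if their_hist else "C"

def grudger(my_hist, their_hist):
    # Cooperate until the opponent defects once, then defect forever.
    return "D" if "D" in their_hist else "C"

def play(a, b, rounds=50):
    """Play `rounds` iterations; return (score_a, score_b)."""
    ha, hb, sa, sb = [], [], 0, 0
    for _ in range(rounds):
        ma, mb = a(ha, hb), b(hb, ha)
        pa, pb = PAYOFF[(ma, mb)]
        ha.append(ma); hb.append(mb)
        sa += pa; sb += pb
    return sa, sb

strategies = {"always_defect": always_defect,
              "always_cooperate": always_cooperate,
              "tit_for_tat": tit_for_tat,
              "grudger": grudger}

# Round-robin: every strategy plays every strategy (including itself).
totals = {name: 0 for name in strategies}
for na, fa in strategies.items():
    for nb, fb in strategies.items():
        sa, _ = play(fa, fb)
        totals[na] += sa

for name, score in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(name, score)
```

In this small population, tit-for-tat ends up on top because it sustains mutual cooperation with the nice strategies while only losing a single round to a defector; pure defection exploits the unconditional cooperator but then gets locked into low-payoff mutual defection with everyone else.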
Half-baked thoughts in response, since I don't think I really know what I'm talking about (yet), but here goes.
I imagine this is a little similar in execution to a simbox…
(https://www.lesswrong.com/posts/WKGZBCYAbZ6WGsKHc/love-in-a-simbox-is-all-you-need)