There was an example where some group accidentally did a large training run in which they trained the AI to be maximally offensive rather than minimally offensive.
Actually, rereading this, I don’t really know where I was going with the color example. I think I probably messed up, as you said.
You could also imagine a situation where a property is defined by a PCA component and hence isn’t robust to inversion, because unit-norm PCA components are only unique up to sign: multiplying a component by −1 gives an equally valid component.
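A minimal numpy sketch of that sign ambiguity (the toy data and variable names are my own, just for illustration): both v and −v satisfy the eigenvalue equation for the covariance matrix, so a “property score” defined as the projection onto the component flips sign depending on which one you happen to get.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data with one dominant direction of variance.
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])
X -= X.mean(axis=0)

# Top principal component via the covariance eigendecomposition.
cov = X.T @ X / (len(X) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
v = eigvecs[:, -1]                      # unit-norm top component

# -v is an equally valid top component: the same eigenvalue equation holds.
assert np.allclose(cov @ v, eigvals[-1] * v)
assert np.allclose(cov @ (-v), eigvals[-1] * (-v))

# A "property" defined as the projection onto the component flips sign,
# so anything optimizing that score would push in the opposite direction.
scores = X @ v
print(np.allclose(X @ (-v), -scores))  # True
```

So if the property were pinned down only by “the top PCA component,” nothing in that definition distinguishes the property from its negation.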