There was an example where some group accidentally did a large training run in which they trained the AI to be maximally offensive rather than minimally offensive.
Actually, rereading this, I don’t really know where I was going with the color example. I think I probably messed up, as you said.
You could also imagine a situation where a property is defined by a PCA component and hence isn’t robust to inversion, because unit-norm PCA components are only unique up to sign: multiplying a component by −1 gives an equally valid component.
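A minimal numpy sketch of that sign ambiguity (the toy data and variable names are my own, just for illustration): both v and −v satisfy the eigenvalue equation for the covariance matrix, so a “property score” defined as the projection onto the component flips sign depending on which one you happen to get.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data with one dominant direction of variance.
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])
X -= X.mean(axis=0)

# Top principal component via the covariance eigendecomposition.
cov = X.T @ X / (len(X) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
v = eigvecs[:, -1]                      # unit-norm top component

# -v is an equally valid top component: the same eigenvalue equation holds.
assert np.allclose(cov @ v, eigvals[-1] * v)
assert np.allclose(cov @ (-v), eigvals[-1] * (-v))

# A "property" defined as the projection onto the component flips sign,
# so anything optimizing that score would push in the opposite direction.
scores = X @ v
print(np.allclose(X @ (-v), -scores))  # True
```

So if the property were pinned down only by “the top PCA component,” nothing in that definition distinguishes the property from its negation.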