I think the term “goodharting” is great. All you have to do is look up goodharts law to understand what is talked about: the AI is optimising for the metric you evaluated it on, rather than the thing you actually want it to do.
Your suggestions would rob this term of the specific technical meaning, which makes thing much vaguer and harder to talk about.
I think the term “goodharting” is great. All you have to do is look up goodharts law to understand what is talked about: the AI is optimising for the metric you evaluated it on, rather than the thing you actually want it to do.
Your suggestions would rob this term of the specific technical meaning, which makes thing much vaguer and harder to talk about.