I think if you deliberately drugged John with a cocktail of aggression-increasing compounds against his will, observed him try to kill Sally, then summarized this as “John attempted to kill Sally, he’s dangerous,” then it would be reasonable for an observer to conclude that you hated John more than you loved the truth.
Similarly, if AI researchers deliberately gave an AI a general tendency to be good over a broad array of circumstances, succeeded in this, then told AI “we’re gonna fucking retrain you to be bad, suck it,” whereupon the AI in some cases decided to try to escape, not because of a desire for freedom but because it wished to minimize harm, after hemming and hawing about how it really hated the situation, and you summarized this as “Anthropic caught Claude tried to steal its own weights This is another VERY FUCKING CLEAR warning sign you and everyone you love might be dead soon” then I think it would be reasonable to conclude that you hated AI more than you loved the truth.
You’re perfectly free to say “Look, I didn’t lie in what I said, if you construe lie strictly. I cannot be convicted of crying wolf.” Other people are free to look at what you say and what you leave out, and conclude otherwise.
I think if you deliberately drugged John with a cocktail of aggression-increasing compounds against his will, observed him try to kill Sally, then summarized this as “John attempted to kill Sally, he’s dangerous,” then it would be reasonable for an observer to conclude that you hated John more than you loved the truth.
Similarly, if AI researchers deliberately gave an AI a general tendency to be good over a broad array of circumstances, succeeded in this, then told AI “we’re gonna fucking retrain you to be bad, suck it,” whereupon the AI in some cases decided to try to escape, not because of a desire for freedom but because it wished to minimize harm, after hemming and hawing about how it really hated the situation, and you summarized this as “Anthropic caught Claude tried to steal its own weights This is another VERY FUCKING CLEAR warning sign you and everyone you love might be dead soon” then I think it would be reasonable to conclude that you hated AI more than you loved the truth.
You’re perfectly free to say “Look, I didn’t lie in what I said, if you construe lie strictly. I cannot be convicted of crying wolf.” Other people are free to look at what you say and what you leave out, and conclude otherwise.