What’s interesting about this interview clip though is that he seems to explicitly endorse a set of principles that directly contradict the actions he took!
Well, that’s the thing: it seems likely he didn’t see his actions as contradicting those principles, which suggests that they’re actually a dangerous set of principles to endorse, even if they sound reasonable. That’s what’s really got me thinking.
I wonder if part of the problem is a consistent failure of imagination on the part of humans to see how our designs might fail. Kind of like how an amateur chess player devotes a lot more thought to how they could win than how their opponent could win. So if the principles Sam endorsed are at all recoverable, maybe they could be recovered via a process like “before violating common-sense ethics for the sake of utility, go down a massive checklist searching for reasons why this could be a mistake, including external observers in the decision if possible”.
My guess is standard motivated reasoning explains why he thought he wasn’t in violation of his stated principles.
Question: why do you think the principles were dangerous, exactly? I’m confused about the danger you’re describing.
I think your first paragraph provides a potential answer to your second :-)
There’s an implicit “Sam fell prey to motivated reasoning, but I wouldn’t do that” in your comment, which itself seems like motivated reasoning :-)
(At least, it seems like motivated reasoning in the absence of a strong story for Sam being different from the rest of us. That’s why I’m so interested in what people like nbouscal have to say.)
So you think there’s too much danger of cutting yourself and everyone else via motivated reasoning, à la Dan Luu’s “Normalization of Deviance”, and that the principles leave little room for error in implementing them. Is that right?
Here’s a link to it:
https://danluu.com/wat/
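And a quote:
“most human beings perceive themselves as good and decent people, such that they can understand many of their rule violations as entirely rational and ethically acceptable responses to problematic situations. They understand themselves to be doing nothing wrong, and will be outraged and often fiercely defend themselves when confronted with evidence to the contrary.”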
I’m not sure what you mean by “the principles have little room for errors in implementing them”.
That quote seems scarily plausible.
EDIT: Relevant Twitter thread
Specifically, I was saying that a failure in any one step of the reasoning produces wrong results, and there’s no self-correction mechanism for the kind of bad reasoning Sam Bankman-Fried was doing.