The problem with strawmanning and steelmanning isn’t a matter of degree, and I don’t think a goldilocks zone can be found along that dimension at all. If you find yourself asking “how charitable should I be in my interpretation?”, I think you’ve already made a mistake.
Instead, I’d like to propose a fourth category. Let’s call it... uhh... the “blindman”! ^^
The blindman interpretation is to forget you’re talking to a person, stop caring about whether they’re correct, and just try your best to extract anything useful from what they’re saying.[1] If your inner monologue goes “I agree/disagree with that for reasons XYZ,” that mindset is great for debating or teaching, but it’s a distraction if you’re purely aiming to learn. If I say “1+1=3” right now, it has no effect on what you learn from the rest of this comment, so do your best to forget I said it.
For example, when I skimmed the post “agentic mess”, I learned something I thought was exceptionally important, even though I didn’t read enough to understand what the authors believe. It was the framing of the question that got me thinking in ways I hadn’t before, so I gave them a strong upvote: that’s my policy for posts that cause me to learn something I deem important, however that learning comes about.
Likewise, when I scrolled through a different post, I found a single sentence[2] that made me realise something I thought was profound. I actually disagree with the main thesis of the post, but my policy is insensitive to such trivial matters, so I gave it a strong upvote. I don’t really care what they think or whether I agree with it; what I care about is learning something.
[1] “What they believe is tangential to how the patterns behave in your own models, and all that matters is finding patterns that work.” (From a comment on reading to understand vs reading to defer/argue/teach.)
[2] “The Waluigi Effect: After you train an LLM to satisfy a desirable property P, then it’s easier to elicit the chatbot into satisfying the exact opposite of property P.”
You might enjoy the book ‘Thanks for the Feedback’, which basically emphasises this point a lot.