The problem with strawmanning and steelmanning isn’t a matter of degree, and I don’t think a goldilocks zone can be found along that dimension at all. If you find yourself asking “how charitable should I be in my interpretation?”, I think you’ve already made a mistake.
Instead, I’d like to propose a fourth category. Let’s call it... uhh... the “blindman”! ^^
The blindman interpretation is to forget you’re talking to a person, stop caring about whether they’re correct, and just try your best to extract anything useful from what they’re saying.[1] If your inner monologue goes “I agree/disagree with that for reasons XYZ,” that mindset is great for debating or teaching, but it’s a distraction if you’re purely aiming to learn. If I say “1+1=3” right now, it has no effect on what you learn from the rest of this comment, so do your best to forget I said it.
For example, when I skimmed the post “agentic mess”, I learned something I thought was exceptionally important, even though I didn’t read enough to understand what the authors believe. It was the framing of the question that got me thinking in ways I hadn’t before, so I gave them a strong upvote: that’s my policy for posts that cause me to learn something I deem important, however that learning comes about.
Likewise, when I scrolled through a different post, I found a single sentence[2] that made me realise something I thought was profound. I actually disagree with the main thesis of the post, but my policy is insensitive to such trivial matters, so I gave it a strong upvote. I don’t really care what they think or whether I agree with it; what I care about is learning something.
[1] “What they believe is tangential to how the patterns behave in your own models, and all that matters is finding patterns that work.” (From a comment on reading to understand vs reading to defer/argue/teach.)
[2] “The Waluigi Effect: After you train an LLM to satisfy a desirable property P, then it’s easier to elicit the chatbot into satisfying the exact opposite of property P.”
You might enjoy the book ‘Thanks for the Feedback’, which basically emphasises this point a lot.