I often second-guess my EA Forum comments with Claude, especially when someone mentions a disagreement that doesn’t make sense to me.
When doing this, I ask it to be honest and not sycophantic, but this only helps so much, so I’m curious about better prompts to prevent sycophancy.
I imagine at some point all my content could go through a [can I convince an LLM that this is reasonable and not inflammatory] filter. But a lower bar is just doing this for specific comments that are particularly contentious or argumentative.
Would a potential cure for the sycophancy be to reverse the framing, so that Claude perceives you as the opponent of the comment's author, looking for flaws in the comment? I realize that this would not get you quite what you are looking for, but getting strong arguments for the other side could be helpful.
Agreed that this would be good. But it can be annoying to do without additional tooling.
I’d like to see tools that ask a question from a few different angles / perspectives / motivations and compare the results, but building this would take some work.
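For what it's worth, here is a rough sketch of what such a tool might look like. Everything in it is an assumption for illustration: the framing texts are made up and not tested for effectiveness, and `ask` is a hypothetical stand-in for whatever function sends a prompt to your LLM and returns its reply. The idea is just to send the same comment under several framings (including the reversed "opponent" framing suggested above) and look at the replies side by side.

```python
from typing import Callable, Dict

# Hypothetical framings; the wording is illustrative only.
FRAMINGS: Dict[str, str] = {
    "author": "I wrote this comment. Is it reasonable and non-inflammatory?",
    "opponent": "Someone I disagree with wrote this comment. What are its weakest points?",
    "neutral": "Here is a forum comment from a stranger. Assess its reasoning and tone.",
}


def build_prompts(comment: str) -> Dict[str, str]:
    """Return one prompt per framing, each ending with the same comment text."""
    return {
        name: f"{framing}\n\nComment:\n{comment}"
        for name, framing in FRAMINGS.items()
    }


def compare_framings(comment: str, ask: Callable[[str], str]) -> Dict[str, str]:
    """Send every framed prompt through `ask` and collect replies by framing name.

    `ask` is an assumed callable (prompt -> reply); plug in your own LLM client.
    """
    return {name: ask(prompt) for name, prompt in build_prompts(comment).items()}
```

Points where the "author" and "opponent" replies disagree are probably the ones most worth a second look, since that gap is roughly where sycophancy would show up.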