I love this idea! I just took it for a spin, and the quality of the feedback isn’t yet at a point where I’d find it very useful. My sense is that it’s limited by the quality of the agents rather than by the design of the app, though maybe changes to the scaffold could help.
Most of the critiques were myopic, such as:
It labeled one sentence in my intro as a “hasty generalization/unsupported claim” when I spend most of the post supporting that statement.
In one sentence, it raised a “missing context” flag about a study I reference, while a different flag on the same sentence affirmed that the link embedded there provides the context.
I make a claim about how the majority of people view an issue, provide support for the claim, and then discuss the problems with that view. It raised a flag on the claim I’m critiquing, calling it unscientific; even from the sentences immediately before and after, it should be clear that was exactly my point!
I could list several more examples; most of the flags I clicked on were misunderstandings along similar lines. The article is here if you want to take a look: https://www.roastmypost.org/docs/jr1MShmVhsK6Igp0yJCL2/reader
Thanks for the feedback!
I took a quick look at this. I largely agree that some of the checks were incorrect.
It seems like these specific issues were mostly from the Fallacy Check? That one is definitely too aggressive (in addition to having limited context); I’ll work on tuning it down. Note that you can choose which evaluators to run on each post, so going forward you might want to just skip that one for now.
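For what it’s worth, the direction I have in mind for the context problem is roughly the following. This is a minimal sketch, not the current implementation, and the function name is invented: feed the evaluator a window of surrounding sentences instead of one sentence at a time.

```python
# Hypothetical sketch of one fix for the "limited context" failures above:
# show the evaluator the surrounding sentences, not just the flagged one,
# so it can tell when a "claim" is actually being set up for critique.

def with_context(sentences: list[str], i: int, window: int = 2) -> str:
    """Return sentence i plus up to `window` sentences on each side."""
    lo = max(0, i - window)
    hi = min(len(sentences), i + window + 1)
    return " ".join(sentences[lo:hi])

sentences = [
    "Most people believe X.",
    "Surveys consistently show this.",
    "But this majority view has serious problems.",
]
print(with_context(sentences, 1))  # the evaluator now sees the critique too
```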
It looks like maybe 60% Fallacy Check and 40% Fact Check. For instance, the Fact Check:
claims there are more farmed chickens than shrimp (!)
claims ICAW does not use aggressive tactics, apparently basing that on vague copy on their website
I’m looking now at the Fact Check. It did verify most of the claims it investigated on your post as correct, but not all (almost no post gets everything verified, especially since the error rate is significant).
It seems like, with chickens/shrimp, it got confused between numbers killed and numbers alive at any one time, or something along those lines.
In the case of ICAW, it looks like it did a short search via Perplexity and didn’t find anything interesting. The official sources claim they don’t use aggressive tactics, but a smart agent would have realized it needed to search more. Getting this one right would have involved a few more searches, meaning increased costs. There’s definitely some tinkering and improvement to do here.
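Roughly the kind of escalation I have in mind, as a minimal sketch (the helper and the query strategy are invented for illustration, not the actual roastmypost code):

```python
# Sketch of "keep searching when the first pass is inconclusive":
# escalate from the naive query toward more skeptical ones instead of
# trusting the first (often self-reported) source, with an explicit cap
# on the number of paid search calls.

from dataclasses import dataclass

@dataclass
class Evidence:
    conclusive: bool  # did we find independent confirmation or refutation?
    summary: str

def search_perplexity(query: str) -> Evidence:
    """Stand-in for one search call; in the real system this hits the API."""
    return Evidence(conclusive=False, summary=f"only self-reported sources for: {query}")

def check_claim(claim: str, max_searches: int = 3) -> str:
    queries = [
        claim,                                # naive first pass
        f"{claim} criticism OR controversy",  # hunt for dissenting sources
        f"{claim} independent evaluation",    # hunt for third-party evidence
    ]
    for query in queries[:max_searches]:
        evidence = search_perplexity(query)
        if evidence.conclusive:
            return evidence.summary
    return "inconclusive"  # admit uncertainty instead of echoing the official line

print(check_claim("ICAW does not use aggressive tactics"))  # -> "inconclusive"
```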
That makes sense; I don’t want to be overly fussy if it was getting most things right. I guess the thing is, a checker isn’t helpful if it mostly recognizes true facts as true but mistakes some true facts for false, while not accurately flagging a significant number of genuinely incorrect facts. Clicking through a bunch of flags, I saw almost none that I thought necessitated an edit.
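To put a toy number on that (all figures invented): if a post’s claims are mostly true, even a modest false-positive rate means most flags point at things that don’t need an edit.

```python
# Back-of-the-envelope version of the point above: a checker that is right
# about most individual claims can still produce flags that are mostly noise.
# All numbers are made up for illustration.

n_claims = 100
true_claims = 95            # most claims in a careful post are correct
false_claims = n_claims - true_claims

false_positive_rate = 0.10  # true claims wrongly flagged as false
recall = 0.40               # genuinely false claims it actually catches

wrong_flags = true_claims * false_positive_rate  # 9.5 flags on true claims
right_flags = false_claims * recall              # 2.0 flags on false claims

precision = right_flags / (wrong_flags + right_flags)
print(f"flags that point at a real error: {precision:.0%}")  # ~17%
```

So even with 90%+ accuracy on individual claims, roughly five out of six flags would be noise, which matches the experience of clicking through flags and finding almost nothing worth editing.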