It’s really cool that you’ve done this and released the code!
Am I understanding right that the givewell baseline you’re trying to beat used GPT, while your approach uses Claude? How can you be sure that the improvements aren’t due to the model choice, rather than the architecture?
If you read my blog post, I go into detail about why this is not a model issue. It’s about how you frame the question much more than what the model contains. For this purpose any decent model would have had the same result. The main benefit that Claude gives is direct in terminal code writing and execution.
It’s really cool that you’ve done this and released the code!
Am I understanding right that the givewell baseline you’re trying to beat used GPT, while your approach uses Claude? How can you be sure that the improvements aren’t due to the model choice, rather than the architecture?
If you read my blog post, I go into detail about why this is not a model issue. It’s about how you frame the question much more than what the model contains. For this purpose any decent model would have had the same result. The main benefit that Claude gives is direct in terminal code writing and execution.