Executive summary: QURI has released Squiggle AI, a tool that uses language models to automatically generate probabilistic models and cost-effectiveness analyses, making complex estimation more accessible while serving as an experiment in AI-assisted reasoning.
Key points:
- Tool combines LLMs with the Squiggle programming language to generate cost-effectiveness analyses and Fermi estimates, requiring no prior coding knowledge (see the sketch after this list)
- Current performance shows promise but has limitations: overconfidence in estimates, a 200-line code limit, and occasional workflow stalls
- A typical workflow costs $0.10-0.35, takes 20 seconds to 3 minutes, and produces 100-200 line models
- Early testing shows significant efficiency gains (reducing model creation from 2-3 hours to 10-30 minutes), but outputs should be treated as starting points rather than definitive analyses
- Best practices include generating multiple models per question, being specific with inputs, and combining the tool with complementary research tools
- Development revealed that LLMs can handle controversial estimates with proper prompting, but making complex models easily understandable remains challenging
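For readers unfamiliar with Squiggle, here is a minimal sketch of the kind of Fermi estimate the tool produces. All variable names and figures are hypothetical illustrations, not actual Squiggle AI output:

```
// Hypothetical cost-effectiveness Fermi estimate in Squiggle.
// "a to b" defines a lognormal distribution with a 90% CI from a to b.
peopleReached = 1k to 10k         // people served per year (illustrative)
costPerPerson = 2 to 8            // USD per person (illustrative)
benefitPerPerson = 0.005 to 0.05  // QALYs per person (illustrative)

totalCost = peopleReached * costPerPerson
totalBenefit = peopleReached * benefitPerPerson
costPerQALY = totalCost / totalBenefit
costPerQALY
```

Because every quantity is a distribution rather than a point estimate, the result is itself a distribution, which is what lets these models express uncertainty directly.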
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.