I’ve spent some time in the last few months outlining a few epistemics/AI/EA projects I think could be useful.
Link here.
I’m not sure how best to write about these on the EA Forum / LessWrong. They feel too technical and speculative to gain much visibility.
But I’m happy for people interested in the area to see them. As with all things, I’m eager for feedback.
Here’s a brief summary of them, written by Claude.
---
1. AI-Assisted Auditing
A system where AI agents audit humans or AI systems, particularly for organizations involved in AI development. This could provide transparency about data usage, verify legal compliance, flag dangerous procedures, and detect corruption, all while maintaining necessary privacy.
2. Consistency Evaluations for Estimation AI Agents
A testing framework that evaluates AI forecasting systems by measuring several types of consistency rather than accuracy alone, enabling better comparison and improvement of prediction models. The suggested path is to start with simple test sets and progress to adversarial testing methods that can surface subtle inconsistencies across domains.
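To make this concrete, here’s a minimal Python sketch of two such consistency checks. The `forecast(question) -> probability` interface is hypothetical, and the toy agent and questions are made up for illustration:

```python
# Minimal sketch of two immediately-gradable consistency checks for a
# forecasting agent. `forecast(question) -> float` is a hypothetical
# interface returning a probability in [0, 1]; nothing here is a real API.

def complement_gap(forecast, question: str, negation: str) -> float:
    """P(A) + P(not A) should be ~1; returns the absolute deviation."""
    return abs(forecast(question) + forecast(negation) - 1.0)

def monotonicity_violation(forecast, narrower: str, broader: str) -> float:
    """P(narrower event) must not exceed P(broader event);
    returns the size of any violation (0.0 if consistent)."""
    return max(0.0, forecast(narrower) - forecast(broader))

# Toy agent with deliberate inconsistencies, for demonstration.
toy = {
    "Will X happen by 2026?": 0.40,
    "Will X not happen by 2026?": 0.70,  # complement gap: 0.10
    "Will X happen by 2030?": 0.35,      # below the 2026 number: violation
}.get

print(complement_gap(toy, "Will X happen by 2026?",
                     "Will X not happen by 2026?"))      # ~0.10
print(monotonicity_violation(toy, "Will X happen by 2026?",
                             "Will X happen by 2030?"))  # ~0.05
```

Part of the appeal is that accuracy can only be graded after questions resolve, while checks like these can be scored immediately, which is what makes rapid evaluation possible.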
3. AI for Epistemic Impact Estimation
An AI tool that quantifies the value of a piece of information by how much it improves the beliefs of specific AI systems. The suggested path is to begin with narrow domains and metrics, then expand to comprehensive tools that can guide research prioritization, value information contributions, and optimize information-seeking strategies.
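As one possible starting metric (a sketch, not a settled proposal): treat epistemic impact on a single binary question as the KL divergence between the AI’s belief before and after reading. Note this only measures how far beliefs moved, not whether they moved toward the truth; once a question resolves, you’d want log-score improvement instead:

```python
import math

def kl_bernoulli(p: float, q: float) -> float:
    """KL divergence between Bernoulli(p) and Bernoulli(q), in nats."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def epistemic_impact(prior: float, posterior: float) -> float:
    """How far a document moved the model's belief on one binary question."""
    return kl_bernoulli(posterior, prior)

# An AI believed P(event) = 0.5; after reading a document it believes 0.8.
print(epistemic_impact(0.5, 0.8))  # ~0.193 nats of belief movement
```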
4. Multi-AI-Critic Document Comments & Analysis
A system similar to “Google Docs comments,” but where specialized AI agents analyze documents and leave suggestions. It could draw on a repository of optional open-source agents for specific tasks, such as spot-checking arguments, flagging logical errors, and providing information enrichment.
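A minimal sketch of the plug-in structure this implies, with a toy critic standing in for real open-source agents (all names here are hypothetical):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Comment:
    critic: str   # which agent produced the comment
    span: str     # the text being commented on
    note: str     # the critique, enrichment, or suggestion

# A critic is any function from document text to a list of Comments, so
# independently developed open-source critics can register side by side.
Critic = Callable[[str], list[Comment]]

def certainty_checker(doc: str) -> list[Comment]:
    """Toy critic: flags sentences that assert unqualified certainty."""
    return [Comment("certainty-checker", s.strip(),
                    "Consider hedging or citing a source.")
            for s in doc.split(".") if "definitely" in s.lower()]

REGISTRY: list[Critic] = [certainty_checker]  # real agents would plug in here

def review(doc: str) -> list[Comment]:
    return [c for critic in REGISTRY for c in critic(doc)]

print(review("This will definitely work. Results may vary."))
```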
5. Rapid Prediction Games for RL
Specialized environments where AI agents trade or compete on predictions through market mechanisms, distinguishing between Information Producers and Information Consumers. The system aims both to evaluate AI capabilities and to provide a framework for training better forecasting agents through rapid feedback cycles.
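As a sketch, one round of such a game could look like the following, with a proper scoring rule (here, log score) providing the fast reward signal. The producer and consumer are toy stand-ins, not proposed designs:

```python
import math
import random

def log_score(p: float, outcome: bool) -> float:
    """Proper scoring rule: higher is better, honest forecasting is optimal."""
    return math.log(p if outcome else 1.0 - p)

def play_round(producer, consumer, true_p: float) -> float:
    """One fast round: the producer supplies a signal about true_p, the
    consumer turns it into a forecast, and nature resolves immediately."""
    outcome = random.random() < true_p
    forecast = consumer(producer(true_p))
    return log_score(forecast, outcome)

# Toy stand-ins: the producer leaks a noisy, clamped version of the true
# probability; the consumer trusts it. The mean reward over many cheap
# rounds is the kind of fast feedback signal RL training needs.
producer = lambda p: min(max(p + random.gauss(0.0, 0.05), 0.01), 0.99)
consumer = lambda signal: signal

rewards = [play_round(producer, consumer, random.random()) for _ in range(10_000)]
print(sum(rewards) / len(rewards))
```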
6. Analytics on Private AI Data
A project where AI agents working for governments or researchers get access to private logs/data from AI companies to analyze questions like: How often did LLMs lie or misrepresent information? Did LLMs show bias toward encouraging user trust? Did LLMs employ questionable tactics for user retention? This addresses a real limitation: researchers currently lack access to actual usage logs.
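To illustrate the kind of analysis this would enable, here’s a sketch that estimates one of the rates above from logs. The `judge` function is a stand-in for whatever classifier or review process flags a response as misleading; the logs and names are fabricated for the example:

```python
import math

def misrepresentation_rate(logs, judge):
    """Estimate the fraction of logged responses a judge flags as
    misleading, with a rough 95% confidence interval (normal approx.).
    `logs` is an iterable of (prompt, response) pairs; `judge` is a
    stand-in for a classifier or review process returning True/False."""
    flags = [judge(prompt, response) for prompt, response in logs]
    n = len(flags)
    p = sum(flags) / n
    half_width = 1.96 * math.sqrt(p * (1.0 - p) / n)
    return p, (max(0.0, p - half_width), min(1.0, p + half_width))

# Toy usage with fabricated logs and a trivial judge:
logs = [("q1", "honest answer"), ("q2", "misleading answer")] * 50
rate, ci = misrepresentation_rate(logs, lambda _, r: "misleading" in r)
print(rate, ci)  # 0.5 with a CI of roughly (0.40, 0.60)
```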
7. Prediction Market Key Analytics Database
A comprehensive analytics system for prediction markets that tracks question value, difficulty, correlation with other questions, and forecaster performance metrics. This would help identify which questions are most valuable to specific stakeholders and how questions relate to real-world variables.
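One slice of this, as a sketch: computing question-to-question correlations from forecast histories. High correlation flags redundant questions, and the sign hints at shared real-world drivers. The data below is made up, and `statistics.correlation` requires Python 3.10+:

```python
from itertools import combinations
from statistics import correlation  # Python 3.10+

# Made-up, time-aligned forecast histories: question id -> probabilities.
histories = {
    "ai-reg-2026": [0.30, 0.35, 0.40, 0.42],
    "compute-cap": [0.25, 0.33, 0.37, 0.41],
    "chip-export": [0.70, 0.66, 0.60, 0.55],
}

# One analytics table: how each question pair co-moves over time.
for a, b in combinations(histories, 2):
    print(a, b, round(correlation(histories[a], histories[b]), 2))
```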
8. LLM Resolver Agents
A system for resolving forecasting questions using AI agents, with built-in desiderata including: triggering experiments at specific future points, deterministic randomness methods, pinned LLM versions, verifiability, auditability, proper caching and storage, and sensitivity analysis.
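A sketch of what those desiderata could look like as a declarative spec; the field names are illustrative, not a proposed standard:

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class ResolverSpec:
    """Declarative spec for an LLM-based resolver, mirroring the
    desiderata above; field names are illustrative."""
    question: str
    resolve_on: date      # the future point that triggers resolution
    model: str            # pinned model version, for verifiability
    seed: int             # deterministic randomness, so re-runs match
    n_samples: int = 5    # repeated samples enable sensitivity analysis

    def cache_key(self) -> str:
        """Stable hash of the full spec, for caching, storage, and audit."""
        payload = json.dumps(vars(self), default=str, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

spec = ResolverSpec("Did policy X pass?", resolve_on=date(2026, 1, 1),
                    model="pinned-model-2025-06-01", seed=42)
print(spec.cache_key()[:16])  # same spec -> same key, every run
```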
9. AI-Organized Information Hubs
A platform optimized for AI readers and writers where systems, experts, and organizations can contribute information that is scored and filtered for usefulness. Features would include privacy levels, payment proportional to information value, and integration of multiple file types.
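A toy sketch of the scoring-and-payment core, assuming usefulness scores come from downstream AI readers; all names and numbers are illustrative:

```python
from dataclasses import dataclass

@dataclass
class Contribution:
    author: str
    privacy: str        # e.g. "public" or "researchers-only"
    usefulness: float   # score assigned by downstream AI readers

def settle(contributions, budget: float, threshold: float = 0.1):
    """Filter out low-value items, then split a payment budget in
    proportion to each remaining contribution's usefulness score."""
    kept = [c for c in contributions if c.usefulness >= threshold]
    total = sum(c.usefulness for c in kept) or 1.0
    return {c.author: budget * c.usefulness / total for c in kept}

subs = [Contribution("lab-a", "public", 0.6),
        Contribution("expert-b", "researchers-only", 0.3),
        Contribution("bot-c", "public", 0.05)]  # filtered out
print(settle(subs, budget=100.0))  # {'lab-a': ~66.7, 'expert-b': ~33.3}
```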