Improving capability evaluations for AI governance: Open Philanthropy’s new request for proposals

Summary

Open Philanthropy is launching a request for proposals to improve AI capability evaluations. We’re looking to fund work on more demanding GCR-relevant benchmarks, better evaluation science, and improving third-party model access and infrastructure.

Click here to apply

More details:

Below, we explain what we’re looking for, and why we think this work matters.

Why capability evaluations matter

The ability to accurately evaluate AI capabilities is becoming increasingly important, for three main reasons:

1. Evaluations are key inputs to AI governance

Many current governance proposals rely heavily on knowing what AI systems can and cannot do. “If-then commitments” are one prominent example — companies agree to take specific actions (like pausing training) if their systems display certain capabilities. But for these approaches to work, we need reliable ways to measure those capabilities.
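
To make this concrete, here is a minimal, purely illustrative sketch of how an if-then commitment might be operationalized in code. The capability names, thresholds, and actions are hypothetical and not drawn from any company's actual policy.

```python
# Minimal, hypothetical sketch of an "if-then commitment" check: if measured
# capability scores cross pre-agreed thresholds, pre-committed actions trigger.
# All names, thresholds, and actions below are illustrative only.

EVAL_THRESHOLDS = {
    "autonomous_replication": 0.20,  # fraction of eval tasks completed
    "cyberoffense": 0.50,
}

COMMITTED_ACTIONS = {
    "autonomous_replication": "pause further scaling until mitigations are in place",
    "cyberoffense": "restrict deployment and notify external evaluators",
}


def check_commitments(eval_results):
    """Return the pre-committed actions triggered by measured capability scores."""
    triggered = []
    for capability, threshold in EVAL_THRESHOLDS.items():
        score = eval_results.get(capability)
        if score is not None and score >= threshold:
            triggered.append(f"{capability}: {COMMITTED_ACTIONS[capability]}")
    return triggered


# Example: scores produced by a (hypothetical) evaluation suite
print(check_commitments({"autonomous_replication": 0.25, "cyberoffense": 0.10}))
```

The hard part, of course, is producing capability scores reliable enough that thresholds like these mean anything, which is exactly what this RFP aims to improve.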

2. AI capabilities underpin key disagreements about AI risk

Many fundamental disagreements about AI risk stem from different beliefs about what AI systems can or will soon be able to do. For example, skepticism about certain loss-of-control scenarios often comes down to disagreement about whether AI systems could become effective autonomous agents with long-term planning capabilities. Better evaluations could help resolve some of these disagreements, or at least help us identify the key cruxes.

3. We need better situational awareness of what frontier models can and cannot do

To respond appropriately to AI progress, we need to understand what frontier systems can and can't do. Some genuinely challenging, risk-relevant evaluations already exist (e.g. Cybench for AI cyberoffense capabilities, RE-Bench for AI R&D capabilities), but many crucial capabilities remain poorly measured, and existing benchmarks are saturating quickly.

Three current problems with capability evaluations

Capability evaluations currently face three major challenges:

  1. Existing benchmarks for risk-relevant capabilities are inadequate. We need more demanding tests that can meaningfully evaluate frontier models’ performance on tasks relevant to catastrophic risks, resist saturation even as capabilities advance, and rule in (not just rule out) serious risks.

  2. The science of capability evaluation remains underdeveloped. We don't yet understand how capabilities scale with different inputs, how different capabilities relate to one another, or how post-training enhancements will affect performance. This makes it hard to interpret current evaluation results and to predict future ones.

  3. Third-party evaluators already face significant access constraints, and increasing security requirements will make access harder. Maintaining meaningful independent scrutiny will require advances in technical infrastructure, evaluation and audit protocols, and access frameworks.

What we’re looking to fund

To address these challenges, we’re seeking proposals in three areas:

GCR-relevant capability evaluations for AI agents

We want to fund new evaluations that:

  1. Test agentic, risk-relevant capabilities, such as AI R&D, situational awareness, and adaptation to novel adversarial environments

  2. Are extremely challenging, ideally taking world-class experts multiple days to complete

For more on why we think this is important, what we’re looking for, and previous work we think is useful, see this section of our RFP.

Improving the science of capabilities development and evaluations

Current capability evaluations are more like snapshots than predictive tools: they tell us what models can do now, but not what they’re likely to do next. We want to improve understanding of questions such as:

  1. How capabilities scale with different inputs

  2. Relationships between different capabilities

  3. Best practices for evaluation methodology

For the open questions here that we think are important, and past work we've found useful, see this section of our RFP.
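
As a rough illustration of the first question above, here is a minimal sketch of one way to study scaling: fit a simple curve to benchmark scores as a function of (log) training compute and extrapolate forward. The data points and the sigmoidal functional form are hypothetical, chosen only to show the workflow, not a claim about how any real capability scales.

```python
# Minimal, hypothetical sketch of studying how a capability scales with training
# compute: fit a simple sigmoid to benchmark scores and extrapolate. The data
# and functional form are made up for illustration.

import numpy as np
from scipy.optimize import curve_fit

# Hypothetical observations: log10(training FLOP) -> benchmark accuracy
log_compute = np.array([22.0, 23.0, 24.0, 25.0, 25.5])
accuracy = np.array([0.05, 0.12, 0.31, 0.58, 0.71])


def sigmoid(x, midpoint, steepness):
    return 1.0 / (1.0 + np.exp(-steepness * (x - midpoint)))


params, _ = curve_fit(sigmoid, log_compute, accuracy, p0=[25.0, 1.0])

# Extrapolate to a hypothetical next-generation compute budget (10^26.5 FLOP)
projected = sigmoid(26.5, *params)
print(f"fitted midpoint={params[0]:.2f}, steepness={params[1]:.2f}, "
      f"projected accuracy at 10^26.5 FLOP: {projected:.2f}")
```

Even this toy version surfaces the real difficulties: which inputs to condition on, what functional form to assume, and how far to trust extrapolation beyond the observed range.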

Improving third-party model access and evals infrastructure

Independent evaluations are crucial for reliably assessing AI capabilities. As the stakes get higher, we can’t trust AI companies to verify their own claims. But as security requirements increase, getting meaningful external access will become harder.

We’re looking for approaches to resolve the tension between security requirements and meaningful external oversight, including:

  1. Understanding necessary access requirements and how to secure them

  2. Improving evaluation infrastructure

  3. Developing verifiable auditing techniques

For the open questions here that we think are important, and past work we've found useful, see this section of our RFP.
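
As a small illustration of what "evaluation infrastructure" and "verifiable auditing" can mean in practice, here is a hypothetical sketch of an evaluation harness that hashes every prompt/response pair into a tamper-evident chain, so an auditor can later check that reported results match the transcript that produced them. The model_query function is a stand-in for whatever API access an evaluator actually has; nothing here reflects any lab's real interface or audit protocol.

```python
# Hypothetical sketch: an eval harness that builds a hash chain over the
# prompt/response transcript, so reported scores can later be checked against
# the exact transcript that produced them. Not any lab's actual protocol.

import hashlib
import json


def model_query(prompt):
    # Stand-in for the model API access available to a third-party evaluator.
    return "4"


def run_eval(tasks):
    """Run tasks and return graded records plus a hash chain over the transcript."""
    records, chain = [], ""
    for task in tasks:
        response = model_query(task["prompt"])
        record = {
            "prompt": task["prompt"],
            "response": response,
            "correct": task["answer"] in response,
        }
        chain = hashlib.sha256(
            (chain + json.dumps(record, sort_keys=True)).encode()
        ).hexdigest()
        records.append(record)
    return records, chain


records, transcript_hash = run_eval([{"prompt": "What is 2 + 2?", "answer": "4"}])
print(sum(r["correct"] for r in records), "correct; transcript hash:", transcript_hash)
```

Real proposals in this space need to handle much harder questions, such as how to verify that the model being queried is the one claimed, and how to give evaluators deeper access without compromising security.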

How to engage

Even if you’re not planning to apply for funding, this RFP contains many open research questions that we think are important for the field — we encourage you to read the full RFP if you’re interested in capability evaluation. Consider applying if you have relevant expertise or ideas, and please share with others who might be interested.

Anyone is eligible to apply. Applications will be open until 1st April.

Click here to apply