Sharing what context I’m able to: Our work in this space so far has mostly been around assessing both the risks and the effectiveness of safeguards in AI-biorisk and AI-cybersecurity.
Superforecasters and domain experts tend to be relatively aligned on these topics so far (e.g., see this study, and a later update here). (We’ve completed private research on AI-cybersecurity risk and will be publishing some of it soon.)
The bulk of our funding has gone toward AI-focused forecasting projects (e.g. LEAP, AI-biorisk, economic effects of AI) or ‘automating forecasting research’-type work whose ultimate goal is assisting decision-makers (e.g. ForecastBench), so I think this is most of what FRI should be evaluated on.
I’m not sure what comparison class people had in mind previously, but I agree it seems broadly correct to consider this work alongside other AI-related funding opportunities. As noted above, I’d argue that it is appropriate and valuable to have “AI measurement” as an important funding domain alongside areas like “AI governance,” “Technical AI safety research,” “AI field-building,” etc. It seems valuable for one part of the AI grantmaking portfolio to generate evidence that can be used to sharpen views on AI timelines, assess risk in various domains (bio, cyber, catastrophic risk), assess the magnitude of benefits (for calibrating cost-benefit analyses on policies), and predict the likelihood and impact of various policies (e.g. the effectiveness of DNA synthesis screening for biorisk). This type of fundamental research can inform and support more effective action in the other domains.
I also think forecasting research can have direct impacts on AI governance through decision-making partnerships like those I described above: partnering with and advising important government agencies, frontier AI companies, and others on high-stakes decisions related to AI regulation, implementing effective safeguards to reduce AI-cyber risk, and more. We have already seen some early impacts along these lines, as previously mentioned.
I agree. Due to confidentiality considerations, we have primarily shared the details of our impact case studies with our funders and had them assess the value of the impact we are making; establishing that evidence publicly is more challenging. But elsewhere in the thread people have mentioned citations as one reasonable metric of impact for research organizations whose impacts are more diffuse. We have targets for growing our prominent citations over time as one way to assess our impact, and I’ve shared examples of prominent citations to FRI research in my comment above. I also hope that over time we can share more case studies publicly and explain more of the reasoning for why we believe we had an impact and whether it was positive. The benchmarks RFP case study described above is one example that can be discussed relatively publicly.
I broadly agree on these points. We are running longitudinal expert panels, partnering with important institutions to improve their decision-making, and automating forecasting research, so I see our work as distinct from online betting/forecasting platforms.