Thanks for sharing your pragmatic overview here! I like the idea a lot.
Despite the well-known shortcomings of narrowly optimising for metrics/benchmarks, I believe that curated benchmark datasets can be very helpful for progress on AI safety. To expand, the following value propositions also seem promising:
Get more specific: try to encapsulate certain qualities of AI systems that we care about in a benchmark, making each quality more specific and tractable.
Make it more accessible: benchmarks probably lower the entry point to the field and can facilitate communication within the community.