isaduan comments on Race to the Top: Benchmarks for AI Safety

isaduan 5 Dec 2022 20:37 UTC
4 points
Good question. Benchmarks provide empirical, quantitative evaluation. They can be static datasets, e.g. ImageNet. They can also be models! For example, CLIP is a model that scores how well an image matches a piece of text, and it is used to evaluate image generation models like DALL·E 2, specifically how well the generated images align with their text prompts.
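To make the "model as benchmark" idea a bit more concrete, here is a rough sketch of how a CLIP-style alignment score could be computed. It assumes you already have an embedding for each generated image and for its prompt. In practice these would come from CLIP's image and text encoders, but here they are toy vectors I made up. The names `cosine_similarity` and `alignment_score` are just for illustration, and this is not the exact metric any particular paper uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def alignment_score(image_embeddings, text_embeddings):
    """Mean cosine similarity over (generated image, prompt) pairs.

    Higher means the generated images sit closer to their prompts
    in the shared embedding space.
    """
    sims = [
        cosine_similarity(img, txt)
        for img, txt in zip(image_embeddings, text_embeddings)
    ]
    return sum(sims) / len(sims)


# Toy embeddings (assumption: a real setup would get these from an
# image-text model such as CLIP, not hand-written vectors).
prompt_embeddings = [[1.0, 0.0], [0.0, 1.0]]
model_a_images = [[1.0, 0.0], [0.0, 2.0]]  # point the same way as the prompts
model_b_images = [[0.0, 1.0], [1.0, 0.0]]  # orthogonal to the prompts

score_a = alignment_score(model_a_images, prompt_embeddings)
score_b = alignment_score(model_b_images, prompt_embeddings)
print(f"model A: {score_a:.2f}, model B: {score_b:.2f}")
```

Because both image generators are scored by the same evaluator on the same prompts, the two numbers can be compared directly, which is what makes a model usable as a benchmark.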
The bottom line is that benchmarks should give AI labs and researchers a fair way to compare their results with one another, and should measure progress towards goals that the research community cares about.
Hope this helps!