Very interesting critique. I’ve seen these kinds of comments in academic circles doing evals work, and there have been attempts to improve the situation, such as the General Scales Framework:
https://arxiv.org/abs/2503.06378
Think of it as passing an IQ test instead of a school exam: it has more predictive power. It’s not perfect ofc, but thankfully some people are really taking this seriously.