I hope you are correct. I find it very hard to judge without standardized, non-gameable benchmarks for agents.