I hope you are correct. As an outsider, I find it very hard to judge without standardized, non-gameable benchmarks for agents.