I would wait for METR’s actual evaluation: ‘30 hours’ is just based on claims of continued effort, not actual successful performance on carefully measured tasks.