If you can get a better score than our human subjects did on any of METR’s RE-Bench evals, send it to me and we will fly you out for an onsite interview.
Caveats:
You’re employable (we can sponsor visas from most but not all countries).
You use the same hardware.
It’s on the honor system that you didn’t take more time than our human subjects (8 hours). If you took more, still send it to me; we will probably still be interested in talking.
(Crossposted from twitter.)