Hey, cool toy model (:
I bet there isn't enough METR data on how messy the tasks are to include it here, but I would expect messiness to have real-world consequences and to tug in the direction of agents being less viable outside of well-defined domains.