I broadly agree with section 1, and in fact since we published I’ve been looking into how time horizon varies between domains. Not only is there lots of variance in time horizon, the rate of increase also varies significantly.
See a preliminary graph and further observations in a LessWrong shortform post.
Thanks for sharing, Thomas! I expect benchmarks whose scores are closer to being proportional to economic output to improve more slowly.
FWIW, I predict they will be harder by a constant factor but improve at similar rates. Are there any particular benchmarks you think I should look at?
I do not have any particular benchmarks in mind, but I think the ones Mechanize is aiming to develop will capture economic value more closely, so you may want to look at the existing benchmarks that you think most closely resemble those. I would also ask Mechanize about it, @Thomas Kwa.
Your top comment no longer includes the graph.