This analysis seems to be entirely based on METR’s time horizon research. I think that research is valuable, but it raises the concern that any findings may be a result of particular quirks of METR’s approach; you describe some of those in here.
Are you aware of any alternative groups that have explored this question? It feels to me like it’s not a question you explicitly need time horizons to answer.
Yes, that is a big limitation. Even more limiting is that it is only based on a subset of METR’s data on this. That’s enough to raise the question and illustrate what an answer might look like in data like this, but not to really answer it.
I’m not aware of others exploring this question, but I haven’t done much looking.
This is a very important question to be asking.
Yeah, doesn’t the ARC leaderboard show somewhat opposing trends? https://arcprize.org/leaderboard