Thanks for that answer.

It seems plausible to me that a useful version of forecasting grant outcomes would be too time-consuming to be worthwhile. (I don't really have a strong stance on the matter currently.) And your experience with useful forecasting for LessWrong work being very time-consuming definitely seems like relevant data.
But this part of your answer confused me:
"my sense is that it actually takes a lot of time until you can get a group of 5 relatively disagreeable people to agree on an operationalization that makes sense to everyone, and so this isn't really super feasible to do for lots of grants"
Naively, I'd have thought that, if that was a major obstacle, you could just have a bunch of separate operationalisations, and people can forecast on whichever ones they want to forecast on. If, later, some or all operationalisations do indeed seem to have been too flawed for it to be useful to compare reality to them, assess calibration, etc., you could just not do those things for those operationalisations/that grant.
(Note that I'm not necessarily imagining these forecasts being made public in advance or afterwards. They could be engaged with internally to the extent that makes sense, sometimes being ignored if that seems appropriate in a given case.)
Is there a reason I'm missing for why this doesn't work?
Or was the point about the difficulty of agreeing on an operationalisation really meant just as evidence that useful operationalisations are hard to generate, as opposed to the disagreement itself being the obstacle?