We haven’t historically done this. As someone who has tried pretty hard to incorporate forecasting into my work at LessWrong, my sense is that it actually takes a lot of time until you can get a group of 5 relatively disagreeable people to agree on an operationalization that makes sense to everyone, and so this isn’t really super feasible to do for lots of grants. I’ve made forecasts for LessWrong, and usually creating a set of forecasts that actually feels useful in assessing our performance takes me at least 5-10 hours.
It’s possible that other people are much better at this than I am, but it makes me kind of hesitant to use classical forecasting methods, at least, as part of LTFF evaluation.
Thanks for that answer.
It seems plausible to me that a useful version of forecasting grant outcomes would be too time-consuming to be worthwhile. (I don’t really have a strong stance on the matter currently.) And your experience with useful forecasting for LessWrong work being very time-consuming definitely seems like relevant data.
But this part of your answer confused me:
my sense is that it actually takes a lot of time until you can get a group of 5 relatively disagreeable people to agree on an operationalization that makes sense to everyone, and so this isn’t really super feasible to do for lots of grants
Naively, I’d have thought that, if that were a major obstacle, you could just have a bunch of separate operationalisations, and people could forecast on whichever ones they wanted to. If, later, some or all of the operationalisations turned out to be too flawed for it to be useful to compare reality against them, assess calibration, etc., you could simply not do those things for those operationalisations or that grant.
(Note that I’m not necessarily imagining these forecasts being made public in advance or afterwards. They could be engaged in internally to the extent that makes sense—sometimes ignoring them if that seems appropriate in a given case.)
Is there a reason I’m missing for why this doesn’t work?
Or was the point about the difficulty of agreeing on an operationalisation really meant just as evidence that useful operationalisations are hard to generate, as opposed to the disagreement itself being the obstacle?
I think the most lightweight-but-still-useful forecasting operationalization I’d be excited about is something like:
12/24/120 months from now, will I still be very excited about this grant?
12/24/120 months from now, will I be extremely excited about this grant?
This gets at whether people think it’s a good idea ex post, and (if people are well-calibrated) it can also quantify whether they are insufficiently or excessively risk- and ambiguity-averse, in the classic sense of those terms.
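To give a rough sense of what the bookkeeping for this could look like, here’s a minimal sketch in Python of recording such forecasts and checking calibration once they resolve. Everything in it (the field names, the use of a Brier score, the bucketing) is an illustrative assumption for the example, not an existing LTFF or LessWrong process.

```python
# Illustrative sketch only: lightweight per-grant forecasts of the form
# "12/24/120 months from now, will I still be very excited about this grant?"
# plus a simple calibration check once they resolve. Field names and the
# Brier-score choice are assumptions for the example, not an existing process.
from dataclasses import dataclass
from typing import Optional
from collections import defaultdict


@dataclass
class GrantForecast:
    grant_id: str
    question: str            # e.g. "Still very excited in 12 months?"
    probability: float       # forecaster's probability of "yes", in [0, 1]
    resolved_yes: Optional[bool] = None  # filled in at resolution time


def brier_score(forecasts: list[GrantForecast]) -> float:
    """Mean squared error between stated probabilities and outcomes (0 = perfect)."""
    resolved = [f for f in forecasts if f.resolved_yes is not None]
    return sum((f.probability - float(f.resolved_yes)) ** 2 for f in resolved) / len(resolved)


def calibration_table(forecasts: list[GrantForecast]) -> dict[float, tuple[int, float]]:
    """Bucket forecasts by stated probability (nearest 0.1) and report observed 'yes' rates."""
    buckets: dict[float, list[bool]] = defaultdict(list)
    for f in forecasts:
        if f.resolved_yes is not None:
            buckets[round(f.probability, 1)].append(f.resolved_yes)
    # A well-calibrated forecaster's observed "yes" rate should roughly match each bucket.
    return {b: (len(v), sum(v) / len(v)) for b, v in sorted(buckets.items())}


# Made-up example data:
forecasts = [
    GrantForecast("grant-001", "Still very excited in 12 months?", 0.8, True),
    GrantForecast("grant-002", "Still very excited in 12 months?", 0.6, False),
    GrantForecast("grant-003", "Still very excited in 12 months?", 0.9, True),
]
print(brier_score(forecasts))        # lower is better
print(calibration_table(forecasts))  # probability bucket -> (count, observed "yes" rate)
```

(This is only the scoring side, of course; per the answer above, the expensive part is producing operationalizations worth scoring in the first place.)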
This seems helpful for assessing fund managers’ calibration and improving their own thinking and decision-making. It’s less likely to be useful for communicating their views transparently to one another, or to the community, and it’s susceptible to post-hoc rationalization. I’d prefer an oracle external to the fund, like “12 months from now, will X rate their excitement about this grant at 7 or higher on a 1-10 scale?”, where X is a person trusted by the fund managers who will likely know about the project anyway, so that the cost of resolving the forecast is small.
I plan to encourage the funds to experiment with something like this going forward.
I agree that your proposed operationalization is better for the stated goals, assuming similar levels of overhead.
Just to make sure I’m understanding, are you also indicating that the LTFF doesn’t write down in advance what sort of proxies you’d want to see from this grant after x amount of time? And that you think the same challenges with doing useful forecasting for your LessWrong work would also apply to that?
These two things (forecasts and proxies) definitely seem related, and both would involve challenges in operationalising things. But they also seem meaningfully different.
I’d also think that, in evaluating a grant, I might find it useful to partly think in terms of “What would I like to see from this grantee x months/years from now? What sorts of outputs or outcomes would make me update more in favour of renewing this grant—if that’s requested—and making similar grants in future?”
We’ve definitely written informally things like “this is what would convince me that this grant was a good idea”, but we don’t have a more formalized process for writing down specific objective operationalizations that we all forecast on.