There’s a bunch of things going on here, but roughly speaking I think there are at least two:
When people think of “success from judgmental forecasting”, they usually think of a narrow thing that looks like the end product of the most public-facing part of what Metaculus and the various Good Judgement projects do: coming up with good answers to specific, well-defined, and useful questions. But a lot of the value of forecasting comes before and after that.
Even in the near-ideal situation of specific, well-defined forecasts, there are often metrics other than pure accuracy (beyond a certain baseline) that matter more.
For the first point, Ozzie Gooen (and I’m sure many other people) has thought a lot more about this. But my sense is that there’s a long pipeline of things that makes a forecast actually useful for people:
Noticing quantifiable uncertainty. I think a lot of the value of forecasting comes from the stage before question operationalization: recognizing that something relevant to a practical decision you (or a client, or the world) rely on is a) uncertain and b) reasonably quantifiable. We don’t recognize many of our assumptions as assumptions at all, or the uncertainty isn’t crisp enough for us to even see it as a question we could ask others.
Data collection. I’m not sure exactly where this fits in the pipeline, but precise forecasts of the future are often grounded in, and contextualized by, the relevant data you already have.
Question operationalization. This is what William Kiely’s question is referring to, which I’ll answer in more detail there. But roughly, it’s turning your quantifiable uncertainty into a precise, well-defined question that can be evaluated and scored later.
Actual judgmental forecasting. This is mostly what I did, and what the leaderboards are ranked on, and what people think about when they think about “forecasting.”
Making those forecasts useful. If this is for yourself, it’s usually easier in some sense. If it’s for the “world” or the “public,” making forecasts useful often entails clear communication and marketing/advertising the forecasts so they can be taken up by relevant decision-makers (even if it’s just individuals). If it’s for a client, then this involves working closely with the client to make sure they understand both your forecasts and their implications, as well as possibly “going back to the drawing board” if a question you thought was well operationalized isn’t actually useful for the client.
Evaluation. Usually, if the earlier steps are done well, this is easy because the question is set up to be easy to evaluate (see the small scoring sketch after this list). That said, there are tradeoffs here. For example, if people trust you to evaluate forecasts well, you can afford to cut corners earlier on, expanding the range of what counts as “quantifiable” or starting with worse question operationalizations while still delivering value.
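To make the operationalization and evaluation steps a bit more concrete, here is a minimal sketch in Python of scoring a resolved binary question with the Brier score. This is just one common scoring rule, not something I’m claiming any particular platform uses, and the question text, forecaster names, and resolution below are made up for illustration.

```python
# Minimal sketch: evaluating a well-operationalized binary question with the
# Brier score. The question, forecasters, and resolution are hypothetical.

def brier_score(forecast_prob: float, outcome: int) -> float:
    """Squared error between the forecast probability and the 0/1 outcome.

    Lower is better; a maximally uncertain 50% forecast scores 0.25.
    """
    return (forecast_prob - outcome) ** 2

# A vague worry ("how scared should we be of a recession?") turned into
# something resolvable, then forecast and scored once it resolves.
question = "Will the NBER declare a US recession beginning before 2025-01-01?"
forecasts = {"alice": 0.30, "bob": 0.55}
resolution = 0  # suppose the question resolved "no"

for name, p in forecasts.items():
    print(f"{name}: forecast {p:.2f}, Brier score {brier_score(p, resolution):.3f}")
```

The point of the sketch is just that once a question is operationalized crisply enough to resolve, scoring it is mechanical; almost all of the hard work sits in the earlier steps.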
For the second point, accuracy often trades off against other things. For example, cost-effectiveness and interpretability may matter more for clients.
If you spend a lot of time drilling down into a few questions, your forecasts are more “expensive” (both literally and figuratively) per question, and you won’t be able to provide as much value in total. As for interpretability, a bare number on its own is often not that helpful, both for literal clients you work with directly and for the world at large.
One thing that drives this point home for me is the existing “oracles” we have, like the stock market. There’s a sense in which the stock market is extremely accurate (for example, options are mostly “correctly” priced with respect to future prices), but for many of our non-financial decisions it takes a LOT of effort to interpret which signals the market is sending that are relevant to those decisions, like how scared we should be of a future recession or large-scale famine.