Hi Eliezer, I wonder if you’ve considered trying to demonstrate the superiority of your epistemic approach by participating in one of the various forecasting tournaments funded by IARPA, and trying to be classified as a ‘superforecaster’. For example the new Hybrid Forecasting Competition is actively recruiting participants.
To me your advice seems in tension with the recommendations that have come out of that research agenda (via Tetlock and others), which finds that forecasts carefully aggregated from many people perform better than those of almost any individual—and that individuals who beat the aggregation were almost always lucky and can’t repeat the feat. I’d be interested to see how an anti-modest approach fares in direct quantified competition with alternatives.
It would be understandable if you didn’t think that was the best use of your time, in which case perhaps some others who endorse and practice the mindset you recommend could find the time to do it instead.
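The aggregation finding above is easy to reproduce in a toy simulation (the noise model and every parameter below are invented for illustration; this is not the GJP’s methodology):

```python
import random

random.seed(0)

def brier(p, outcome):
    """Squared error of a probability forecast against a 0/1 outcome (lower is better)."""
    return (p - outcome) ** 2

n_questions, n_forecasters = 200, 50
individual_totals = [0.0] * n_forecasters
aggregate_total = 0.0

for _ in range(n_questions):
    true_p = random.random()                       # each question's true frequency
    outcome = 1 if random.random() < true_p else 0
    # Each forecaster sees the truth through independent noise.
    forecasts = [min(0.99, max(0.01, true_p + random.gauss(0, 0.15)))
                 for _ in range(n_forecasters)]
    for i, p in enumerate(forecasts):
        individual_totals[i] += brier(p, outcome)
    aggregate_total += brier(sum(forecasts) / n_forecasters, outcome)

beaten = sum(total > aggregate_total for total in individual_totals)
print(f"the simple mean outscores {beaten} of {n_forecasters} forecasters")
```

Because averaging independent errors cancels much of the noise, the unweighted mean typically beats nearly every individual, and an individual who happens to beat it in one run tends not to repeat the feat in the next.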
I think with Eliezer’s approach, superforecasters should exist, and it should be possible to be aware that you are a superforecaster. Those both seem like they would be lower probability under the modest view. Whether Eliezer personally is a superforecaster seems about as relevant as whether Tetlock is one; you don’t need to be a superforecaster to study them.
I expect Eliezer to agree that a careful aggregation of superforecasters will outperform any individual superforecaster; similarly, I expect Eliezer to think that a careful aggregation of anti-modest reasoners will outperform any individual anti-modest reasoner.
It’s worth considering what careful aggregations look like when not dealing with binary predictions. The function of a careful aggregation is to disproportionately silence error while maintaining signal. With many short-term binary predictions, we can use methods that focus on the outcomes, without any reference to how those predictors are estimating those outcomes. With more complicated questions, we can’t compare outcomes directly, and so need to use the reasoning processes themselves as data.
That suggests a potential disagreement to focus on: the anti-modest view suspects that one can do a careful aggregation based on reasoner methodology (say, weighting more highly forecasters who adjust their estimates more frequently, or who report using Bayes, and so on), whereas I think the modest view suspects that this won’t outperform uniform aggregation.
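A minimal sketch of that contrast (the forecasts and weights below are invented; nothing here comes from tournament data):

```python
def pool(forecasts, weights):
    """Pool probability forecasts as a weighted arithmetic mean."""
    return sum(p * w for p, w in zip(forecasts, weights)) / sum(weights)

forecasts = [0.55, 0.70, 0.90]     # three forecasters, one binary question

# Uniform aggregation: the modest baseline, every forecaster counts equally.
print(pool(forecasts, [1, 1, 1]))  # ~0.717

# Methodology-based aggregation: hypothetical weights favouring, say, the
# forecasters who update most often. The disagreement is over whether weights
# chosen this way can systematically beat the uniform baseline.
print(pool(forecasts, [1, 3, 2]))  # ~0.742
```

On many resolved binary questions, both views can instead set the weights from past Brier scores; the contested cases are precisely those where that track record is missing.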
(The modest view has two components—approving of weighting past performance, and disapproving of other weightings. Since other approaches can agree on the importance of past performance, and the typical issues where the two viewpoints differ are those where we have little data on past performance, it seems more relevant to focus on whether the disapproval is correct than whether the approval is correct.)
OK so it seems like the potential areas of disagreement are:
How much external confirmation do you need to know that you’re a superforecaster (or have good judgement in general), or even the best forecaster?
How narrowly should you define the ‘expert’ group?
How often should you define who is a relevant expert based on whether you agree with them in that specific case?
How much should you value ‘wisdom of the crowd (of experts)’ against the views of the one best person?
How far should you follow a preregistered process to whatever conclusion it leads to, versus changing the algorithm as you go to get an answer that seems right?
We’ll probably have to go through a lot of specific cases to see how much disagreement there actually is. It’s possible to talk in generalities and feel you disagree, but actually be pretty close on concrete cases.
Note that it’s entirely possible for non-modest contributors to do more to enhance the accuracy of a forecasting tournament, because they try harder to find errors, while still being less right than others’ all-things-considered views, because of insufficient deference to the answer the tournament as a whole spits out. Active traders enhance market efficiency, but still lose money as a group.
As for Eliezer knowing how to make good predictions, but not being able to do it himself, that’s possible (though it would raise the question of how he has gotten strong evidence that these methods work). But as I understand it, Eliezer regards himself as being able to do unusually well using the techniques he has described, and so would predict his own success in forecasting tournaments.
“But as I understand it, Eliezer regards himself as being able to do unusually well using the techniques he has described, and so would predict his own success in forecasting tournaments.”
This is also my model of Eliezer; my point is that my thoughts on modesty / anti-modesty are mostly disconnected from whether or not Eliezer is right about his forecasting accuracy, and mostly connected to the underlying models of how modesty and anti-modesty work as epistemic positions.
“How narrowly should you define the ‘expert’ group?”
I want to repeat something to make sure there isn’t confusion or a double illusion of transparency: “narrowness” doesn’t mean just the size of the group, but also the qualities that are being compared to determine who is an expert and who isn’t.
It’s an interesting just-so story about what IARPA has to say about epistemology, but the actual story is much more complicated. For instance, “extremizing” works to better calibrate aggregated forecasts in general, but extremizing superforecasters’ predictions makes them worse.
Furthermore, contrary to what you seem to be claiming about people not being able to outperform others, there are in fact “superforecasters” who outperform the average participant year after year, even if they can’t outperform the aggregate once their forecasts are factored in.
Not sure how this is a ‘just-so story’ in the sense that I understand the term.
“Extremizing works to better calibrate aggregated forecasts in general, but extremizing superforecasters’ predictions makes them worse.”
How is that in conflict with my point? Since superforecasters spend so much time talking and sharing information with one another, maybe they have already incorporated extremising into their own forecasts.
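For concreteness, extremizing transforms of the kind discussed in the GJP literature look something like the power form below (the exponent is an illustrative choice, not a value fitted in any tournament):

```python
def extremize(p, a=2.5):
    """Push a pooled probability away from 0.5.

    p**a / (p**a + (1 - p)**a) is one common extremizing form; a = 2.5
    is an illustrative exponent, not a fitted tournament value.
    """
    return p ** a / (p ** a + (1 - p) ** a)

print(extremize(0.7))  # ~0.89: a pooled 0.7 is pushed outward
print(extremize(0.5))  # 0.50: an even-odds forecast is a fixed point
```

If superforecasters’ final numbers are already this extreme, applying the transform to them again overshoots, which fits the observation that extremizing their predictions makes them worse.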
I know superforecasters very well (I’ve read all of Tetlock’s books and interviewed him last week), but I am pretty sure an aggregation of superforecasters beats almost all of them individually, which speaks to the benefits of averaging a range of people’s views in most cases. Though in many cases you should not give much weight to those who are clearly in a worse epistemic position (non-superforecasters, whose predictions Tetlock told me were about 10-30x less useful).
“How is that in conflict with my point? Since superforecasters spend so much time talking and sharing information with one another, maybe they have already incorporated extremising into their own forecasts.”
Doesn’t this clearly demonstrate that the superforecasters are not using modest epistemology? At best, this shows that you can improve upon a “non-modest” epistemology by aggregating them together, but it does not argue against the original post.
Hi Halffull—now I see what you’re saying, but actually the reverse is true. That superforecasters have already extremised shows their higher level of modesty. Extremising is about updating based on other people’s views: realising that, because they have independent information to add, after hearing their view you can be more confident about how far to shift from your prior.
Imagine two epistemic peers estimating the weighting of a coin. They start with their probabilities bunched around 50%, because they have been told the coin will probably be close to fair. They each see the same number of flips, and then reveal their estimates of the weighting. Both give an estimate of p = 0.7. A modest person, who correctly weights the other person’s estimate as equally informative as their own, will now offer a number quite a bit higher than 0.7, which takes into account the equal information both of them have pulling them away from their prior.
Once they’ve done that, there won’t be gains from further extremising. But a non-humble participant would fail to properly extremise based on the information in the other person’s view, leaving accuracy to be gained if this is done at a later stage by someone running the forecasting tournament.
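Here is that coin example in numbers, as a minimal Beta-binomial sketch (the prior and flip counts are invented so that each peer’s individual answer lands exactly on 0.7):

```python
# Shared prior "bunched around 50%": Beta(6, 6), i.e. 6 pseudo-heads, 6 pseudo-tails.
prior_heads, prior_tails = 6, 6

# Each peer independently sees 18 flips, 15 of them heads.
heads, flips = 15, 18

# One peer alone: posterior mean of the coin's heads-probability.
alone = (prior_heads + heads) / (prior_heads + prior_tails + flips)
print(alone)     # 0.7

# The modest move: pool both peers' independent data, counting the shared
# prior only once. The evidence now counts double, so the estimate extremizes.
combined = (prior_heads + 2 * heads) / (prior_heads + prior_tails + 2 * flips)
print(combined)  # 0.75, quite a bit higher than either peer's 0.7
```

The 0.75 is already “extremised”; running an extremizing step on top of it would overshoot, just as in the superforecaster case above.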
“Imagine two epistemic peers estimating the weighting of a coin. They start with their probabilities bunched around 50%, because they have been told the coin will probably be close to fair. They each see the same number of flips, and then reveal their estimates of the weighting. Both give an estimate of p = 0.7. A modest person, who correctly weights the other person’s estimate as equally informative as their own, will now offer a number quite a bit higher than 0.7, which takes into account the equal information both of them have pulling them away from their prior.”
This is what I’m talking about when I say “just-so stories” about the data from the GJP (Good Judgment Project). One explanation is that superforecasters are going through this thought process; another is that they discard non-superforecasters’ knowledge, and therefore end up more extreme without explicitly running the extremizing algorithm on their own forecasts.
Similarly, the existence of superforecasters themselves argues for a non-modest epistemology, while the fact that the extremized aggregation beats the superforecasters may argue for a somewhat more modest epistemology. Saying that the data here points one way or the other is, to my mind, cherry-picking.
“...the existence of superforecasters themselves argues for a non-modest epistemology...”
I don’t see how. No theory on offer argues that everyone is an epistemic peer. All theories predict some people have better judgement and will be reliably able to produce better guesses.
As a result I think superforecasters should usually pay little attention to the predictions of non-superforecasters (unless it’s a question on which expertise pays few dividends).