I think with Eliezer’s approach, superforecasters should exist, and it should be possible to be aware that you are a superforecaster. Those both seem like they would be lower probability under the modest view. Whether Eliezer personally is a superforecaster seems about as relevant as whether Tetlock is one; you don’t need to be a superforecaster to study them.
I expect Eliezer to agree that a careful aggregation of superforecasters will outperform any individual superforecaster; similarly, I expect Eliezer to think that a careful aggregation of anti-modest reasoners will outperform any individual anti-modest reasoner.
It’s worth considering what careful aggregations look like when not dealing with binary predictions. The function of a careful aggregation is to disproportionately silence error while maintaining signal. With many short-term binary predictions, we can use methods that focus on the outcomes, without any reference to how those predictors are estimating those outcomes. With more complicated questions, we can’t compare outcomes directly, and so need to use the reasoning processes themselves as data.
That suggests a potential disagreement to focus on: the anti-modest view suspects that one can do a careful aggregation based on reasoner methodology (say, weighting more heavily forecasters who adjust their estimates more frequently, or who report using Bayes, and so on), whereas I think the modest view suspects that this won’t outperform uniform aggregation.
(The modest view has two components—approving of weighting past performance, and disapproving of other weightings. Since other approaches can agree on the importance of past performance, and the typical issues where the two viewpoints differ are those where we have little data on past performance, it seems more relevant to focus on whether the disapproval is correct than whether the approval is correct.)
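To make the comparison concrete, here is a minimal sketch of what "uniform aggregation" versus "methodology-weighted aggregation" could look like on binary questions, scored with the Brier score. Everything in it is an assumption for illustration: the forecasts, the outcomes, and the weights (which stand in for something like "updates estimates frequently") are made-up toy numbers, and the function names are hypothetical. The toy data is constructed so that the up-weighted forecaster happens to be the more accurate one, so it shows how the two schemes would be compared, not which view is right.

```python
from typing import Optional, Sequence


def aggregate(forecasts: Sequence[float],
              weights: Optional[Sequence[float]] = None) -> float:
    """Weighted mean of probability forecasts; uniform if no weights given."""
    if weights is None:
        weights = [1.0] * len(forecasts)
    return sum(w * p for w, p in zip(weights, forecasts)) / sum(weights)


def brier(prob: float, outcome: int) -> float:
    """Squared error of a probability against a 0/1 outcome (lower is better)."""
    return (prob - outcome) ** 2


def mean_brier(questions: Sequence[Sequence[float]],
               outcomes: Sequence[int],
               weights: Optional[Sequence[float]] = None) -> float:
    """Average Brier score of the aggregated forecast across all questions."""
    scores = [brier(aggregate(q, weights), o)
              for q, o in zip(questions, outcomes)]
    return sum(scores) / len(scores)


# Toy data (assumed): each row is one question, each column one forecaster.
questions = [
    [0.9, 0.6, 0.5],
    [0.2, 0.5, 0.4],
    [0.8, 0.5, 0.6],
]
outcomes = [1, 0, 1]

# Hypothetical methodology weights: forecaster 0 is up-weighted because
# (by assumption) they report the habits we think predict accuracy.
methodology_weights = [2.0, 1.0, 1.0]

uniform_score = mean_brier(questions, outcomes)
weighted_score = mean_brier(questions, outcomes, methodology_weights)

print(f"uniform:  {uniform_score:.4f}")
print(f"weighted: {weighted_score:.4f}")
```

The interesting question is whether weights chosen from methodology alone, before seeing outcomes, would beat the uniform score on real questions; the sketch only shows how that test would be scored.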
OK so it seems like the potential areas of disagreement are:
How much external confirmation do you need to know that you’re a superforecaster (or have good judgement in general), or even the best forecaster?
How narrowly should you define the ‘expert’ group?
How often should you define who is a relevant expert based on whether you agree with them in that specific case?
How much should you value ‘wisdom of the crowd (of experts)’ against the views of the one best person?
How much should you follow a preregistered process to whatever conclusion it leads to, versus changing the algorithm as you go to get an answer that seems right?
We’ll probably have to go through a lot of specific cases to see how much disagreement there actually is. It’s possible to talk in generalities and feel you disagree, but actually be pretty close on concrete cases.
Note that it’s entirely possible that non-modest contributors will do more to enhance the accuracy of a forecasting tournament because they try harder to find errors, but will still be less right than others in their all-things-considered views, because they defer too little to the answer the tournament as a whole spits out. Active traders enhance market efficiency, but still lose money as a group.
As for Eliezer knowing how to make good predictions, but not being able to do it himself, that’s possible (though it would raise the question of how he has gotten strong evidence that these methods work). But as I understand it, Eliezer regards himself as being able to do unusually well using the techniques he has described, and so would predict his own success in forecasting tournaments.
But as I understand it, Eliezer regards himself as being able to do unusually well using the techniques he has described, and so would predict his own success in forecasting tournaments.
This is also my model of Eliezer; my point is that my thoughts on modesty / anti-modesty are mostly disconnected from whether or not Eliezer is right about his forecasting accuracy, and mostly connected to the underlying models of how modesty and anti-modesty work as epistemic positions.
How narrowly should you define the ‘expert’ group?
I want to repeat something to make sure there isn’t confusion or a double illusion of transparency; “narrowness” doesn’t mean just the size of the group but also the qualities that are being compared to determine who’s an expert and who isn’t.