But as I understand it, Eliezer regards himself as being able to do unusually well using the techniques he has described, and so would predict his own success in forecasting tournaments.
This is also my model of Eliezer; my point is that my thoughts on modesty / anti-modesty are mostly disconnected to whether or not Eliezer is right about his forecasting accuracy, and mostly connected to the underlying models of how modesty and anti-modesty work as epistemic positions.
How narrowly should you define the ‘expert’ group?
I want to repeat something to make sure there isn’t confusion or double illusion of transparency; “narrowness” doesn’t mean just the size of the group but also the qualities that are being compared to determine who’s expert and who isn’t.
I think with Eliezer’s approach, superforecasters should exist, and it should be possible to be aware that you are a superforecaster. Those both seem like they would be lower probability under the modest view. Whether Eliezer personally is a superforecaster seems about as relevant as whether Tetlock is one; you don’t need to be a superforecaster to study them.
I expect Eliezer to agree that a careful aggregation of superforecasters will outperform any individual superforecaster; similarly, I expect Eliezer to think that a careful aggregation of anti-modest reasoners will outperform any individual anti-modest reasoner.
It’s worth considering what careful aggregations look like when not dealing with binary predictions. The function of a careful aggregation is to disproportionately silence error while maintaining signal. With many short-term binary predictions, we can use methods that focus on the outcomes, without any reference to how those predictors are estimating those outcomes. With more complicated questions, we can’t compare outcomes directly, and so need to use the reasoning processes themselves as data.
That suggests a potential disagreement to focus on: the anti-modest view suspects that one can do a careful aggregation based on reasoner methodology (say, weighing more highly forecasters who adjust their estimates more frequently, or who report using Bayes, or so on), whereas I think the modest view suspects that this won’t outperform uniform aggregation.
(The modest view has two components—approving of weighting past performance, and disapproving of other weightings. Since other approaches can agree on the importance of past performance, and the typical issues where the two viewpoints differ are those where we have little data on past performance, it seems more relevant to focus on whether the disapproval is correct than whether the approval is correct.)