Why I don’t trust forecasters
Pseudonymous accounts, common on prediction platforms and forums, let individuals share forecasts without revealing their true identity. While this anonymity can protect users' privacy and security, it also creates an environment ripe for manipulation. A pseudonymous forecaster can build contradictory track records by making predictions that support opposing outcomes on different platforms, or even within the same community.
One tactic employed by pseudonymous accounts to deceive followers is uncorrelated betting: placing bets or predictions that cover multiple outcomes across accounts. By doing so, these accounts increase the probability of being correct on at least one prediction. For example, if someone predicts that AI will take off fast on one platform and slowly on another, they are essentially hedging their bets, ensuring they can claim accuracy regardless of the actual outcome. This strategy lets them maintain an illusion of expertise while minimizing the risk of being proven wrong. Even the financial costs of betting can be compensated for, given the grant and employment opportunities offered to successful forecasters.
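As a toy illustration of the two-account version of this tactic (the account names and the coin-flip resolution below are made up):

```python
import random

# Toy sketch: one person runs two pseudonymous accounts that take
# opposite sides of the same binary question.
accounts = {"fast_takeoff_fan": True, "slow_takeoff_fan": False}

outcome = random.choice([True, False])  # however the question actually resolves

# Whichever account happened to match the outcome is the one whose
# "track record" gets publicised; the other is quietly abandoned.
shown = [name for name, bet in accounts.items() if bet == outcome]
print(f"Outcome: {outcome}; account presented as the expert: {shown[0]}")
```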
Another deceptive practice seen with pseudonymous accounts is selective disclosure. This means that individuals only reveal their identity when their predictions have been accurate or appear to be favorably aligned with the actual outcome. By withholding information about incorrect forecasts, these accounts create an inflated perception of their success rate and erode the reliability of their overall track record. Such selective disclosure can mislead followers into believing that the account possesses a higher level of accuracy than it genuinely does.
Relying on the track records of pseudonymous accounts can have significant consequences. Strategists and funders may make decisions based on inaccurate information, reducing their impact. Individuals seeking guidance on effective charities might be misled into making donations that are doomed to fail.
While pseudonymous accounts can provide a platform for diverse opinions and insights, it is crucial to approach any purported track record with skepticism. The ability to bet both ways, to spread uncorrelated bets across many questions, and to selectively disclose favorable outcomes can create a distorted perception of accuracy.
I agree that the potential for this exists, and if it were a widespread practice it would be concerning. Have you seen people who claim to have a good forecasting record engage in this kind of pseudonym exploitation, though?
My understanding is that most people who claim this have proof records associated with a single pseudonymous account on specific platforms (e.g. Metaculus), which avoids the problem you describe.
You couldn’t know who is and is not engaging in this behaviour. Anyone with a good forecasting record may have shadow accounts.
I’m not familiar with proof records. Could you elaborate further? If this is verification such as identity documents, this could go some way to preventing manipulation.
If someone is doing the shadow account thing (i.e., a boiler room scam, I think), there will be exponentially fewer surviving forecasters for each additional successful bet. I don't think this is the case for the well-known ones.
I mean rankings like https://www.metaculus.com/rankings/?question_status=resolved&timeframe=3month
I suggest that “why I don’t trust pseudonymous forecasters” would be a more appropriate title. When I saw the title I expected an argument that would apply to all/most forecasting, but this worry is only about a particular subset
The idea is that the potential for pseudonymous forecasting makes all forecaster track records suspect
I think you point to some potential for scepticism, but I don’t think this is convincing. Selective disclosure is unlikely to be a problem where a user can only point to summary statistics for their whole activity, like on Metaculus. An exception might be if only a subset of stats were presented, like ranking in past 3/6/12 months without giving Briers or other periods etc. But you could just ask for all the relevant stats.
The uncorrelated betting isn’t a problem if you just require a decent volume of questions in the track record. If you basically want at least 100 binary questions to form a track record, and say 20 of them were hard enough such that the malicious user wanted to hedge on them, you’d need 2^20 accounts to cover all possible answer sets. If they just wanted good performance on half of them, you’d still need 2^10 accounts.
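A quick back-of-the-envelope version of those numbers (assuming the hedged questions are binary and independent, and one shadow account per possible answer set):

```python
# Sanity check of the combinatorics in the comment above.
hedged = 20

# To guarantee that some shadow account gets *every* hedged question right,
# you need one account per possible answer set:
print(2 ** hedged)        # 1,048,576 accounts

# Guaranteeing a perfect record on any chosen 10 of them still needs:
print(2 ** 10)            # 1,024 accounts

# Equivalently, a single random-guessing account gets all 20 right with
# probability 1 / 2**20, so out of a million shadow accounts you'd expect
# roughly one to end up with a "perfect" record on those questions:
print(1_000_000 / 2 ** hedged)
```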
A more realistic reason for scepticism is that points/ranking on Metaculus is basically a function of activity over time. You can be only a so-so forecaster but have an impressive Metaculus record just by following the crowd on loads of questions or picking probabilities that guarantee points. But Brier scores, especially relative to the community, should reveal this kind of chicanery.
The biggest reason for scepticism regarding forecasting as it’s used in EA is generalisation across domains. How confident should we be that the forecasters/heuristics/approaches that are good for U.S. political outcomes or Elon Musk activity translate successfully to predicting the future of AI or catastrophic pandemics or whatever? Michael Aird’s talk mentions some good reasons why some translation is reasonable to expect, but this is an open and ongoing question.
I don’t think the forecaster needs 2^10 accounts if they pick a set of problems with mutually correlated outcomes. For example, you can make two accounts for AI forecasting, and have one bet consistently more AI skeptical than the average and the other more AI doomy than the average. You could do more than 2, too, like very skeptical, skeptical, average, doomy, very doomy. One of them could end up with a good track record in AI forecasting.
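A rough simulation of that setup, with entirely made-up numbers (one shared "AI progress" factor driving 30 question outcomes, and five accounts that differ only in a fixed directional bias):

```python
import numpy as np

rng = np.random.default_rng(0)

# 30 AI questions whose resolution probability is driven mostly by one
# shared factor ("AI progress turns out to be fast").
n_questions = 30
fast_progress = rng.choice([True, False])          # how the world turns out
p_resolve_yes = 0.8 if fast_progress else 0.2
outcomes = rng.random(n_questions) < p_resolve_yes

# Five accounts run by the same person, each with a fixed bias.
biases = {"very_skeptical": 0.1, "skeptical": 0.3, "average": 0.5,
          "doomy": 0.7, "very_doomy": 0.9}

for name, p in biases.items():
    mean_brier = np.mean((p - outcomes.astype(float)) ** 2)
    print(f"{name:>14}: mean Brier {mean_brier:.3f}")

# Whichever bias happens to line up with the shared factor ends up with a
# strong record on all 30 questions, with no per-question skill at all.
```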
If doing well across domains is rewarded much more than similar performance within a domain, it would be harder to get away with this (assuming problems across domains have relatively uncorrelated outcomes, but you could probably find sources of correlation across some domains, like government competence). But then someone could look only for easy questions across domains to build their track record. So, maybe there’s a balance to strike. Also, rather than absolute performance across possibly different questions like the Brier score, you should measure performance relative to peers on each question and average that. Maybe something like relative returns on investment in prediction markets, with a large number of bets and across a large number of domains.
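One sketch of what "relative to peers on each question" could look like, assuming a per-question community prediction is available (this is just an illustration of the idea, not Metaculus's actual scoring rule, and the numbers at the bottom are hypothetical):

```python
import numpy as np

def relative_brier(forecasts, community, outcomes):
    """Average of (forecaster Brier - community Brier), question by question.

    Negative values mean the forecaster beat the community on the questions
    they actually answered, which controls for question difficulty and volume.
    """
    forecasts, community, outcomes = map(np.asarray, (forecasts, community, outcomes))
    per_question = (forecasts - outcomes) ** 2 - (community - outcomes) ** 2
    return per_question.mean()

# Hypothetical numbers: three questions resolving 1/0/1.
print(relative_brier([0.9, 0.2, 0.7], community=[0.7, 0.4, 0.6], outcomes=[1, 0, 1]))
```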
Good point on the correlated outcomes. I think you’re right that cross-domain performance could be a good measure, especially since performance in a single domain could be driven by having a single foundational prior that turned out to be right, rather than genuine forecasting skill.
On the second point, I'm pretty sure the Metaculus results already compare your Brier to the community's on the same set of questions. So you could base inter-forecaster comparisons on that difference (weakly).
I don’t have much sense this happens.
It certainly does happen; the question is to what extent.