To clarify, there is a class of people known as “superforecasters”. I don’t know the details of the science backing this up, beyond that their accuracy has been rigorously measured, so you’ll have to look into it yourself to learn how it works. The finding is that superforecasters are people who, despite usually lacking domain expertise in a particular subject, predict outcomes in that domain (e.g., economics) with more success than experts in it. I think that comparison might be one lay forecaster versus one expert, rather than against the consensus of experts, but I don’t know. I don’t believe there’s been a study comparing the prediction success rates of a consensus of superforecasters vs. a consensus of domain experts on outcomes relevant to their expertise. That would be very interesting. These are rather new results.
Anyway, superforecasters can also beat algorithms that try to learn how to make predictions, which are in turn better than experts. So no human or machine yet beats superforecasters at making lots of types of predictions. In case you’re wondering: no, it’s not just you, that is a ludicrous and stupendous outcome. Like, what? Mind blown. The researchers were surprised too.
From the linked NPR article:
For most of his professional career, Tetlock studied the problems associated with expert decision making. His book Expert Political Judgment is considered a classic, and almost everyone in the business of thinking about judgment speaks of it with unqualified awe.
All of his studies brought Tetlock to at least two important conclusions.
First, if you want people to get better at making predictions, you need to keep score of how accurate their predictions turn out to be, so they have concrete feedback.
But also, if you take a large crowd of different people with access to different information and pool their predictions, you will be in much better shape than if you rely on a single very smart person, or even a small group of very smart people. [emphasis mine]
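Tetlock’s first point, about keeping score, is concrete enough to sketch in code. Below is a minimal example of a prediction log scored with the Brier score, a standard proper scoring rule for probability forecasts; the questions, numbers, and function names are invented purely for illustration.

```python
# Minimal sketch of a prediction log scored with the Brier score
# (lower is better; 0.25 is what always guessing 50% would earn).
# All entries here are illustrative, not from any existing tool.

def brier_score(forecast: float, outcome: int) -> float:
    """Squared error between a probability forecast and a 0/1 outcome."""
    return (forecast - outcome) ** 2

predictions = [
    # (question, probability assigned, what actually happened: 1 = yes, 0 = no)
    ("Charity X hits its fundraising target this quarter", 0.7, 1),
    ("Study Y replicates its headline effect", 0.4, 0),
    ("Policy Z passes within 12 months", 0.9, 0),
]

scores = [brier_score(p, outcome) for _, p, outcome in predictions]
print(f"Mean Brier score over {len(scores)} predictions: {sum(scores)/len(scores):.3f}")
```

Keeping a running mean like this is the “concrete feedback” Tetlock describes: it makes overconfidence show up as a number rather than a vague feeling.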
Takeaways for effective altruist predictions:
Track your predictions. Any effective altruist seeing value in prediction markets takes this as a given.
There are characteristics that make some forecasters better than others, even adjusting for level of practice and calibration. I don’t know what those characteristics are, but I’m guessing it’s some sort of analytic mindset. Maybe effective altruists, in this sense, might also turn out to be great forecasters. That’d be very fortuitous for us. We need to look into this more.
If, like me, you perceive much potential in prediction markets for effective altruism, you’d value a diversity of intellectual perspectives, to increase the chances of hitting the “wisdom of the crowds” effect Tetlock mentions. Now, SydMartin, I know both you and I know what a shadow a lack of diversity casts on effective altruism. I emphasized the last paragraph because just last week you commented on effective altruism’s propensity to be presumptuous and elitist about its own abilities as well. I believe a failure of this community to accurately predict future outcomes would be due more to a lack of intellectual diversity, i.e., everyone hailing from mostly the same university majors (e.g., philosophy, economics, computer science), than to sociopolitical homogeneity within effective altruism. Still, that’s just my pet hypothesis that’s yet to pan out in any way.
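To make the pooling point concrete, here is a toy simulation of the effect Tetlock describes: many noisy but reasonably independent probability estimates, once averaged, land closer to the truth than a typical individual estimate does. The assumptions (a known “true” probability, independent Gaussian noise) are mine and deliberately simplistic; this is a sketch of the statistical intuition, not a model of any real forecasting setup.

```python
# Toy illustration of the "wisdom of the crowds" effect: averaging many
# noisy, independent forecasts beats a typical individual forecast.
# Purely illustrative assumptions, not a model of any real market.
import random

random.seed(0)
true_probability = 0.7          # the "correct" forecast for some event
forecaster_noise = 0.2          # spread of each forecaster's individual error
n_forecasters = 100

individual = [
    min(1.0, max(0.0, random.gauss(true_probability, forecaster_noise)))
    for _ in range(n_forecasters)
]
pooled = sum(individual) / len(individual)

avg_individual_error = sum(abs(f - true_probability) for f in individual) / n_forecasters
print(f"Typical individual error: {avg_individual_error:.3f}")
print(f"Pooled-forecast error:    {abs(pooled - true_probability):.3f}")
```

The gain comes from errors cancelling out, which only happens when forecasters’ mistakes aren’t all correlated; that’s the statistical sense in which homogeneity directly threatens the effect.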
I also wonder about how well near-future prediction ability translates to far-future predictions. To test how well you are able to predict, you predict near-future events or changes. You increase your accuracy at doing this and assume it translates to the far future. Lots of people then make decisions based on your far-future predictions because of your track record as an accurate predictor. Perhaps, however, your model of forecasting is actually wildly inaccurate when it comes to long-term predictions. I’m not sure how we could account for this. Thoughts?
I’d share the concern that a successful track record of near-term predictions wouldn’t tell us much about potential success with long-term predictions. First of all, for existential risks, I suspect only near-term predictions relating to the field of a single existential risk, such as A.I. risk, should be counted toward expectations for a long-term track record in that field[1]. Even if it’s more complicated than that, I think there is something near-term prediction track records can tell us. If someone’s near-future track record is awful, that at least tells us the person or team in question isn’t great at predictions at all, so we wouldn’t want to rely on their predictions further afield either.
It’s like science: we can’t inductively conclude that their correct past predictions will keep holding on some arbitrary timescale, but we can rule out bad predictors from being reliable by process of elimination.
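In code terms, that “process of elimination” is just a filter on near-term track records: it can’t certify anyone as reliable long-term, only disqualify the clearly unreliable. A hypothetical sketch, reusing the Brier-score idea from above (names and numbers invented):

```python
# Hypothetical sketch: use near-term track records only to *rule out*
# unreliable forecasters, never to certify long-term reliability.
track_records = {
    # forecaster -> mean Brier score on resolved near-term questions
    "forecaster_a": 0.12,
    "forecaster_b": 0.31,   # worse than always guessing 50% (0.25)
    "forecaster_c": 0.18,
}

CUTOFF = 0.25  # chance-level performance; anything worse is disqualifying

still_in_the_running = {
    name: score for name, score in track_records.items() if score < CUTOFF
}
print(still_in_the_running)  # {'forecaster_a': 0.12, 'forecaster_c': 0.18}
```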
I think prediction markets might apply to all focus areas of effective altruism, though not always to the same extent. Running intervention experiments is difficult or expensive. For example, while GiveDirectly, IPA, and the Poverty Action Lab already build on many millions of dollars of development aid each year, effective altruism itself has been responsible for injecting the same empiricism into animal activism. Intervention experiments in animal activism have been expensive for organizations like Mercy For Animals, so these experiments aren’t carried out, or refined to find better methods, very often. There’s also difficulty in getting animal activists to cooperate on randomized controlled trials, as their community isn’t as receptive yet. Further, both because of the low number of volunteers from effective altruism, like Peter Hurford, and because of our lack of experience, it’s difficult to get experimental designs right the first time, and in as short a timeframe as, e.g., Animal Charity Evaluators would hope.
However, after a first successful experiment, for whatever value of “success” effective altruism or others assign, other organizations could design experiments using the same paradigm and preregister their plans. Then an EA prediction registry or market could look at the details of the experiment, or demand more details, and predict the chance it would confirm its hypothesis (or goal, or whatever). Forecasters could judge the new design on how it deviates from the original template, how closely they expect it to replicate, or how biased they think it will be. If the most reliable forecasters weren’t confident in the experiment, that would inform the rest of us about whether it’s worth funding when organizations ask for money. This way, we can select animal advocacy RCTs or other studies more efficiently when scarce resources limit how many we can carry out.
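As one sketch of how a registry might turn those forecasts into a funding signal, here’s a simple scheme that weights each forecaster’s probability of the preregistered experiment “succeeding” by their past accuracy. Every name, number, and the weighting rule itself is hypothetical; it’s just one option among many.

```python
# Hypothetical sketch of an EA prediction registry turning individual
# forecasts about a preregistered experiment into a single funding signal.
# Forecasters with better (lower) past Brier scores get more weight.

forecasts = [
    # (forecaster, probability the experiment confirms its hypothesis,
    #  mean Brier score on past resolved questions)
    ("forecaster_a", 0.65, 0.12),
    ("forecaster_b", 0.30, 0.22),
    ("forecaster_c", 0.55, 0.18),
]

def weight(brier: float) -> float:
    """Simple reliability weight: distance below chance level (0.25), floored at 0."""
    return max(0.0, 0.25 - brier)

total_weight = sum(weight(b) for _, _, b in forecasts)
pooled = sum(p * weight(b) for _, p, b in forecasts) / total_weight

FUND_THRESHOLD = 0.5  # illustrative cutoff for recommending funding
print(f"Weighted probability of success: {pooled:.2f}")
print("Recommend funding" if pooled >= FUND_THRESHOLD else "Hold off")
```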
Of course, this isn’t just for experiments, or animal activism. The great thing about a prediction market anyone can enter is that nobody needs to centrally allocate information to all the predictors. They could have expertise, hunches, or whatever else nobody knows about, and as long as they’re confident in their own analysis or information, they’ll bet on it. I was discussing certificate of impact purchases on Facebook yesterday, and Lauren Lee said she might prefer prediction markets that predict a project’s value and chance of success before it starts, rather than a retrospective evaluation based on impact certificates. I don’t see a reason there shouldn’t be both, though.
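To make “a prediction market anyone can enter” concrete, below is a minimal sketch of one standard mechanism, Hanson’s logarithmic market scoring rule (LMSR), where any trader can move the price toward their own probability without a central coordinator. Nothing above commits to a particular mechanism, so treat this purely as an illustration; the numbers are arbitrary.

```python
# Minimal sketch of a two-outcome prediction market using Hanson's
# logarithmic market scoring rule (LMSR). Any trader can buy shares in
# "yes" or "no"; the price of "yes" acts as the market's probability.
import math

b = 100.0                      # liquidity parameter (illustrative)
q = {"yes": 0.0, "no": 0.0}    # outstanding shares per outcome

def cost(quantities: dict) -> float:
    """LMSR cost function C(q) = b * ln(sum_i exp(q_i / b))."""
    return b * math.log(sum(math.exp(qi / b) for qi in quantities.values()))

def price(outcome: str) -> float:
    """Current price (implied probability) of an outcome."""
    denom = sum(math.exp(qi / b) for qi in q.values())
    return math.exp(q[outcome] / b) / denom

def buy(outcome: str, shares: float) -> float:
    """Buy shares of an outcome; returns what the trader pays."""
    before = cost(q)
    q[outcome] += shares
    return cost(q) - before

print(f"Starting price of 'yes': {price('yes'):.2f}")   # 0.50
paid = buy("yes", 50)          # a trader acts on private information
print(f"Trader paid {paid:.2f}; new price of 'yes': {price('yes'):.2f}")
```

The point of the mechanism is exactly the one above: each trader only needs their own information and confidence, and the market price aggregates it for everyone else.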
Presuming effective altruism becomes bigger and more ambitious in the future, the community will try more policy interventions, research projects, and small- and large-scale interventions we won’t have tested yet. Of course, some experiments won’t need to rely on prediction markets, but there’s little reason forecasters couldn’t bet on their success as well, to hone their prediction skills.
[1] Yes, this counts as predicting how successful predictions would be. Go meta!