Philip Tetlock on why accurate forecasting matters for everything, and how you can do it better

This is a linkpost for #60 – Prof Tetlock on why accurate forecasting matters for everything, and how you can do it better. You can listen to the episode on that page, or by subscribing to the 80,000 Hours Podcast wherever you get podcasts.

We view Tetlock’s work as so core to living well that we’ve brought him back for a second and longer appearance on the show — his first appearance was back in episode 15. Some questions this time around include:

  • What would it look like to live in a world where elites across the globe were better at predicting social and political trends? What are the main barriers to this happening?

  • What are some of the best opportunities for making forecaster training content?

  • What do extrapolation algorithms actually do, and given they perform so well, can we get more access to them?

  • Have any sectors of society or government started to embrace forecasting more in the last few years?

  • If you could snap your fingers and have one organisation begin regularly using proper forecasting, which would it be?

  • When, if ever, should one use explicit Bayesian reasoning?

Key points

When intelligence analysts are doing a postmortem on policy towards Iraq or Iran or any other part of the world, they can’t go back in history and rerun, they have to try to figure out what would have happened from the clues that are available. And those clues are a mixture of things, some of them are going to be more beliefs about causation, the personalities and capacities of individuals and organizations. Others are going to even be more statistical, economic time series and things like that. So it’s going to be a real challenge. I mean this is research in progress so inevitably I have to be more tentative and I’m speculating, but I’m guessing we’re looking for hybrid thinkers. We’re looking for thinkers who are comfortable with statistical reasoning, but also have a good deal of strategic savvy and also recognize that, “Oh, maybe my strategic savvy isn’t quite as savvy as I think it is, so I have to be careful.”

The peculiar thing in the real world is how comfortable we are at making pretty strong factual claims that turn out on close inspection to be counterfactual. Every time you claim you know whether someone was a good or a bad president, or whether someone made a good or bad policy decision, you’re implicitly making claims about how the world would have unfolded in an alternative universe to which you have no empirical access, you have only your imagination.

The best forecasters we find are able to distinguish between 10 and 15 degrees of uncertainty for the types of questions that IARPA is asking about in these tournaments, like whether Brexit is going to occur or if Greece is going to leave the eurozone or what Russia is going to do in the Crimea, those sorts of things. Now, that’s really interesting because a lot of people when they look at those questions say, “Well you can’t make probability judgements at all about that sort of thing because they’re unique.”

And I think that’s probably one of the most interesting results of the work over the last 10 years. I mean, you take that objection, which you hear repeatedly from extremely smart people, that these events are unique and you can’t put probabilities on them. You take that objection and you say, “Okay, let’s take all the events that the smart people say are unique and let’s put them in a set and let’s call that set ‘allegedly unique events’. Now let’s see if people can make forecasts within that set of allegedly unique events and if they can, if they can make meaningful probability judgments of these allegedly unique events, maybe the allegedly unique events aren’t so unique after all, maybe there is some recurrence component.” And that is indeed the finding, that when you take the set of allegedly unique events, hundreds of allegedly unique events, you find that the best forecasters make pretty well calibrated forecasts fairly reliably over time and don’t regress too much toward the mean.

Am I a believer in climate change or am I a disbeliever, if I say, “Well, when I think about the UN Intergovernmental Panel on Climate Change forecasts for the year 2100, the global surface temperature forecasts, I’m 72% confident that they’re within plus or minus 0.3 degrees centigrade in their projections”? And you kind of look at me and say, “Well, it’s kind of precise and odd,” but I’ve just acknowledged I think there is a 28% chance they could be wrong. Now they could be wrong on the upside or the downside, but let’s say the error bars are symmetric, so there’s a 14% chance they’re overestimating and a 14% chance they’re underestimating.

So I’m flirting with the idea that they might be wrong, right? So if you are living in a polarized political world in which expressions of political views are symbols of tribal identification, they’re not statements that, “Oh, this is my best faith effort to understand the world. I’ve thought about this and I’ve read these reports and I’ve looked at… I’m not a climate expert, but here’s my best guesstimate.” And if I went to all the work of doing that, and by the way, I haven’t, I don’t have the cognitive energy to do this, but if someone had gone to all the cognitive energy of reading all these reports and trying to get up to speed on it and concluded say 72%, what would the reward be? They wouldn’t really belong in any camp, would they?

I think that in a competitive nation state system where there’s no world government, even intelligent self-aware leaders will have serious conflicts of interest, and they will know that there is no guarantee of peace and comity. But I think you’re less likely to observe gross miscalculations, either in trade negotiations or nuclear negotiations. I think you’re more likely to see an appreciation of the need to have systems that prevent accidental war and put constraints on cyber and bio warfare competition, as well as nuclear.

So those would be things I think would fall out fairly naturally from intelligent leaders who want to preserve their power, and the influence of their nations, but also want to avoid cataclysms. I’m not utopian about it. I think we would still live in a very imperfect world. But if we lived in a world in which the top leadership of every country was open to consulting competently technocratically run forecasting tournaments for estimates on key issues, we would on balance, be better off.

Well, what we know about counterfactual reasoning in the real world is that it’s very ideologically self-serving. People pretty much invent counterfactual scenarios that are convenient and prop up their preconceptions. So for conservatives, it’s pretty much self-evident that without Reagan, the Cold War would have continued and might well have gotten much worse, because the Soviets would’ve seen weakness and they would’ve pushed further. And for liberals it was pretty obvious that the Soviet Union was economically collapsing and that things would have happened pretty much the way they did, and Reagan managed to waste hundreds of billions of dollars in unnecessary defense expenditures. So you get these polar opposite positions that people can entrench in indefinitely.

Articles, books, and other media discussed in the show

Transcript

Robert Wiblin: Hi listeners, this is the 80,000 Hours Podcast, where each week we have an unusually in-depth conversation about one of the world’s most pressing problems and how you can use your career to solve it. I’m Rob Wiblin, Director of Research at 80,000 Hours.

Today’s interview with Professor Philip Tetlock was recorded last week at Effective Altruism Global San Francisco.

We last interviewed Tetlock back in November 2017 for episode 15. That’s a great episode which I recommend going back and listening to, as it will give you more context, but doing so isn’t necessary to make sense of today’s conversation.

Philip is the Annenberg University Professor at the University of Pennsylvania, a legendary social scientist, and a personal hero of mine.

He has spent the better part of 40 years collecting forecasts about the future from tens of thousands of people, to try to figure out how accurately people can predict the future, and what sorts of thinking styles allow people to do the best job of it.

He was co-principal investigator of The Good Judgment Project, a many-year study of the feasibility of improving the accuracy of probability judgments of high-stakes, real-world events.

His research has resulted in over 200 articles in peer-reviewed journals and two books: Superforecasting: The Art and Science of Prediction, and Expert Political Judgment: How Good Is It? How Can We Know?.

Why go and interview Philip a second time?

Firstly, in 2017 I only got an hour with him, and I simply had a lot more to ask.

Second, I believe Philip’s work is the sine qua non of being rational and having good judgement, and as a result it’s relevant to everyone no matter what they’re doing.

Making accurate predictions is essential for good decision-making in your own life. For instance, if you can’t predict your probability of success in different career paths, you’re going to find it really hard to choose between them.

And that’s just a big picture example — correctly assessing the probability of things is important everywhere in life on an hourly basis, whether you’re deciding whether to call your bank to try to resolve a problem, whether to apologise to someone you had a fight with, or what to put in your suitcase for a trip overseas.

Third, we’re especially interested in what impact advanced AI might have on the world.

For that, it’s really useful to know what capabilities AI will have at different times. Needless to say, that’s exceedingly hard, but the lessons from Tetlock’s work give us as good a shot as possible of producing sensible estimates.

Fourth, improving judgement and foresight seem especially important for longtermists like me who want to make the world better not just today but for the thousands of generations who may be yet to come. It can be hard to figure out what things we can change about the world now that will consistently point it in the right direction over hundreds or thousands of years, but improving humanity’s capability to correctly foresee the effect of our actions seems like a great guess for something that will help.

As a result many people believe that this is among the most promising broad interventions to positively shape the long-term future.

Fifth, people who love Philip’s research also tend to love 80,000 Hours. So if you’ve just been tempted to tune in for the first time for this interview, welcome to the party — go check out our website and hopefully you’ll learn a lot of things you’ll find useful.

Sixth and finally, our biggest donor, the Open Philanthropy Project, has also funded the creation of a tool that helps you better calibrate your probability estimates, and hopefully thereby make better decisions. We’ve put that on our website, and will link to it from the show notes.

Alright, just before we get to that, as I said, this interview was recorded recently at Effective Altruism Global San Francisco where Philip was a speaker.

Effective Altruism Global is the conference for people interested in using evidence and careful analysis to do as much good as possible. If you enjoy this conversation, maybe you should get yourself along to one of these events. The next big one is in London, the weekend of the 18th to 20th of October.

For antipodeans like me there’s a smaller one coming up in Sydney on the weekend of the 28th and 29th of September.

You can find out more about both of those at eaglobal.org.

Alright, that was a bit of ado there, but without any more of it, here’s Philip Tetlock.

Robert Wiblin: Thanks for returning to the podcast, Philip.

Philip Tetlock: Well, thank you.

Robert Wiblin: So we plan to talk about new results in forecasting research, but first, what are you working on right at the moment and why do you think it’s important work?

Philip Tetlock: At this very moment, I am working on what you might think of as the opposite of forecasting. I’m working on backward reasoning in time as opposed to forward reasoning in time.

Robert Wiblin: What does that look like?

Philip Tetlock: Well, it looks like what people in the research literature call counterfactuals. What would have happened if history had taken a different turn at various points.

Robert Wiblin: So this is the tournament involving Civilization V, is that right?

Philip Tetlock: Well, the reason for starting the research in simulated worlds as opposed to the real world is because historical counterfactuals in the real world are unknowable. Historical counterfactuals are a source of almost endless ideological friction and debate. When we started the forecasting tournaments, there was a huge debate, for example, about the role of the Reagan administration and its tactics for dealing with the Soviet Union, and whether Reagan was bringing us closer to a nuclear war or actually bringing us closer to world peace. It was very polarizing and people disagreed completely on where things were going, and after the fact, when the outcomes were known, everybody claimed to be able to explain what happened. So even though their expectations were very different, everybody wound up in a place where they felt comfortable with their preconceptions. Conservatives felt that Reagan had won the Cold War, and liberals felt that the Cold War would have ended pretty much the way it did without Reagan, with a two-term Carter presidency and a Mondale follow-up.

Robert Wiblin: So in this tournament you’re setting people up in particular situations in Civilization V, this famous computer game, and then I guess changing the scenario a little bit and then getting people to forecast what will happen and seeing how accurately they can forecast what would have happened if the starting conditions had been a little bit different?

Philip Tetlock: That’s right. You’re able to do something in the simulated world you can’t do in the real world. You can go back in time and you can say, “Well, what if something different had happened at turn 100? How would the various aspects of the world have changed?” Whether you change President Reagan or you change President Trump, or you change the magnitude of the recession in 2008, or whether Bernanke was leading the Federal Reserve. You’ve got a long list of things you can change in economics or in politics or in military affairs, and people just deeply disagree about these things, and they can disagree forever because nobody can go back in a time machine, rerun history and see what would have happened. So the peculiar thing in the real world is how comfortable we are making pretty strong factual claims that turn out on close inspection to be counterfactual. Every time you claim you know whether someone was a good or a bad president, or whether someone made a good or bad policy decision, you’re implicitly making claims about how the world would have unfolded in an alternative universe to which you have no empirical access; you have only your imagination.

Robert Wiblin: So do we have any existing research on how good people are at counterfactual reasoning? Or has the fact that we can’t go down these alternative histories basically meant that people haven’t been able to research this?

Philip Tetlock: Well, what we know about counterfactual reasoning in the real world is that it’s very ideologically self-serving. That people pretty much invent counterfactual scenarios that are convenient and prop up their preconceptions. So for conservatives, it’s pretty much self evident that without Reagan, the Cold War would have continued and might well have gotten much worse because the Soviets would’ve seen weakness and they would’ve pushed further. And for liberals it was pretty obvious that the Soviet Union was economically collapsing and that things would have happened pretty much the way they did and Reagan managed to waste hundreds of billions of dollars in unnecessary defense expenditures. So you get these polar opposite positions that people can entrench in indefinitely.

Robert Wiblin: Yes, because they’ll never know. So it’s like you have a free hand to be particularly ideological about these cases.

Philip Tetlock: Right. It’s as if you’re doing clinical trials in medicine and you’ve got to make up the data in the control group. You never have to actually run the control group, you just say, “Let’s just make up the data and, you know what, lo and behold, all of our treatments are working.”

Robert Wiblin: So to get a large enough sample to figure out how accurately people can assess counterfactual outcomes or how well they can do the comparison I guess, do you have to have hundreds of people playing Civilization V for many years and then making lots of predictions about different scenarios? What’s the scale of the enterprise here?

Philip Tetlock: Well, that would certainly be one way of doing it. I think that the research sponsor IARPA is not quite that patient; I think they would like to see a steeper rising learning curve, and probably a smaller number of participants generating the learning. My hunch is that doing really well in this game is going to require a mixture: people who are really good at playing Civ5 and have strategic savvy, but who are also aware that strategic savvy doesn’t readily translate into forecasting skill. We’ve certainly found that there are some people who know a lot about Civ5 who don’t necessarily make very accurate forecasts. And there are other people who don’t know that much about Civ5 but know some pretty simple statistical rules and do reasonably well. But I don’t think we’ve reached the optimal performance frontier by any means; I think we’re lagging. So if people have knowledge of Civ5 and would be willing to donate 20 to 40 hours of time playing in a pretty intense research competition in the middle of August, in return for modest but not trivial compensation, they should feel free to contact me.

Robert Wiblin: Because normally people don’t get compensated at all for playing computer games, so I think that’s a step up.

Philip Tetlock: Right. Well, they’re not playing a computer game, they’re actually doing forecasting. They’re watching artificial intelligence agents that represent civilizations playing a computer game, and then the question is how skillfully can they make sense of that particular run of history?

Robert Wiblin: I see. So the humans don’t actually play the game, they just look at the scenarios and try to predict who’s going to win or what outcomes will happen? And I suppose that speeds things up a lot because the AIs can play much faster than people can.

Philip Tetlock: That’s correct. What you see is what they call a world report, you get to see how the game unfolded and the question is, how deep an understanding can you extract from seeing that history, how deep an understanding can you extract of the causal principles driving the game?

Robert Wiblin: Yeah. I was surprised when you said that there are people who are good at playing Civ5, good at winning, but not terribly good at making forecasts, because it seems like in order to be good at playing the game, don’t you have to be good at making forecasts about what will happen in the game if you take different actions? It seems like they’re almost the same skill.

Philip Tetlock: I think that’s a great question, and I was working with more or less that assumption myself. But it seems that for the counterfactual questions that are being posed in a simulation as complex as Civ5, where the combinatorics are staggering and the number of possible states of civilizations and variables is probably greater than the number of atoms in the universe, even very skilled Civ5 players will have serious blind spots that can be exploited by clever question posers.

Robert Wiblin: It’s interesting because it’s possible that they have some kind of gut intuition about what action is going to be best but then between other actions that they’re not seriously considering they don’t have the trained intuition to handle those very well.

Philip Tetlock: We don’t have a detailed enough understanding yet of what exactly is going wrong. We think that the winning model is going to be human beings… in this competition, by the way, machine learning is not allowed. We’re well aware, and IARPA is well aware, that you could just put machine learning to the task and it could run the game millions of times; put AlphaZero in there and Demis Hassabis is going to be the world champion of Civ5 as well as countless other games. So we’re aware of that, and that’s why the range of research we’re allowed to do on Civ5 is so restricted. We’re only allowed to look at correlations essentially; we’re not allowed to do the counterfactual thought experiments ourselves. In a sense they’re putting us in the same position of ignorance as actual intelligence analysts would be when they’re trying to make sense of the world.

Philip Tetlock: When intelligence analysts are doing a postmortem on policy toward Iraq or Iran or any other part of the world, they can’t go back in history and rerun, they have to try to figure out what would have happened from the clues that are available. And those clues are a mixture of things, some of them are going to be more beliefs about causation, the personalities and capacities of individuals and organizations, others are going to even be more statistical, economic time series and things like that. So it’s going to be a real challenge. I mean this is research in progress so inevitably I have to be more tentative and I’m speculating, but I’m guessing we’re looking for hybrid thinkers. We’re looking for thinkers who are comfortable with statistical reasoning, but also have a good deal of strategic savvy and also recognize that, oh, maybe my strategic savvy isn’t quite as savvy as I think it is, so I have to be careful.

Robert Wiblin: Do you have any view at this point on how likely we are to be able to generalize from Civilization V to the real world?

Philip Tetlock: Well, that’s the really big question because I think it’s safe to assume that the US Intelligence Community is not all that curious about who’s better at forecasting in the Civ5 computer games, I think the hope is that if you can identify methods of enhancing accuracy, enhancing the performance of human teams making sense of Civ5 games, that those methods will transfer to better performance in the real world. Now how exactly you make that inferential leap, that’s a complicated question.

Robert Wiblin: Can I venture to suggest that it actually might not be too bad? Although the Civilization gameboard is a very simplified version of things that are actually going on in the real world, to some extent when we try to do forecasts we have to create a simplified schema in our own minds, which might well end up resembling something like the board in Civ5, and try to map out the plays that people can make. We can’t model the full complexity of the world; all we can model is something on the level of the simplicity of Civ5. So maybe even if the real world is different, it’s going to be not so dissimilar from how we actually try to make forecasts.

Philip Tetlock: I think that’s fair. I think the Civ5 world has three basic similarities with the real world. One is the complexity of causation. You’ve got many variables influencing many other variables, and you have feedback loops among variables, so you have negative and positive feedback loops and you’ve got interactive causation; complexity of causation is a big similarity. Another similarity is path dependency: once you’ve gone down a certain path you can’t go back. So there are certain categories of effects which compound once you’ve made certain moves; some events are irreversible in their consequences or extremely difficult to reverse. And then finally, randomness or stochasticity. The artificial intelligence agents have a certain amount of randomness built into their play, and that’s probably an essential property to prevent them from being very easily exploitable, so there may be some game theory sense underlying that. At any rate, we don’t know how much randomness there is woven into the AIs because we’re not allowed to look at the programming code.

Philip Tetlock: I mean, we’re not… and so these are all-

Robert Wiblin: So even you are not allowed to-

Philip Tetlock: So just as we don’t know how much randomness there is in our world, we also don’t know how much randomness there is in Civ5. So all these things make Civ5… when you set the ground rules right, I think the research sponsors have done a good job of setting up the ground rules in a sensible way. When you set the ground rules up correctly, you do sort of put the players, the observers of Civ5 in a position of ignorance, somewhat similar to the position of ignorance that real world analysts are in. The big difference is that in Civ5 you know what the ground truth is, you know what happens in the what if worlds. We don’t get to see that until they give us the feedback and whether we’re right or wrong.

Robert Wiblin: Yeah. Do the forecasters get to see the entire gameboard or just the fraction that one player would be able to see?

Philip Tetlock: They get to see the entire gameboard.

Robert Wiblin: Okay. Interesting. Yeah, so it’s a bit like… yeah. Okay, so they’ve got access to kind of cable news, they’ve got access to satellite data, that kind of thing.

Philip Tetlock: Yes, they do, and the whole thing is complete when they see it. They see the game, it’s all there to be read, and you can see turn zero to 500 or-

Robert Wiblin: Yeah. Let’s move on from the Civ5 competition though hopefully there’ll be some Civ5 addicts out there who can get in touch and potentially participate.

Philip Tetlock: I really hope there’s some. It’s a serious cognitive challenge. It’s something that I don’t think anybody has ever done before trying to improve counterfactual reasoning in a simulation in the hope that we can make that stick later on in the real world.

Robert Wiblin: Yeah. I think listeners here have a pretty high demand for cognition, so might be well suited to it.

Robert Wiblin: If you know Civ5 really well and are interested in spending many hours seriously testing your subjective-probability forecasting skills, you can fill out a sign-up form which you can get to at 80k.link/civ. We’ll also link to it in the show notes and the associated blog post.

Robert Wiblin: When we last spoke about 18 months ago, you were just launching this hybrid forecasting competition, which I think aimed to pair algorithms with human forecasters and see how well they did, and you were looking for people to participate. How has that one gone? Are there any early achievements or findings coming out of that research?

Philip Tetlock: Oh, well that wasn’t my competition. That was an exercise that I was helping IARPA to recruit people for and I know that there are some interesting results. I’m not sure how much IARPA wants those to be talked about right now, so I should be careful. I would simply say that I think it’s been very difficult for algorithms to get a lot of traction on the kinds of questions that IARPA is interested in asking about. So if you’re talking about using… now we are talking about using machine learning and what are the types of questions in the real world where machine learning becomes useful and what are the types of questions where it flounders? And the short answer is, of course, the more data-rich the world, the more likely machine learning is to get traction.

Philip Tetlock: So if we’re trying to predict macroeconomic statistics for OECD countries, we have long time series, we can look at how the variables change over time, we can also look at how the time series are intercorrelated with each other, we can see what the lags are, we can create complicated econometric models. You can do a lot of interesting things and machine learning might be able to get some traction, reasonable traction vis-a-vis human forecasters there. But if you’re talking about trying to predict the outcome of the Syrian Civil War or indeed relations with China and the South China Sea or the state of negotiations with North Korea or the US China Trade War or the state of the eurozone and Brexit, all these things the machine learning people I think quite rightly kind of roll their eyes a bit and they say-

Robert Wiblin: “We don’t have enough training data to make sense of this.”

Philip Tetlock: We don’t have enough training data to make sense, yeah, exactly. The base rates are elusive, the covariation structures are elusive, so it’s hard for us to even get started. Now, of course, those questions are also very hard for humans. They may be impossible for machine learning and very hard for humans, but humans are able to do, I think significantly better than zero. Which suggests that the jobs of certain categories of human beings may be secure for a bit longer, it may be that if you’re a loan officer in a bank, your usefulness is highly questionable in a machine learning world, whereas-

Robert Wiblin: The CIA operatives are.

Philip Tetlock: The geopolitical analysts, they might have a longer future. Now of course, if loan officers in banks are serving another kind of function, if they’re serving a political function and it’s about giving money to friends and doing this or that, then the loan officers can rest secure. But if they’re playing a pure profit maximization game or accuracy maximization game, then-

Robert Wiblin: Maybe not.

Philip Tetlock: Yeah.

Robert Wiblin: Okay. Yeah. We’ll return to the algorithms a little bit later on; I’m keen to learn more about how they’ve done in your research. As I was prepping for this interview, I was looking back over some of your work, and a point that has stuck with me over the years was this observation that people seem to have only three probability settings. People who haven’t been exposed to forecasting and probabilistic reasoning kind of think things either have a 0% probability (they definitely won’t happen), or a 50% probability (they might happen, but we don’t know), or they’re 100% likely and definitely going to happen. Is this a general finding, that many people reason in that way and flip between these three different likelihoods?

Philip Tetlock: Yeah, it was a joke that I first heard from Amos Tversky in the 1980s, that people could only do that. It was a joke, it wasn’t intended to be a description of a serious research finding, but it is a stylized fact that people have a hard time making subtle distinctions in the maybe zone, and they do gravitate toward yes and no and certainty. We’re ambiguity averse and we have a hard time making subtle distinctions along probability continuums. So I think that’s fair, and I think that the best forecasters are able to resist that; they’re characterized by a capacity to make many more than three degrees of distinction among uncertainty. In a paper that Jeffrey Friedman, Richard Zeckhauser, Barbara Mellers and others did (I was part of that team), the best forecasters we find are able to distinguish between 10 and 15 degrees of uncertainty for the types of questions that IARPA is asking about in these tournaments, like whether Brexit is going to occur or if Greece is going to leave the eurozone or what Russia is going to do in the Crimea, those sorts of things. Now, that’s really interesting because a lot of people, when they look at those questions, say, “Well you can’t make probability judgements at all about that sort of thing because they’re unique.”

Philip Tetlock: And I think that’s probably one of the most interesting results of the work over the last 10 years. I mean, you take that objection, which you hear repeatedly from extremely smart people that these events are unique and you can’t put probabilities on them, you take that objection and you say, “Okay, let’s take all the events that the smart people say are unique and let’s put them in a set and let’s call that set allegedly unique events. Now let’s see if people can make forecasts within that set of allegedly unique events and if they can, if they can make meaningful probability judgments of these allegedly unique events, maybe the allegedly unique events aren’t so unique after all, maybe there is some recurrence component.” And that is indeed the finding that when you take the set of allegedly unique events, hundreds of allegedly unique events, you find that the best forecasters make pretty well calibrated forecasts fairly reliably over time and don’t regress too much toward the mean.
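
To make “well calibrated” concrete, here is a minimal sketch of how forecasts over a pool of allegedly unique events can be scored. It uses the standard Brier score and a simple calibration table, not the Good Judgment Project’s actual scoring code, and the forecasts and outcomes are invented for illustration:

```python
from collections import defaultdict

def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes.
    Lower is better; always guessing 50% scores 0.25."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

def calibration_table(forecasts, outcomes, bin_width=0.1):
    """Group forecasts into probability bins and compare each bin's
    average stated probability with the observed frequency of the event."""
    n_bins = int(1 / bin_width)
    bins = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        bins[min(int(f / bin_width), n_bins - 1)].append((f, o))
    for b in sorted(bins):
        pairs = bins[b]
        avg_forecast = sum(f for f, _ in pairs) / len(pairs)
        frequency = sum(o for _, o in pairs) / len(pairs)
        print(f"stated ~{avg_forecast:.2f} -> happened {frequency:.2f} of the time (n={len(pairs)})")

# Invented data: probabilities assigned to "allegedly unique" events,
# and whether each event ended up happening (1) or not (0).
forecasts = [0.05, 0.10, 0.20, 0.30, 0.50, 0.70, 0.90, 0.95]
outcomes = [0, 0, 0, 1, 0, 1, 1, 1]
print(f"Brier score: {brier_score(forecasts, outcomes):.3f}")
calibration_table(forecasts, outcomes)
```

For a well-calibrated forecaster the two columns line up: events called at roughly 70% happen roughly 70% of the time, which is the property Tetlock describes the best forecasters maintaining over hundreds of such events.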

Robert Wiblin: When I was reading about that stylized fact, that people are drawn to 0%, 50% or 100%, I was wondering whether that tendency might be able to explain some kind of weird behavior that I observe in people. One is that it seems quite common for people to have a relatively uninformed view about something, but become extremely confident in their quick judgments about it, even though if they really sat down and thought about it, they’d realize there’s so much that they don’t know. It seems like they can have some evidence that gets them to 80% or 90% confidence, and then they just push it up to 100% because they can’t be bothered thinking about it anymore. And then you’ve got the people who are very underconfident about their ability to draw distinctions between, say, 40% likely and 60% likely, who get stuck in the maybe zone. They’re like, “Well, it’s unknowable: it might happen, it might not happen,” and they miss out on the opportunity to draw most of these distinctions between likelihoods.

Philip Tetlock: Exactly. And I mean, take an issue that is politically polarizing in the United States, such as climate change, and forecasts of how rapidly the climate is changing as a function of greenhouse gases and perhaps other factors. Would I be considered a believer in climate change or a disbeliever, a denialist as it were, if I say to you, “Well, when I think about the UN Intergovernmental Panel on Climate Change forecasts for the year 2100, the global surface temperature forecasts, I’m 72% confident that they’re within plus or minus 0.3 degrees centigrade in their projections”? And you kind of look at me and say, “Well, it’s kind of precise and odd,” but I’ve just acknowledged I think there is a 28% chance they could be wrong. Now they could be wrong on the upside or the downside, but let’s say the error bars are symmetric, so there’s a 14% chance that they could be-

Robert Wiblin: Underestimating.

Philip Tetlock: Could be overestimating as well as underestimating. So I’m flirting with the idea that they might be wrong, right? So if you are living in a polarized political world in which expressions of political views are symbols of tribal identification, they’re not statements that, “Oh, this is my best faith effort to understand the world. I’ve thought about this and I’ve read these reports and I’ve looked at… I’m not a climate expert, but here’s my best guesstimate.” And if I went to all the work of doing that (and by the way, I haven’t; this is a hypothetical person, I don’t have the cognitive energy to do this), but if someone had gone to all the cognitive energy of reading all these reports and trying to get up to speed on it and concluded say 72%, what would the reward be? They wouldn’t really belong in any camp, would they?

Philip Tetlock: The climate proponents would kind of roll their eyes and say, “Get on board. You’re slowing down the momentum for the cause by giving succor and some emotional support to the denialists,” and the denialists will say, “Well, you’ve kind of been suckered by the believers.” You’re not going to please anybody very much. You’re not going to have a community of co-believers with whom you can comfortably talk about climate change in the bar. You’re going to be weird, you’re going to be an outlier.

Robert Wiblin: Might be able to cobble together kind of four economists or something to have a beer with.

Philip Tetlock: Could be something like that, but there’s not a good intellectual home for you. And if you think that the major function of your beliefs is to help you fit into the social world, not to help you make sense of the world itself, then why go to all the bother of participating in forecasting tournaments? And I think that’s one of the key reasons why forecasting tournaments are a hard sell. Forecasts do not just serve an accuracy function; people aren’t just interested in accuracy. They’re interested in fitting in, they want to avoid embarrassment, they don’t want their friends to call them names. I don’t want to be called a denialist or a racist or whatever other epithet I might incur by assigning a probability on the wrong side of maybe.

Robert Wiblin: Speaking of climate change, when I look at the media, it seems like every week there are new wild predictions being made about how bad climate change could be, which sometimes sound suspect to me. But I’m not a climate scientist and I don’t really have time in my day-to-day work to look into how scientifically grounded these forecasts are. I guess you might have encountered these forecasts as well, and there are also people who claim that it’s not going to be a problem at all. How do you disentangle a problem like that in real life?

Philip Tetlock: Well, one of the things I’ve learned to do over all this work is never pretend to be a subject matter expert in anything my people are forecasting on. So I’m not an expert on North Korea, I’m not an expert on the euro, I’m not an expert on Colombian narcoterrorism or the Syrian Civil War, and I’m not an expert on the climate either. Now I think there is an issue of people feeling, especially on the climate activist side, that the only way for them to build up political momentum for getting people to make sacrifices in the long term is to get them to believe that things are going to hell in a handbasket right now, and that floods and tornadoes and hurricanes and whatnot are unprecedented. I know that there are other people who say, “Oh, that doesn’t look so unprecedented to us.”

Philip Tetlock: And that’s a debate that sits somewhat apart from the larger question of the long-term warming trend as a function of greenhouse gases: how rapidly have hurricanes increased, or have hurricanes increased at all over the last 150 years? I don’t know what the answer to that is; I know that there are people who disagree about it. I’m a process guy, I don’t want to have too many opinions. It’s not useful for me to have too many opinions. Who would care what my opinion on this is? Why should anyone care what my opinion is? But I do know that there are incentives for people to exaggerate, and that happens over and over. You’re much more likely to exaggerate when you’re not in a forecasting tournament, when you’re not playing an accuracy game. When you’re playing a political power maximization, media exposure game, exaggeration is the way to go. If you’re in a forecasting tournament and you play that way, you’re going to get creamed.

Robert Wiblin: Yeah, I guess I wasn’t so much asking about climate change specifically. I suppose I have a rule of thumb that when advocates on a topic are speaking, I’m a lot more cautious about believing anything that they say. That’s true of climate change, and it’s true of many different issues. But it means that they could be right and it’s still very hard for me to figure out, and it means they could make some big mistakes potentially.

Philip Tetlock: I think exaggeration adds to the noise and I think it’s probably shortsighted for advocates, for activists to exaggerate, but I understand the temptation.

Robert Wiblin: Yeah. It seems to me that there’s a growing number of issues, or maybe it’s always been this way, where believing that something is absolutely certain, or absolutely can’t be true, is like a shibboleth for participating in a particular group, and anyone who expresses doubt is liable to be condemned. Anyone who says, “Well, I’m not quite sure about that,” is holding a view that’s kind of not acceptable.

Philip Tetlock: Right. And forecasting tournaments are almost the opposite of that. I mean, in forecasting tournaments, if you put a probability of one on something and it doesn’t happen, your credibility gets hit hard. Or a probability of zero and it does happen, your credibility gets hit hard. If you had been more moderate, you’d take a much lower hit. So forecasting tournaments incentivize people to do something that is not altogether cognitively natural, and that is put a priority on accuracy, accuracy, and only accuracy, and you really don’t care about whether you’re-

Robert Wiblin: Political loyalties or forming the right alliances.

Philip Tetlock: However the chips may fall. It’s just about accuracy. And in principle, there are government agencies that are supposed to do that, right? There’s the Congressional Budget Office, which is supposed to be an absolute straight shooter, nonpartisan. The intelligence community is supposed to be nonpartisan, a straight shooter, just the facts. The courts are supposed to be too. There are a lot of bodies that are charged with serving a purely epistemic function, and it’s extremely difficult, and their credibility is often called into question.

Robert Wiblin: Just carrying on with this idea of people rounding to 100% or 0%: I’m particularly interested in risk management, like global catastrophic risks and trying to prevent them. I often encounter people who think that a risk is unlikely and then say, “So I don’t think it’s worth working on at all, or at least I’m not worried about it at all.” It seems in those cases they’re rounding down from, say, a 1% probability to a 0% probability, for whatever reason. And so they miss that if something is really significant, then even if it’s only 1% likely, it could nonetheless deserve a lot of attention. Another thing people miss when they do this rounding down to 0% is that something that’s 3% likely might deserve three times as much attention as something that’s 1% likely, and something that’s 1% likely might deserve 10 times as much attention as something that’s 0.1% likely. Not having these fine gradients of probability, I think, can lead to really big misjudgments on some of the issues that I care a lot about.

Philip Tetlock: Yeah, I think that’s right. And in the original Kahneman-Tversky prospect theory paper (the original was 1979, I think), they have an interesting line about how the probability weighting function in prospect theory is ill-defined at the extremes. Which means that people are going to do one of two things: they’re either going to ignore very small probabilities or they’re going to dramatically overweight them, and they’re going to oscillate between those two mistakes. They’re going to have a very hard time getting it right. When the low-probability thing is very salient, you know where it’s going, and when it’s not that salient, goodbye.
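
As a reference point for the “ignore it or dramatically overweight it” pattern, here is a sketch of a probability weighting function. The original 1979 prospect theory paper left the function unspecified at the extremes; the closed form and the γ ≈ 0.61 parameter below come from Tversky and Kahneman’s later 1992 fitted version, used here purely to illustrate the shape:

```python
def weight(p, gamma=0.61):
    """Tversky-Kahneman (1992) probability weighting function:
    w(p) = p^g / (p^g + (1-p)^g)^(1/g).
    It overweights small probabilities and underweights large ones."""
    return p**gamma / (p**gamma + (1 - p)**gamma) ** (1 / gamma)

# Compare stated probabilities with their decision weights.
for p in [0.001, 0.01, 0.1, 0.5, 0.9, 0.99]:
    print(f"p = {p:<5}  w(p) = {weight(p):.3f}")
```

Running this shows w(0.01) ≈ 0.055: once a 1% risk is salient enough to be weighted at all, it gets treated more like a 5–6% risk. That’s one half of the oscillation described above; the other half is rounding it down to zero and ignoring it entirely.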

Robert Wiblin: Yeah. It seems like it’s one of the areas where it’s hardest for humans to act rationally, and hardest to coordinate to act rationally. Particularly gripping things like terrorism really are salient, so they get a lot of attention, and there can be other really big risks that people just don’t think about, so they get neglected a lot. For things that are 40% or 60% likely, people learn through experience how often they happen, but for things that are out in the tails, it’s just so hard as a society to appropriately apportion out our attention.

Philip Tetlock: Well, one of the great challenges here is extending assessments of accuracy to low probability events, because as the events descend into very low probabilities, you might expect them to occur only once every 300,000 or 400,000 years, and that requires a lot of patience from the research sponsors. So the question is, if you don’t have accuracy criteria for some of these extreme tail risk sorts of events, what metrics do you have, aside from faith or resort to the precautionary principle or something of that sort?

Robert Wiblin: Well, I mean, you could just try to form an inside view: just try to have a good understanding of the world and try to assess the probability. Which is difficult, but I think you can do better than random.

Philip Tetlock: Well, here’s one thing you can do: you can create categories of risks that have higher probability and you can assess those categories. You can decompose the categories and you can see how logically consistent people are in their judgments of sets and subsets. So if you think your likelihood of dying in a car accident is greater than your likelihood of dying in a car accident or from any other cause, we know that there’s something wrong with your probability judgments, even though we have no idea whether you had the probability right or wrong. So there are logical consistency checks on probabilities at the extreme range that can be implemented, and I think it is useful to implement them.
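
Here is a minimal sketch of the kind of logical consistency check described here: a probability assigned to an event can never legitimately exceed the probability assigned to a superset that contains it. The event names and numbers are hypothetical:

```python
def coherence_violations(judgments, subset_of):
    """Return judged probabilities where an event is rated more likely
    than a superset that contains it, which is logically impossible."""
    return [
        (event, judgments[event], superset, judgments[superset])
        for event, superset in subset_of.items()
        if judgments[event] > judgments[superset]
    ]

# Hypothetical elicited probabilities (per year):
judgments = {
    "die in a car accident": 0.012,
    "die in an accident of any kind": 0.009,  # superset rated *lower*: incoherent
}
subset_of = {"die in a car accident": "die in an accident of any kind"}

for event, p_e, superset, p_s in coherence_violations(judgments, subset_of):
    print(f"Incoherent: P({event}) = {p_e} > P({superset}) = {p_s}")
```

The check says nothing about whether either number is empirically right, only that the pair cannot both be right. That’s exactly the appeal for tail risks, where accuracy feedback could take centuries to arrive.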

Robert Wiblin: Let’s turn to something a little bit more prosaic, which is forecasting in one’s personal life or career. Obviously anyone who engages really seriously with 80,000 Hours’ advice and tries to apply it to their life at some point has to make some potentially very difficult forecasts about how things might pan out for them. So you can imagine someone who’s just finishing their undergrad and trying to decide whether to start a PhD, who will probably want to focus on: what are the odds of me actually finishing the PhD versus dropping out? And then if I do finish the PhD, what are my odds of getting an academic position that’s actually worthwhile? What’s my probability of putting it to use? Things like: if I apply for a prestigious job in the civil service, what’s the likelihood that I’ll get it? And if I try to make a lot of money to donate by starting a business, what’s the probability that the business will take off? And then how big will it be?

Robert Wiblin: All of these things are potentially very important and very decision-relevant, but quite hard to estimate. I was thinking possibly we could try to run through how someone might estimate the odds of successfully becoming an academic when they’re finishing an undergraduate degree, just because that’s potentially the career path you’re most familiar with as a professor yourself. Is that something that you feel game to try?

Philip Tetlock: Sure. But it’s gonna come with a big caveat. In these forecasting tournaments, we have people forecasting things over which they have no control, so they’re strictly observers. That’s true for the simulation worlds, and it’s true in the real world too. I mean, if forecasters are making forecasts about the eurozone or North Korea or sub-Saharan Africa, and, you know, bets about epidemics or financial panics or military clashes, whatever it is, they’re making forecasts about things they don’t control; they’re in the role of dispassionate observers. When you talk about making predictions about your own behavior and the behavior of people with whom you’re in frequent interaction (your spouse and your coworkers and so forth), you’re no longer just a forecaster, you’re a player. So there’s a story I’m fond of, which my coauthor of Superforecasting, Dan Gardner, told me about. I think it was an NHL team in Canada…

Philip Tetlock: The Ottawa Senators, maybe, who had fallen behind in… it was either the run-up to the Stanley Cup or the championship itself. They’d fallen behind three to one in the best-of-seven series. And some reporter runs up to the coach after they’d just lost the most recent game and says, “Hey coach, you think you’ve got a chance?” And the coach pauses and he actually thinks about it. Fatal career mistake by this coach: he pauses and he thinks about it and says, “Probably… or probably not.”

Philip Tetlock: Coaches are not supposed to talk that way. They’re supposed to say, “Of course we’re going to win,” because they need to infuse their team with enthusiasm; they’re not just making a probabilistic forecast. It serves a different function because they’re not playing an accuracy game. They’re playing a confidence-infusion game. They’re playing a political mobilization game, or an action mobilization game, like the climate people too. I mean, it’s mobilization, it’s not just about accuracy. So there are all these games that people play, you know: how long is your marriage going to survive? Does your athletic team have a chance? Is your career in graduate school gonna go into the basement? There are these countless questions that could become either self-negating or self-fulfilling prophecies in your life. And it’s a matter of how you as a human being, with your values, make decisions about what you are…

Philip Tetlock: And are not going to believe, and who you are or are not. The forecasts are now existential statements about identity and who you are. “I’m the kind of person who is really gritty and I’m gonna make this work. I’m gonna overcome the odds; I’m going to transform the odds.” Or, to put it a little bit differently, “I’m going to make history. I’m not forecasting, I’m a maker of history.” And Karl Marx had an amusing line to that effect: he said that the purpose of his work was not to understand the world, but to change it.

Robert Wiblin: So I think it probably does pay to be a bit optimistic that the plans you’re making are going to work out. But perhaps before you decide what the plan is going to be, you’re in this difficult spot where you want to do dispassionate forecasting of the merits of different options before you go into them. And then once you commit to one, you’re all in, potentially, or at least you’re somewhat overconfident about how it’s going to pan out, because that will drive you forward and, I guess, convince other people to join you as well.

Philip Tetlock: I think that’s a very useful distinction, and there are some people who do work on that. They say that there’s a deliberation mindset, where accuracy matters, and an implementation mindset, where you just do it. And some organizations have a crisp distinction between deliberation and implementation: the militaries do, and businesses, and people for that matter, I think. So, you know, there’s a time to think and a time to act. Now of course the division isn’t that simple, because at some point you have to review and reassess whether you made the right decision. So there’s gotta be some updating going on, unless you’re a complete fanatic. If you never return to deliberation and are henceforth only in implementation mode, you have slipped over into the domain of fanaticism.

Robert Wiblin: So, we’ll come back to trying to do the forecast in just a second; this is a little diversion. But are you familiar with that Freakonomics experiment suggesting that people don’t quit as quickly as they should? They ran a randomization experiment where they would flip a coin and encourage people who got heads to quit whatever thing they were thinking about quitting, and people who got tails not to. And they found that the people who got heads and were told to quit did actually quit more often, and their lives went better, or they reported that their welfare was higher three and six and maybe 12 months out. I guess it might be possible to explain this by the idea that people realize that they have to overcommit or become overconfident about whatever track they’re on, which means they’re likely to stick with it a bit too long if it’s going badly. If things are going below expectations, they’re going to be a little bit blind to that, potentially deliberately.

Robert Wiblin: And so if they get forced out of it by something like a coin flip, then that is kind of beneficial. Although I guess if they’d never been overconfident, maybe they never would have had the chance of making it through something difficult.

Philip Tetlock: Well Steve Levitt’s a very clever guy and I think that’s a- I’ve heard about that result. I haven’t read the experiment. It’s intriguing.

Robert Wiblin: Yeah. All right, let’s get back to the ‘becoming an academic’ example. We’ll use some probabilities here, but that’s not so much the point as to demonstrate what procedure you might use for estimating the likelihood. So someone’s finishing undergrad. Their ultimate goal is to become an academic who’s doing really valuable research, and there are multiple hurdles they have to get through. They’ve got to get into a good PhD program, ideally at a university that has a record of placing people. They’ve got to finish the PhD. Then they probably have to get a postdoc, and then an academic position. And having done that, what are the odds that they’ll have the freedom or the funding to do something that they actually think is useful for the world? Do you have any thoughts on how someone might try to put this together, to estimate whether it’s worth setting out on this path to start with?

Philip Tetlock: Well, it’s a tough racket, and it obviously matters a lot what field you choose. You know, your prospects are much better in computer science than they are in English or history. So, do I have anything more perceptive than that to say?

Robert Wiblin: So I guess what we typically recommend is to start by looking at the base rate.

Philip Tetlock: And those are base rates.

Robert Wiblin: Yes. So, look at how many PhD graduates… we would try to find these numbers for some fields. It’s actually surprisingly difficult to find nicely comparable data across different disciplines. But you look at the number of PhDs that are being minted in these different fields each year, then look at the number of actual academic jobs, maybe in total or the positions that become available each year, and look at that ratio. Very often it’s a few percent, so you can expect that only a relatively small fraction of PhD grads are getting tenure, or at least research-focused academic positions. So I guess we always recommend starting with the outside view or base rates, or at least usually doing that. Do you agree that’s probably the way to go in this case as well?

Philip Tetlock: Yes, I do. Graduate school is a risky life. Academic life can be very rewarding, but it’s a hard line of work to break into, and the way the academic labor market has become stratified, with adjuncts for example, I think has made it harder. I think there are fields in which there is still robust demand, in STEM disciplines, but elsewhere it’s increasingly a long shot.

Robert Wiblin: So I guess I think it's actually not entirely obvious that you always want to start with the base rate, or at least not with such a broad reference class. Because there's this possibility, and I think this probably is the case in some fields, that almost all of the probability is going to a relatively small fraction of PhD students or PhD graduates. Some fields, like economics, seem very top-heavy, or at least that's one I'm somewhat familiar with: if you're not at one of the top 10 or 20 economics programs, then your probability of getting a research-focused economics position drops pretty precipitously. So you can potentially get misled by starting with the base rate, by looking at it from the big picture.

Robert Wiblin: I think there have also been some calls among people I know for organizations in the effective altruism space to publish the number of people who are hired versus the number of applications they got for a job, to help people decide whether to go through the effort of actually applying for jobs. And I do worry there that sometimes just giving such a broad base rate could be misleading, because the reality is that for half of the applicants, their probability of getting the job is close to zero, and for some other people, their probability of getting the job might be quite high. Just saying "Well, on average it was a 1% likelihood of getting hired" could lead people astray.

Philip Tetlock: Absolutely. Picking the right base rate is a very valuable forecasting skill, and life skill, and you know a lot more about yourself than the base rate does. You know a lot more about yourself even than your GRE scores or your undergrad institution. So yes, you're going to update in response to that information. And once you've been in graduate school, you know how much you've published, and most people in grad school, I think, have a fairly good sense of where they rank. I don't think there are very many grad programs that actually rank-order their students, but the department faculty do get together and say, these are the five people we really want to push this year. So I think there probably is rough agreement in departments about who the most likely-to-be-hired people are. There are surprises, and maybe I'm an overconfident faculty member here, but I'm going to guess they can do very substantially better than chance.
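
(To make that concrete, here's a minimal sketch, in Python, of the base-rate-then-update reasoning being described. All the numbers, the 5% base rate and the likelihood ratios, are made up for illustration, not taken from any dataset discussed here.)

```python
def update_odds(prior_prob, likelihood_ratio):
    """Bayes' rule in odds form: multiply prior odds by a likelihood
    ratio, then convert back to a probability."""
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Outside view: hypothetically ~5% of PhD grads land research-focused posts.
p = 0.05

# Hypothetical likelihood ratios: how much more common each piece of
# evidence is among people who succeed than among people who don't.
evidence = {
    "top-10 program": 4.0,
    "strong publication record in grad school": 3.0,
}

for name, lr in evidence.items():
    p = update_odds(p, lr)
    print(f"after '{name}': {p:.0%}")  # ~17%, then ~39%
```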

Robert Wiblin: Yeah. I think when people try to narrow down from such a broad base rate though, they're in a little bit of a bind. Because, as you were saying, in one sense you know a lot more about yourself than the base rate does, or than other people do. But there's another sense in which it can be very hard to judge your own abilities, cause I just find-

Philip Tetlock: That’s also true.

Robert Wiblin: Yeah, it can be hardest to look at yourself in the mirror, because for one thing, you see yourself from a different perspective than you see other people, and there are all these biases. I just know so many people who seem either extraordinarily overconfident or extraordinarily underconfident about their own abilities.

Philip Tetlock: Yeah, it's a very interesting question how accurate our self-perceptions are. A lot of the literature does indeed focus on biases like overoptimism or overconfidence, and also even defensive pessimism and underestimating yourself. So it's a very interesting question exactly how accurate people are. I'm going to wager, and I don't know the facts on this, but I'm going to bet that people are moderately accurate.

Robert Wiblin: Oh, on average.

Philip Tetlock: Yeah, moderately. Which means I think that it’s useful information to add in.

Robert Wiblin: Yeah, yeah.

Philip Tetlock: Let me put it this way, another way to put it. If you really are delusional, your chances of success in this domain are extremely small.

Robert Wiblin: This has been a really difficult area to know exactly what to recommend that people do. I guess probably the best shot you might get is if you can get close to a mentor, or someone who's already in the field who can get to know your abilities relatively well. If you can somehow get them to give you a really frank assessment of what your odds are, that might be among the best forecasts you're likely to get. If you do a research project with an academic as an undergrad, then they can give you some sense of whether you potentially have what it takes. Even there it's going to be very difficult, because you're potentially young, maybe you'll mature during the PhD, and of course people hate breaking bad news.

Robert Wiblin: So, I think in some fields there's a bit of a habit of academics leading people on by saying, "Oh yeah, go do the PhD", because, well, it's nice to have students under you who can do some of the grunt work as PhD students. So there's a bit of a selfish motivation there, but also you just want to be positive to people and encourage them. So you always have to judge whether people are telling it to you straight.

Philip Tetlock: What an interesting idea. I mean, faculty do owe it to students to give them candid feedback. But on the other hand, they don't want to demoralize people, and graduate students are somewhat easy to demoralize.

Robert Wiblin: Yeah. It’s hard on your mental health because the feedback is often so weak.

Philip Tetlock: It is. So it's a very difficult problem. But establish a social contract with a faculty member whose judgment you really trust, or better yet, two faculty members whose judgment you really trust, and say to them, "Look, I know there's a non-zero chance this won't work out. I'm going to do my best on this project. And if you conclude at the end of it that you're pessimistic, that my prognosis is somewhat grim, I really would appreciate it, I'd be grateful for the rest of my life, if you just give me that honest feedback." No one's ever approached me like that.

Philip Tetlock: But it's an interesting thought experiment. You know, it reminds me of an old joke about Henry Kissinger and Alexander Haig, who was at one point a kind of underperforming underling under Kissinger. He would deliver a report to Kissinger, and the next day he'd come back and say, "What did you think, Mr Secretary?" And Kissinger would look at it on his desk and throw it back at him: "Do it again." He'd come back, and they'd do the cycle three more times. "Do it again. Do it again." Finally, he'd come in and say, "I've done my absolute best, I can't do it anymore." And Kissinger would say, "Okay, I might look at it."

Robert Wiblin: Yeah, I hadn’t heard that one.

Philip Tetlock: That’s an interesting approach to teaching.

Robert Wiblin: Yeah. It's a bit disappointing if people aren't even coming to you for an honest forecast. I mean, they might at least try there. I guess another approach people can take, rather than the very broad base rate or someone's qualitative judgment based on knowing them, is to get quantitative data, which in some fields is more available than others. You can look at your GRE scores, your grade point average, your SATs, and then maybe look at the typical GRE for people entering the field, and the typical GRE for people who eventually got jobs in that field, if you can find data on that. I guess that has the disadvantage that sometimes those quantitative measures can be a little crude and can throw out important information.

Robert Wiblin: But on the other hand, it means it's a little bit harder to delude yourself based on feeling that you're special. It has a little more firmness to it, and you might also be able to find some extra data on what scores academics actually got.

Philip Tetlock: Yes, that's a very interesting… I don't know exactly what the data is on this. But most people who make the cut into elite graduate programs have pretty high test scores, and they have intelligence test scores that are comparably high, because the two really are very closely related. So let's say the average IQ of students at an elite program is 125 or 130. How much of an advantage would it be if yours was 150 or 160, or you had 800/800/800 GREs? I don't know how that translates onto the IQ scale, and I'm not even sure how common 800/800/800 is now. It used to be quite uncommon, but it may be more common now.

Philip Tetlock: How much of a performance boost do you get? How much of a difference is there in the career effectiveness of lawyers or doctors or professors whose IQ is 130 versus 160, and how much is really driven by character as opposed to intelligence past a certain point? I've heard expert psychometricians argue this out, and there's one school that says there actually is a difference between 130 and 160, and another that says it's almost totally driven by character. I don't know who's right, but it is an interesting debate.

Robert Wiblin: Yeah. I've seen one paper on this, correlating research output with IQ, which suggested that IQ did predict research output and discoveries. I can try to dig that up. I haven't scrutinized it terribly closely.

Philip Tetlock: There's a professor in the UK, you may know him, Stuart Ritchie. He probably knows the answer on this.

Robert Wiblin: Yeah. I've never gotten an IQ test, because I feel like either I'd end up being really smug, or I'd end up feeling really badly disappointed in myself. And I'm not sure I would learn anything useful about myself that I didn't already know. I already kind of know what I'm accomplishing and what I'm not.

Philip Tetlock: I can assure you, people my age should not be taking tests like that, because there's a fairly well-known pattern that fluid intelligence peaks around 25 or 30, while crystallized intelligence can continue to increase even up to my age, until memory loss starts to take its inevitable toll. But I think it's a losing game for people over 50.

Robert Wiblin: Yeah, I guess.

Philip Tetlock: And I’m considerably over 50.

Robert Wiblin: I'm over the hump too, so it's too late for me to do an IQ test now. Another case that comes up a lot is people trying to predict the likelihood of their businesses succeeding. We're in San Francisco, where lots of people are doing startups and trying to figure out if their business ideas are any good. Do you have any experience, or have you seen any research, on whether people can predict that, and maybe how they can do a better job of it?

Philip Tetlock: It's the same base rate problem, because most startups fail. And it depends on how exactly you define the base rate, on what population of small businesses you look at. If they're able to attract VC funding in Silicon Valley, if they're able to pass that initial screen, they're much better off than some person who just decides to set up a restaurant randomly in a neighborhood. But even after they pass VC screening, I think the base rates are pretty low. I suspect the VCs like to keep this data pretty confidential, but I would be surprised if there are any shops able to achieve a one-in-three hit rate.

Robert Wiblin: Another practical approach one might take with businesses and academic careers, inasmuch as you somewhat despair of figuring out whether your odds are 3% or 6% because it's just too hard, is to find something that has a fat-tailed distribution of outcomes: something where, if it goes well, it will go really well and you'll have an enormous impact. Then find one of those that's plausible or appealing, and pursue it until you get evidence that it's not panning out, that you're not going to end up out in the tail. And then find another thing with a fat-tailed distribution and give that one a crack.

Philip Tetlock: Right. And that's why a number of VCs have quite low thresholds for funding: in the hope of getting the next Facebook or Google, they can afford many dozens, many hundreds, many thousands of misses and still do magnificently. Now, obviously they still have to balance false positives and false negatives, and they're still aspiring to accuracy. But when you think there's a fat tail of extremely lucrative possibilities out there, you really don't want to miss them, and you're willing to tolerate a lot of false positives.

Robert Wiblin: I guess it's a little more challenging for an individual who's just finished their undergraduate degree, cause maybe they're thinking, "Do I want to go into government, or become an academic, or go into business, or start a nonprofit?" Maybe they only get three of those attempts before they start hitting their mid-thirties, and perhaps they're not willing to be as adventurous as they used to be. And if all three of them don't pan out, then I guess they have to have a backup option that seems a little safer that they can move into. Unless they're very adventurous.

Philip Tetlock: Well, there’s the human life cycle and there’s the desire to reproduce and all sorts of things that kick in and obligations to other people. People find themselves locked into things they didn’t expect to be locked into.

Robert Wiblin: Yeah. But I guess if you choose three options with a lot of upside, then you're giving yourself a decent chance. As long as you have something to move back into later, you'll probably have a pretty good life.

Philip Tetlock: Yeah. You know, it's an interesting sequencing strategy. I have to confess, I guess I wasn't as creative in my life. I found academia a pretty comfortable place very early on, and I didn't really start to make serious contact with the real world until I was middle-aged, when the world started to pay some degree of attention to what I was doing. Otherwise I would've just been completely in academia.

Robert Wiblin: Yeah, pushing on to a slightly different topic related to people improving their ability to predict what's going to happen in their life and making good decisions: to go along with releasing this episode, we're actually going to put up on our site this calibration training tool that was developed, or funded, by the Open Philanthropy Project, which people can get at 80000hours.org/calibration_training.

Robert Wiblin: And just to remind everyone: calibration is the ability such that when something feels like it's 90% likely, it does actually happen nine times out of 10, and when something feels like it's 20% likely, it does happen two times out of 10. So it's one of the two measures of good forecasting ability, the other being resolution: getting away from 50-50 probabilities towards actually making strong claims about things that definitely will and definitely won't happen.

Philip Tetlock: A very important component.

Robert Wiblin: Yeah. A very important component.

Philip Tetlock: That second component is not to be underestimated. I would say resolution is every bit as important as calibration. Some people might say more.

Robert Wiblin: Yeah, so have you seen this tool or kind of any other tools like it?

Philip Tetlock: I'm familiar with it, and I think Michael Mauboussin has created something somewhat similar. And Good Judgment Inc, the private-sector spin-off from the Good Judgment Project, I think has probabilistic reasoning training that includes that as well.

Robert Wiblin: Yeah, I think they spent quite a few years developing this one. I'm not sure how it compares to the others, but it's got different kinds of training: confidence intervals, PolitiFact questions for people who are more political, guessing city populations, answers to math problems and your confidence in them, and various kinds of correlations.

Philip Tetlock: Oh wonderful. So you can assess transferability across domains.

Robert Wiblin: Yeah.

Philip Tetlock: That’s a really big thing because transfer has always been one of the most difficult challenges for psychologists designing training to overcome. Transfer statistics have tended to be disappointing.

Robert Wiblin: Yeah. How valuable do you think doing this is? It seems like it's probably worth doing for a couple of hours. But it's possible that the transfer from things like PolitiFact questions, math problems, or correlations between different kinds of social statistics to the other kinds of questions you try to assess in life, like "What are my chances of becoming an academic?", might be weak, so maybe you hit declining returns after a couple of hours.

Philip Tetlock: It could be turned into a useful research instrument, potentially. I would be curious to know, for example, whether people who have been randomly assigned to do this, and have actually done it, can generate more accurate forecasts in Good Judgment Open, or insights like that.

Robert Wiblin: Has this ever been experimented with? Giving people calibration training and then seeing whether they do better in these tournaments?

Philip Tetlock: A little bit. The probabilistic reasoning training that we developed in the original tournaments was able to deliver performance boosts in forecasting of between six and 12% over each of the four years. We called the training module CHAMPS KNOW. And there was a brief calibration exercise, but nothing as extensive as you're describing. We described what calibration was, emphasized its importance, gave some examples of how people can be miscalibrated, and some practice questions. But we didn't do it exhaustively, and we were addressing a lot of other points too, like base rates and belief updating and information search, how to be a creative information seeker. So there were a lot of things in there.

Philip Tetlock: In a tournament, the priority is not on doing precise experimentation, it's on winning. So you take everything you think might work and you kind of put it all together; it's like throwing the kitchen sink at it, right? So tournaments really require careful follow-up experimentation, where you try to triangulate in on exactly what worked, because the investigators are typically doing a lot of things to win the race. But the short answer is yeah, I think people should look at it, and maybe there's some interesting collaborative potential there.

Robert Wiblin: Yeah, I've used it for a bit, and I felt like I was getting more calibrated over time, though I was reasonably good to start with. But that might've just been luck. I actually do this in my day-to-day life: assign probabilities to things every hour or half hour, because it's just how I think about acting in life. And it's possible that has helped to calibrate me over the years.

Philip Tetlock: You're becoming a bookie, essentially. Here's another question though: are you measuring resolution? Are you giving feedback on resolution as well? If you're only giving feedback on calibration, is there a danger that…

Robert Wiblin: We'll start just pinning everything to 50-50?

Philip Tetlock: Well no, that would be too extreme, but there may be some implicit base rates lurking in there.

Robert Wiblin: Yeah, that’s interesting. Yeah. I’ll maybe bring that up with the people who made it before we launch it and see what they have to say about that.

Philip Tetlock: Yeah. I think giving feedback on both calibration and resolution is a good idea, because in real life, when you look at them, they're correlated with each other. People who are well calibrated also tend to get good resolution scores, and that's not too surprising. It doesn't have to be that way, but it almost has to be. But there is a degree of tension. There are different styles, and there are some managers, I think, who really value decisiveness in their employees and look for extreme answers, while the more nuanced people in the middle, at 30, 40, 50-50, will get less recognition. It's an interesting problem.

Philip Tetlock: I mean, one of the things we looked at in our early work on Expert Political Judgment was whether the well-calibrated forecasters were just being cowards, right? It rains 60% of the time in Seattle, so you always predict 60%, and you get a perfect calibration score. Whereas what you really want are people who say there's a 95% chance of rain when it rains and a 5% chance when it doesn't. That gets you a great resolution score as well as being well calibrated, because you're right. But when you say 95% and it doesn't happen, you take a big hit, so there is a trade-off in people's minds. I think if you tell people you're judging them on both properties, it's going to force them to be more mentally agile, and they'll be making more trade-offs in their head. They'll say, "Well, I don't want to be overconfident. On the other hand, I don't want to be a chicken."
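
(To make the calibration/resolution trade-off concrete, here's a minimal sketch of the standard Murphy decomposition of the Brier score, illustrated with the Seattle example above. This is just an illustration of the scoring concepts, not the scoring code used in any of these tournaments.)

```python
import numpy as np

def murphy_decomposition(probs, outcomes, n_bins=10):
    """Split the Brier score into reliability (calibration), resolution,
    and uncertainty: brier = reliability - resolution + uncertainty."""
    probs, outcomes = np.asarray(probs), np.asarray(outcomes)
    base_rate = outcomes.mean()
    uncertainty = base_rate * (1 - base_rate)
    reliability = resolution = 0.0
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            w = mask.mean()                         # fraction of forecasts in bin
            f = probs[mask].mean()                  # mean forecast in bin
            o = outcomes[mask].mean()               # observed frequency in bin
            reliability += w * (f - o) ** 2         # lower = better calibrated
            resolution += w * (o - base_rate) ** 2  # higher = sharper forecasts
    return reliability, resolution, uncertainty

# The "coward" strategy: always forecast Seattle's 60% base rate.
# Near-perfect calibration (reliability ~ 0), but zero resolution.
rng = np.random.default_rng(0)
rain = rng.random(10_000) < 0.6
print(murphy_decomposition(np.full(rain.size, 0.6), rain))
```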

Philip Tetlock: So one of the critiques of my early work, where the more 'foxy' forecasters did better than the 'hedgehogy' forecasters, was: oh well, the foxes are just chickens.

Robert Wiblin: Was that the case?

Philip Tetlock: No, but we had to address it statistically though.

Robert Wiblin: Okay, that's interesting. So they weren't just being cowards. As you were saying, calibration and resolution tend to go together.

Philip Tetlock: Well, they were more moderate, but they also did better on resolution, so they didn't buy calibration at the cost of degrading resolution below that of the hedgehogs.

Robert Wiblin: So some friends of mine have been trying to produce other, ideally useful, training content for a broad audience, to help improve their reasonableness and their forecasting ability. I think you might've met one or two of them here at EA Global today. Do you have any ideas for what the best opportunities are for producing training content people haven't already seen, which might actually allow them to become more accurate at forecasting within a reasonable timeframe?

Philip Tetlock: Well, we're hoping that the work we're doing now will focus on helping people become more rigorous lesson-extractors from history, and will translate into improvements that haven't been observed before. That's a promissory note though; it's not something we can say we've demonstrated. I guess you're asking me: beyond the training protocol we developed in the ACE tournament, known as CHAMPS KNOW, do we have anything new to report on the training side that works and delivers systematic, replicable improvements?

Philip Tetlock: And I think we have some hints that something works, but I don't think it's replicable yet. So since we're in the age of replication, I'll just say maybe. Stay tuned. But it's not easy to do this. There's a 'curse the darkness' phase of my career and a 'light the candles' phase of my career, and the 'curse the darkness' phase, documenting the biases that other people blazed the trail on, was much easier than lighting candles. Improving judgment, I call it meliorism, you know, a commitment to making things better, is really hard work and frustrating. The failure rate of studies has been somewhat discouragingly high.

Robert Wiblin: So in business school, people often do case studies. Can you imagine incorporating a prediction element, where people learn about business situations from the past and then try to figure out whether the decisions succeeded or failed? Could you imagine that helping people's ability to make good business decisions?

Philip Tetlock: Yes. I think what we're doing now with Civ5 could easily be adapted to many business simulations. The kind of training we're doing, the kind of learning people are engaging in, is very similar. So you get one of these Harvard business cases: if the CEO had done this rather than that, what would have happened? Now, depending on the kind of simulation it is, if it's a simulation of Intel or some other actual company, the answer's unknowable; we don't actually know what would've happened, although we often have some reasonable hints, because the market is a corrective force. At any rate, it has promise.

Robert Wiblin: Yeah. A friend of mine, Danny Hernandez, made this interesting point that we'd like to be able to figure out who superforecasters are really easily, quickly and cheaply, because then we could give more weight to their judgment. But as it is, using normal tournaments, it takes quite a while, both for them and for time to pass in the world, before you can figure out whether someone is a superforecaster. Do you imagine that just measuring someone's performance on a calibration test could give some indication of whether they might be a superforecaster or not?

Philip Tetlock: It might, yes, although I'd want resolution as well. And the idea of being able to screen people much more rapidly than waiting for two years of tournaments and seeing who regresses toward the mean, I understand the appeal of that. It would be a faster way of identifying talent, and it may be feasible. I think it's certainly worth trying.

Robert Wiblin: Yeah. Maybe someone out in the audience can try to adapt one of these tools for that purpose.

Philip Tetlock: I think it's very reasonable… Businesses are somewhat constrained by their human resources departments and their legal departments in the kinds of studies they're allowed to perform on their employees and the kinds of criteria they're allowed to use in screening employees. They have to validate tests that are used for employment and things like that. So it's a nontrivial matter for a business to adopt. You know, I say kind of glibly, when I talk about how the earlier forecasting tournaments were won, that step one is to get the right people on the bus. That sounds easy, but it's not. It took a long time to figure out who the right people were, and an organization or a business thinking of doing this would probably be well advised to talk to its legal department first.

Robert Wiblin: I guess, yeah, maybe that's something individuals out there can have a go at. The tool I mentioned, the calibration tool from the Open Philanthropy Project, is written in this pretty easy programming language called GuidedTrack, which was actually developed by Spencer Greenberg, who's also been a guest on the show. So it would be relatively easy, potentially, to take some of that and modify it for related purposes, like measuring people's performance.

Philip Tetlock: That’s very interesting. Okay. I would like to see that.

Robert Wiblin: So let's dive into some more technical questions about the forecasting research you've done over the years. I asked online what questions people had, difficult questions people have come up with when reading your book or your papers, that they're curious to get answers to. And actually, the philosopher Daniel Kokotajlo, I hope I'm pronouncing that correctly, wrote up this really beautiful summary of the practical findings from your work for an organization called AI Impacts, which is trying to forecast progress in AI and figure out how that might affect society. Actually, I think if someone was only going to spend 20 minutes reading about forecasting and your work, that's probably the link I'd give them at this point.

Robert Wiblin: So I’ll definitely stick up a link to that piece in the blog post with the show notes.

Philip Tetlock: I’m curious already.

Robert Wiblin: Yeah. I'll send it to you as well. I asked him for a couple of questions, and a few of the gems in here are because of Daniel, so thanks to him. So he noted there used to be a page on the Good Judgment Project's website that broke down various ways to get better forecasts, and it suggested you got a 40% boost from talent-spotting forecasters, as you just mentioned, a further 10% boost from giving them training tools, 10% from putting them on teams and getting them to talk to one another, and then maybe a 25% boost in accuracy from using algorithms to process and then aggregate their various predictions. Does that still ring true to you today?

Philip Tetlock: I guess I would have to characterize these as stylized facts, where the baseline is the unweighted average of the regular forecasters. It's true that once you've identified the superforecasters and put them into teams, they have a big advantage over regular teams and individuals working alone. That is true, and it is in the vicinity of 40%, yes. The training number of 10% is approximately right; the teaming number of 10% is approximately right. The algorithm number really conflates a couple of things. The algorithm number could be larger or smaller depending on how you calculate it, since the aggregation algorithms are piggybacking on the most recent forecasts of the best forecasters, which means they're drawing on superforecasters.

Philip Tetlock: So the question is, how much better can the aggregation algorithms do if you take the superforecasters out of the equation? And I think that number is about right, 25%.
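
(The actual aggregation algorithms aren't spelled out here, so the following is only an illustrative sketch of the general idea just described — weighting the most recent forecasts of the historically best forecasters — not the real Good Judgment Project algorithm. The half-life parameter and all the numbers are invented.)

```python
import numpy as np

def aggregate(probs, ages_days, skill, half_life=3.0):
    """Illustrative pooled forecast: weight each forecaster's latest
    probability by past skill, and decay the weight of stale forecasts."""
    probs, ages, skill = map(np.asarray, (probs, ages_days, skill))
    weights = skill * 0.5 ** (ages / half_life)  # halve weight every half_life days
    return float(np.average(probs, weights=weights))

# Three forecasters: the most skilled one also updated most recently,
# so the pooled forecast leans toward their 70%.
print(aggregate(probs=[0.7, 0.6, 0.4],
                ages_days=[0.5, 2.0, 6.0],
                skill=[2.0, 1.0, 1.0]))  # ~0.65, vs. a plain mean of ~0.57
```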

Robert Wiblin: Another one: in Superforecasting there's this quote, "The strongest predictor of rising into the ranks of superforecasters is perpetual beta, the degree to which one is committed to belief updating and self-improvement. Perpetual beta is roughly three times as powerful a predictor as its closest rival, raw intelligence." How did you measure or define perpetual beta? I think Daniel couldn't find that anywhere in the book.

Philip Tetlock: It's a very good question. And the self-report measure of perpetual beta does not do that work for us. What does the work for us is a measure of the frequency with which people engage in belief updating: low-magnitude, frequent belief updating is a powerful driver.

Robert Wiblin: So one of the key measures of this would just be how frequently people update their estimates?

Philip Tetlock: I said it was three times more powerful than what?

Robert Wiblin: Intelligence, which was, I think, the second most important.

Philip Tetlock: Yes, fluid intelligence. Well, crystallized and fluid both played a role, but fluid intelligence was the most consistent predictor. But when you're dealing with this population… these forecasters are all pretty smart, so there is some restriction of range. So the comparison to intelligence is not entirely fair.

Robert Wiblin: Do you think that if you just drew people randomly from the population that intelligence might seem like a more important factor?

Philip Tetlock: Yes, I'm pretty sure that's true. In the same way that if you randomly admitted people into the Harvard Department of Economics, GREs would become a much better predictor of who does well than they are now. Right now, GREs probably predict almost nothing about performance in graduate school at Harvard.

Robert Wiblin: Something of particular interest to me: in a lot of experiments you've run, the extrapolation algorithms, various kinds of brute-force or mechanistic forecasting methods, seem to do quite a lot better than human predictors. But there seems to be surprisingly little detail about the nature of these algorithms in the books; maybe it's in the papers. It seems like algorithms are a lot easier to manage than people, and a lot cheaper to run than superforecasters, who you'd have to pay salaries to.

Robert Wiblin: So maybe we should put a bit less effort into trying to identify superforecasters, and more effort into training people to use these extrapolation algorithms? If extrapolation algorithms are better than most people, maybe we should be focusing our attention on making it possible for ordinary people to use these extrapolation algorithms in everyday life, given they actually perform pretty well.

Philip Tetlock: Like predict no change or predict the most recent rate of change?

Robert Wiblin: Yeah. Well, I guess, I’m curious to know, what were the algorithms that-

Philip Tetlock: Well, those-

Robert Wiblin: Those are the ones that did work well?

Philip Tetlock: Yeah, they worked pretty well.

Robert Wiblin: Interesting. Just predicted no change?

Philip Tetlock: Especially for the shorter-term forecasts, because change is less likely in the short term. I think one finding that cuts across both books is that people somewhat exaggerate change in the short term, but understate it in the longer term. Though that's not entirely true even there; I think they're exaggerating change even in the five-year range. Okay, I'm thinking out loud, I should be careful what I say. This isn't in a journal, we're-

Robert Wiblin: It’s a conversation.

Philip Tetlock: But it's a good question. It doesn't take a lot of training to do that. I mean, you don't need to train people to do it at all; you just have the algorithm do it. So I don't see a need for training there.

Robert Wiblin: Yeah, okay. So these weren't complicated forecasting algorithms that would require an expert statistician or something to put together; very often they were just brute-force, simple rules.

Philip Tetlock: Oh, my definition of an algorithm is anything that can go on automatic pilot, so it doesn't need any human intervention. I think of heuristics more as something that requires human judgment.

Robert Wiblin: Could you imagine producing forecasting rules, rules like "predict no change over this timescale" and, for longer timescales, "look at the long-term trend and project it forward", that would allow people to mechanistically become better forecasters without having to go through all the effort of becoming more 'foxy'?

Philip Tetlock: There's a guy, Spyros Makridakis, who runs statistical forecasting tournaments in which all the competitors are algorithms. There are tens of thousands of time series from politics and economics and business and so forth, all sorts of time series, and the question is which methods perform better across very, very disparate data sets. He also has machine learning in the competition sometimes, too. And he finds that a fairly simple damped exponential smoothing of the time series works pretty well across an enormous range of series.

Philip Tetlock: I honestly don't know. I mean, time series data often have a lot of bumps, ups and downs. So smoothing simply means you're smoothing out the big bumps. It's like what they show you in these Wall Street summaries, the 100- or 180-day moving average sorts of numbers. Would that be a good idea? I think for many situations it would help people do pretty well, maybe bring them up toward superforecasting levels of performance. And it's not as though the superforecasters themselves don't use algorithms. They do. A lot of them are very statistically savvy; they know more statistics than I do.

Robert Wiblin: So to some extent they’re like a superset of some of these methods.

Philip Tetlock: Yeah, a subset of them run their own Monte Carlos.

Robert Wiblin: You’ve got an excellent friend there.

Philip Tetlock: I mean, the superforecasters are already hybrids in and of themselves; it would be a mistake to think of them as purely human, because they're using statistical aids of various sorts. So it's a very exciting question. I don't know the answer. My guess is it would help; whether it would bring people all the way up to super performance, I suspect not, but it's an empirical issue.

Robert Wiblin: Yeah, I did a course in time series forecasting in my undergraduate economics degree, which I think is relatively unusual. And I think it's perhaps an underestimated thing to study, in terms of changing how you think about the world and about making predictions about the future. Just realizing that these mechanistic autoregressive moving average models can actually be extremely good predictors of the future, and will often spit out answers that don't seem entirely intuitive to you, but kick your ass. It's quite interesting.

Philip Tetlock: Yeah. And you don't even need the full ARIMA, the full-scale thing. A very simple, crude time series extrapolation à la Spyros Makridakis can perform pretty well. He's very concerned about overfitting, and when you're dealing with tens of thousands of data sets, the advantages of the smoothing approaches become apparent, because the bumps average out.
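
(For readers who want to see what this looks like, here's a minimal sketch of exponential smoothing with a damped trend, in the family of methods that do well in Makridakis's M competitions. The smoothing parameters and the toy series are made up for illustration; real applications fit them to data.)

```python
def damped_trend_forecast(series, horizon, alpha=0.3, beta=0.1, phi=0.9):
    """Exponential smoothing with a damped linear trend (Holt's method
    plus a damping factor phi < 1)."""
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + phi * trend)
        trend = beta * (level - prev_level) + (1 - beta) * phi * trend
    # Damping makes long-horizon forecasts flatten out rather than
    # extrapolating the latest trend forever.
    return [level + sum(phi ** i for i in range(1, h + 1)) * trend
            for h in range(1, horizon + 1)]

prices = [100, 103, 101, 106, 109, 108, 112, 115]  # toy series
print(damped_trend_forecast(prices, horizon=5))
```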

Robert Wiblin: Yeah, maybe we can try to get a link to that. Was it Spyros, you said?

Philip Tetlock: M-A-K-R-I-D-A-K-I-S. Yeah, he ran a competition. I think it's in the… oh gosh, the International Journal of Forecasting; there was a special issue devoted to his M4 competition.

Robert Wiblin: Wow, that sounds super interesting. I'll try to chase up a link for that, and maybe get him on the show at some point. What finding in your research do you think is most likely to be wrong?

Philip Tetlock: What's most likely to fall apart? Well, I think when you hit historical discontinuities, I'm not sure there are superforecasters. So just when we most need them, they disappear on us. That's not very useful, professor. Historical discontinuities are so hard. I'm going to sound like I'm a Marxist, because I'm going to end up quoting Karl Marx twice in this interview, and I'm not a Marxist, by the way. But apparently Karl Marx also said that when the train of history hits a curve, the intellectuals fall off.

Robert Wiblin: Yeah. And it probably applies to him as much as anyone else.

Philip Tetlock: Well, it's ironic given how often the Marxists have fallen off in the 20th century, which makes it an all the more apropos remark. There's a lot of truth to it: predicting change is hard, and predicting dramatic change is really, really, really hard. So I think I would be doing a disservice to the world if I implied, "Oh, all you need to do is have the superforecasters stand vigilant and they'll be able to sound the alarm on everything." They too will get things wrong. But you can fine-tune these things. You can say to them, "If there are some categories of risk that you're really, really concerned about missing, you can do what the VCs do." You can lower your threshold, and you could say, "Look, we're going to highlight certain things, even though they're very low probabilities."

Philip Tetlock: The way the current forecasting tournaments work, forecasters are incentivized to pick up on events that have probabilities between 5% and 95%. There's not much gain from getting really, really refined, from distinguishing between one in 100,000 and one in a billion. And yet there are several orders of magnitude of difference there, and if the consequences are huge, it's super, super huge.

Philip Tetlock: And then the question is, what techniques can we use? That is the great challenge right now. We work on it with these Bayesian question clusters; we work on it with consistency and coherence checks. But we don't have a solution. I think we all have to be aware that we live in a world that is subject to potentially radical volatility. Just look at the 20th century from decade to decade, and you'll see that virtually nobody predicted World War One in 1910, and certainly nobody was remotely close to anticipating how intense it would be.

Robert Wiblin: And if someone did, I would probably guess they'd just gotten lucky. And that's a tricky-

Philip Tetlock: Yeah. Well, if you have enough people making enough predictions, there will be a few. But the point is, virtually nobody was anticipating it. The idea of a great power war, to some degree; but a war of that degree of lethality, really not. They knew it was an unstable situation, but they didn't expect that level; they expected it to be over pretty fast.

Philip Tetlock: And then out of that come the Soviet Union, Nazi Germany and World War Two. And so it puts you on a path. Now, that doesn't mean there wouldn't have been some communist regimes otherwise, or that there might not have been a rise of fascism in Germany at some point. And almost certainly nuclear weapons would have been discovered regardless of whether there were those wars. But the timing and the context would probably have been very different.

Robert Wiblin: I was having dinner with Nick Beckstead last night; I'm not sure whether you know Nick? He had this question for you, which hopefully I can accurately represent. I think he thinks you're relatively pessimistic about whether even superforecasters are going to be able to do better than other people, or better than random chance, once we're talking about very long-term forecasts over decades, or possibly even centuries. And he thought that might be a mistake, because superforecasters, using whatever styles of thinking they have, aren't just good in one domain where they have particular expertise; they seem to be better at making forecasts almost across the board, everywhere we've checked. So maybe even though we don't yet have the data to show they can forecast more accurately over very long time spans, that would be a pretty reasonable expectation to bring to the table. What do you think of that?

Philip Tetlock: Interesting. So it is true, when you look at the tournaments within which the best forecasters excel, that the questions are just incredibly heterogeneous. They have to be quick studies to do this well. You're moving from Arctic sea ice mass, to Colombian narco-terrorism, to Spanish-German bond yield spreads, to the Syrian civil war, to South China Sea island building, to North Korean missile testing. It's literally all over the map, geographically and functionally extremely diverse. So you ask, what fraction of people can have expertise in more than about one or two of these topics? And the answer is nobody.

Philip Tetlock: They have to be pretty fast generalists. And I guess that's the basis for the notion that if they can display that degree of cognitive dexterity in the original IARPA tournaments, where the questions were extremely heterogeneous, there weren't well-defined base rates for many of the questions, and you had to improvise a lot, why couldn't they do it going further out in time? And I think it's a matter of how quickly you think chance compounds over time. I don't have the exact answer to this.

Philip Tetlock: The magician-statistician Persi Diaconis at Stanford once asked the question: how many times do you have to shuffle a deck of cards before all information is lost? So you open up a new deck of cards, all the cards perfectly ordered from deuces up through aces, all in the exact same order. And how many proper shuffles do you have to do, I guess there's a definition of what a proper shuffle is, before all order is lost? I think the answer is five or six (ed: It's 7). Okay, now, with the necessary changes made, ask the same question about history.
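
(Diaconis's result, with Dave Bayer, concerns riffle shuffles under the Gilbert-Shannon-Reeds model, and the famous answer is roughly seven. A small simulation sketch: a brand-new deck has exactly one "rising sequence", while a fully random deck averages about 26.5, so watching that statistic climb gives a crude sense of how fast order dissipates.)

```python
import random

def riffle(deck, rng):
    """One Gilbert-Shannon-Reeds riffle: binomial cut, then interleave."""
    cut = sum(rng.random() < 0.5 for _ in deck)  # Binomial(52, 1/2) cut point
    left, right = deck[:cut], deck[cut:]
    out = []
    while left or right:
        # Drop the next card from a packet with probability proportional to its size.
        if rng.random() < len(left) / (len(left) + len(right)):
            out.append(left.pop(0))
        else:
            out.append(right.pop(0))
    return out

def rising_sequences(deck):
    """Card c+1 starts a new rising sequence iff it appears before card c."""
    position = {card: i for i, card in enumerate(deck)}
    return 1 + sum(position[c] > position[c + 1] for c in range(51))

rng = random.Random(0)
deck = list(range(52))  # a factory-fresh deck: 1 rising sequence
for shuffle_count in range(1, 11):
    deck = riffle(deck, rng)
    print(shuffle_count, rising_sequences(deck))  # climbs toward ~26.5
```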

Philip Tetlock: I mean, there are things happening that are random, and the question is how much randomness; you're not getting full card shuffles every day or every month or every year. There are substantial pockets of stability in history. But how fast is the randomness compounding? The optimal forecasting frontier is going to be very, very close to chance once you reach a certain point. And looking back on 20th century history, my guesstimate is that there are certain categories of things farsighted people were anticipating. But it took a long time, even with physicists. Nuclear weapons really weren't on the radar screen until about 1930 or so. I think Einstein thought the idea was a nonstarter initially, and then he changed his mind when Fermi got the reactor going.

Robert Wiblin: There was someone who I think anticipated it several years ahead, and moved to America and tried to sound the alarm about this risk, I think in the very early days of World War Two, or maybe even before that-

Philip Tetlock: Oh yeah, Fermi and others were doing that.

Robert Wiblin: It was someone else, I think… I can't remember his name; maybe I'll chime in later and say what it is. But there's actually a book about this person's efforts to try to prevent Germany from getting nuclear weapons first.

Philip Tetlock: So there were pockets of farsightedness. Although there you’re talking about a time frame of five years, the letter to Roosevelt that Einstein was coaxed into writing.

Robert Wiblin: I think it was someone who persuaded him to write that letter. Yeah, I can't remember the name.

Robert Wiblin: The name that was escaping my memory here was Leó Szilárd, who persuaded Einstein to write a letter to Franklin Roosevelt about the possibility of nuclear weapons in August 1939, one month before Germany invaded Poland. It's quite a remarkable story. We'll put up a link to the Wikipedia article about it, and you can find out more in the biography, 'Genius in the Shadows: A Biography of Leo Szilard, the Man Behind the Bomb'.

Philip Tetlock: But that's not really far out; we were talking about centuries here. In that case, some of the basic science was in place, the theory was there, the technology was there, and there was a potential threat, so it got galvanized. But that bomb probably would have been developed anyway in a competitive nation-state system; you could expect something like that to happen, though it might not have happened for another 20 or 30 years. Similarly, we might not have had a man on the moon until later. Or conceivably earlier, if Germany had taken a more peaceful course and Wernher von Braun had kept sending up rockets.

Robert Wiblin: The Manhattan Project was a colossal effort. I think I remember reading that a million people were indirectly involved in the Manhattan Project, even though 99.9% of them didn't realize what they were working on. But-

Philip Tetlock: It was the kind of thing that only a power like the United States was capable of doing in World War Two. The other powers were exhausted and stretched, and the United States had this incredible surplus capacity for waging war on two fronts; it was a remarkable asymmetry of power. Yeah, I guess he's right, if you want to call me a pessimist, because I don't think they're going to do a very good job a century out, or even a generation out. Now, when you get to five to 10 years, maybe there's going to be some advantage, but it's going to be increasingly small.

Robert Wiblin: Yeah, I guess it just depends on the nature of the question. If you're asking who's going to be prime minister of the UK in 50 years' time, no superforecaster is going to get that; everyone is just back to chance, just guessing names at that point. But for something like which party will be in power, maybe you can get a little bit of resolution there.

Robert Wiblin: So for example, if you're trying to forecast progress in artificial intelligence, forecasting at what point the algorithms will reach the point where you get transformative change is very, very hard. But forecasting just the amount of computation we'll have, or how fast computer chips will be? It seems like we can potentially have something to say about that, even looking 50 or 100 years out, just because we have enough of a historical record and can extrapolate the trends there. It gets much harder, no doubt. But I think superforecasters might be able to do better than chimps throwing darts at dartboards.

Philip Tetlock: For some things, yes. I don’t know, is Moore’s Law still alive and kicking?

Robert Wiblin: It changed. But the thing is, I think it would not have been unreasonable to… I think people actually did predict the slowdown, because there were engineering and practical reasons why people expected it to slow down, and indeed it did. So there we did have some knowledge, and expertise can help you to forecast. Sometimes trends hold, and that might be worth having a crack at.

Philip Tetlock: And maybe I'm naive, but I think when astronomers and astrophysicists tell me that the sun is going to go supernova in three or four billion years (ed: strictly, the Sun is expected to become a red giant rather than go supernova), I think they're probably right. It's going to come close to the Earth's orbit; it's going to destroy all life on the planet.

Robert Wiblin: Some things are kind of mechanistic.

Philip Tetlock: Yeah, there are some categories of things, right? There are timescales, and levels of determinism, and certain operating laws where you have enough confidence that you think you can extrapolate out. I mean, where's climate on that continuum?

Robert Wiblin: Somewhere in between, it's probably safe to say. Yeah. You have this chapter, maybe it's multiple chapters, in Expert Political Judgment, which I just really love, talking about how chaotic the world is and how the things that seem inevitable maybe aren't. I guess this is a question I had a lot when I was a teenager: is the world following mechanistic laws that cause things to go the way they do no matter what I do? Or is the world incredibly chaotic, such that one person can make a difference?

Robert Wiblin: And having thought about it a lot, I basically think, and I feel like you agree, that one person can change the course of history if they can get access to power or knowledge, just because there's so much randomness in which decisions people make, and it all spirals out from there. Do you agree? Can the decisions of one person making a concerted effort change what happens in the future? Or are they trapped by structural forces that prevent them from ever really materially changing things in the long term?

Philip Tetlock: I think that most people, most of the time, are very much constrained by systemic forces. We're group-living creatures, and we do what we do because we like to get along with the people around us. But I think there are exceptional situations in which individuals have made a big difference. I mean, the favorite example: if we were to think of one person in the 20th century who seemed to do things that were really out of the ordinary, who would it be?

Robert Wiblin: I guess the H word? Is it? You think of Hitler?

Philip Tetlock: Yeah, I think the H word.

Robert Wiblin: I guess like Stalin and Mao as well.

Philip Tetlock: Right. Yeah. But to a lesser degree. I mean, I can certainly imagine that China, if it had had pragmatic leadership like Deng Xiaoping's from 1949 onwards, would have a per capita income close to Taiwan's and would be by far the largest economy, all other things being equal. Of course, that assumes they don't get involved in the Korean War, and don't get embargoed by the US, and things like that; it depends on how the Chinese civil war worked itself out. But China went into some unnecessary convulsions that seem to be linked to the whims of one person, Mao Zedong: the Great Leap Forward and the Cultural Revolution, which were very, very costly in lives and prosperity and growth.

Robert Wiblin: Okay, yeah. I expected you might be a little more enthusiastic about the idea that decisions that maybe don't even seem so consequential at the time, and where it's not possible to forecast exactly what effect they're going to have, like the actions of just a minister in the UK, could end up having unforeseen and really quite important impacts in the long term, like-

Philip Tetlock: Butterfly effects.

Robert Wiblin: Yeah, butterfly effects, basically. Because as I recall, that chapter in the book seemed to say, "No, the world is really quite chaotic. Butterfly effects are quite real, and it's something of an illusion that history is deterministic and predestined to go in a particular direction."

Philip Tetlock: Well, you're talking about Expert Political Judgment or Superforecasting, but I think we qualified that. I don't think we know how common butterfly effects are. It's certainly conceptually very easy; I mean, the logical systems-theory case for butterfly effects is overwhelming.

Robert Wiblin: I think you can also come up with case studies that seem kind of compelling, where you just look at one person. I speak to people in government and they're like, "This happened because this person did this thing, and there's no way it would have happened otherwise." It happens all the time.

Philip Tetlock: So take Danny Kahneman's thought experiment of Hitler, Stalin and Mao, if they had been born as girls rather than boys. Then 20th century history would have unfolded radically differently. Because when people think of names in the 20th century and ask, "Well, who made a difference?" they say the H word, the S word, the M word: Hitler, Stalin and Mao. They-

Robert Wiblin: Movers and shakers of their time.

Philip Tetlock: Right. They killed a lot of people, and they were transformative in their very distinctive ways. And if they had been born girls, there's not a snowball's chance in hell that any of them would have risen to lead the Nazi Party, or the Soviet Communist Party, or the Chinese Communist Party, because sex role norms in the early 20th century were such that it just wasn't going to happen.

Philip Tetlock: Now, that's an interesting thought experiment for a very key reason. That world, before the fertilization of the eggs, was 0.125 probable, one in eight, right? Each of the three is a 50-50 outcome, and 0.5 cubed is 0.125. The world in which they're all male is also 0.125 probable. So the counterfactual all-female world is exactly as probable as the actual world that did materialize. You have a very nice case where the possible worlds are, ex ante, equally probable. And that seems to many people to say, "Oh yeah, there really is something very fluky about the world we're inhabiting."

Philip Tetlock: Now of course there are people, holders of the very strong form of the actor-dispensability thesis, who say, "No. Hitler? Well, you could have had another extreme right-wing Germany, maybe with an even smarter and more clever leader who could do a much better job than Hitler. And Stalin? Well, he was a natural sequel to Leninism. And Mao? Well, that's just the natural sequel to Stalinism."

Robert Wiblin: Yeah. I guess with Hitler, I find that story not very plausible at all, that without Hitler it was inevitable that you’d have right wing extremism in Germany.

Philip Tetlock: Even after losing World War One and the Versailles Treaty?

Robert Wiblin: Yeah. To me, I guess I’m just engaging in counterfactual reasoning.

Philip Tetlock: And the Great Depression?

Robert Wiblin: Yeah. Other countries didn't… Not every country had a right-wing resurgence like this; the UK didn't. I guess I don't really believe there was a fundamentally different cultural drive there. I mean, I agree there was a certain probability, but I think it's nowhere near certain that you get something like that. With Stalin, I was going to say, there's a bit of a stronger case that there were strong systemic reasons, flowing out of Leninism, why the most brutal people would rise to the top of the Communist Party, and if it wasn't Stalin, it might have been some other brutal person who was part of that whole revolution. But now I'm really stuck in the weeds of counterfactual reasoning and making claims.

Philip Tetlock: Trotsky wasn’t exactly a nice guy.

Robert Wiblin: Yeah. There were a few nicer ones. But I feel like they systematically got edged out. So it’s a bit more deterministic, at least in my mind; that’s the story I want to tell. Yeah. Okay. I guess I’ll have to go back and read that chapter and see how qualified the claim about the butterfly effect was.

Philip Tetlock: I think we treated it as one of the grounds people cite for extreme epistemic pessimism about forecasting. We laid out five reasons why some people thought the whole idea of even looking for any forecasting accuracy at all was a bit of a fool’s errand. And obviously we don’t go that far: we think there’s some room for improvement, though it requires concerted effort.

Robert Wiblin: Yeah. Do you watch the show Rick and Morty? No, you haven’t seen it. I think you might like it. It’s a very clever cartoon where they go off into different multiverses; it indulges the many-worlds hypothesis that all these things are possible and that you can travel between them. Seems like I’ve convinced you to watch it.

Philip Tetlock: It sounds like the world I’m in with Focus and Civ5. Lots of possible worlds there.

Robert Wiblin: Speaking of fiction. Have you ever read the Foundation Series by Asimov?

Philip Tetlock: No, I haven’t. I’ve had it quoted at me many times before. And I’ve seen quotes from it.

Robert Wiblin: I guess. Yeah, that book originally starts out with this idea that someone’s invented a field of knowledge that allows them to accurately predict things, very specific things, over hundreds or thousands of years. Which is probably not possible, even with very advanced technology; it’s probably just impractical, unless you managed to set up society in some very mechanistic and deterministic way that’s very different from the present. But then over time, I think Asimov realized by the third or fourth book how ridiculous this premise was, and so he started throwing wrenches into the works, so that the predictions really start going wrong because of sui generis, very strange events. I guess you’ve probably heard this, that Asimov ceased to believe in deterministic rules and things go off the rails in the series.

Philip Tetlock: I guess I’m waiting for the HBO series.

Robert Wiblin: Actually, I think they could pull that off. So Superforecasting, I guess you probably wrote about five years ago now. How would a new edition of Superforecasting differ if you’re writing it today?

Philip Tetlock: Oh, tough, tough question. I have to write it before I could answer that question. It’s like asking people to predict inventions, right? If I could predict whether something’s going to be invented, why wouldn’t I just invent it?

Robert Wiblin: Yeah, so people in the audience wanted to know, they’d heard … Last episode, we talked about extremizing a bunch: how if 10 people each predict that something is 80% likely, then maybe in aggregate you should actually think it’s 90% likely, because they’re all hedging a little bit too much, since each only has access to part of the knowledge that the group has.

Robert Wiblin: Some people had heard that maybe this is a variance-increasing strategy that helped the Good Judgment Project win some tournaments. But actually, what it was doing was increasing your probability of coming first and increasing your probability of coming last, just by making stronger, bolder claims, being willing to bet all of your chips on them. Is there anything to that in follow-up research, that maybe extremizing doesn’t look as good as it did a couple of years ago?

Philip Tetlock: It’s an interesting question about whether you’d want to extremize in a world that was more volatile than the 2011 to 2015 timeframe. And there clearly are worlds in which extremizing will cause you to crash and burn. I think that’s fair.

Robert Wiblin: Like when everything changes all at once?

Philip Tetlock: Yes.

Robert Wiblin: For some related reason.

Philip Tetlock: Yes. So whether or not you’re going to use extremizing, I think, hinges on the questions that you’re answering and on what the expected value of correct and incorrect answers is. The answer is, it really depends enormously on the context and the calculus. The calculus we were using inside the tournament was a pure Brier score calculus, contoured around the events that IARPA was asking about within a particular timeframe. And if you had a different accuracy function, a different utility function, and a world with different predictive properties, you would want to adjust.
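
For readers who haven’t met it: the Brier score is the mean squared error between probability forecasts and 0/1 outcomes, so lower is better, and a constant 50% forecast scores 0.25 on binary questions. A minimal sketch of the binary form, with made-up numbers:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes
    (the common binary-question form; lower is better)."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# A bold, correct forecaster beats a hedger...
print(brier_score([0.9, 0.8], [1, 1]))  # 0.025
print(brier_score([0.6, 0.6], [1, 1]))  # 0.16
# ...but boldness is punished when the events don't happen:
print(brier_score([0.9, 0.8], [0, 0]))  # 0.725
```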

Philip Tetlock: That’s a boring answer, but I guess, I’m acknowledging that I was always nervous about extremizing. I mean, when I would compare my judgments, my intuitive judgments to the extremizing algorithm, I would say, “Oh my God. Is it going that far?”

Robert Wiblin: It’s incredible.

Philip Tetlock: But it worked remarkably well. And bear in mind, I think the best extremizing algorithms are ones that are sensitive to the cognitive diversity of the crowd. So it’s when people who normally disagree suddenly start agreeing that you increase your confidence and extremize; it’s not just that you’re taking the most recent forecasts or the best forecasters. You’re-

Robert Wiblin: When people who are normally negatively correlated suddenly start becoming correlated on the one question.

Philip Tetlock: Yeah, yeah. And that does tell you something that you didn’t know before, and that gives you reason to be more extreme. But again, it hinges on your loss function: if you have a deep aversion to a false negative, for example, or a false positive, you’re going to need to adjust your accuracy function.
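
A minimal sketch of one common extremizing recipe: average the forecasts in log-odds space, then push the mean away from 50/50. The exponent below is illustrative, chosen so that ten forecasters at 80% land near the 90% in the example above; it is not a value fitted in the tournaments, and the diversity-sensitive versions Tetlock describes would vary it with how correlated the forecasters are:

```python
import math

def extremize(probs, a=1.6):
    """Average probability forecasts in log-odds space, then multiply the
    mean log-odds by a > 1 to push the aggregate away from 50/50.
    a = 1.6 is an illustrative choice, not a fitted parameter."""
    log_odds = [math.log(p / (1 - p)) for p in probs]
    mean = sum(log_odds) / len(log_odds)
    return 1 / (1 + math.exp(-a * mean))

print(extremize([0.8] * 10))  # ~0.90: ten hedged 80% forecasts, extremized
```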

Robert Wiblin: Yeah, one of your recent papers, I think coming out this year, is “Are markets more accurate than polls? The surprising informational value of just asking”, where you suggest that perhaps we don’t need prediction markets, because we can just survey people on their probability estimates of different things. Why do you think the motivation or the rewards of prediction markets aren’t as necessary as perhaps some listeners think they are?

Philip Tetlock: I think prediction markets are very valuable. I mean, you really … tournaments are really only beating prediction markets under certain conditions. They’re beating prediction markets when you use the very best extremizing algorithms and you have a lot of data to fine-tune the algorithm, and the prediction markets themselves are not deep and liquid.

Philip Tetlock: So if you have a strong forecasting tournament and a prediction market that is a little bit anemic, I think the prediction market will indeed lose. The tough question is whether really strong forecasting tournaments can outperform deep, liquid markets, and that still remains to be determined. But when you think about it, prediction markets are in a sense doing almost automatically a lot of the things that the aggregation algorithms are doing. They’re upweighting the most recent forecasts of the most highly capitalized, confident bettors.
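
A minimal sketch of that implicit weighting made explicit, a simple linear opinion pool; the forecasts and weights below are invented for illustration:

```python
def weighted_pool(forecasts, weights):
    """Linear opinion pool: a weighted average of probability forecasts.
    A deep market does something like this implicitly, weighting by
    capital and recency; aggregation algorithms make the weights explicit."""
    return sum(f * w for f, w in zip(forecasts, weights)) / sum(weights)

# Upweight the more recent forecasts of the more capitalized bettors:
print(weighted_pool([0.6, 0.7, 0.9], [1.0, 2.0, 4.0]))  # ~0.8
```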

Robert Wiblin: Have any sectors of society or governments started to embrace forecasting more in the last couple of years? And maybe a follow-up question would be if you could snap your fingers and just have one organization begin regularly using proper forecasting methods, which one might it be?

Philip Tetlock: Well, forecasting tournaments are a difficult sell because they challenge stale status hierarchies. Imagine that instead of being an academic, I had started in the intelligence community as an analyst 40 years ago, as a China specialist. Then I worked my way up the ranks and finally made it onto the National Intelligence Council, became a big shot, and maybe I get to participate in an occasional presidential daily briefing, or be close to the President when he’s meeting with Xi Jinping. After 40 years, that’s a lifetime of accomplishment: working on these reports and playing by the epistemic ground rules of the intelligence community, drafting concise, compelling narratives that are sent up to the policy community, where people say, “Yeah, I think I learned something I didn’t know before… and promote that guy.”

Robert Wiblin: Yeah.

Philip Tetlock: And then someone comes along from the R&D branch of the Intelligence Community and says, “Hey, you know what? We want to run these forecasting tournaments, and we want to see whether 25-year-olds can do a better job than 65-year-olds at predicting the course of events in China. You know, what’s going on with the Chinese economy and the Chinese domestic political system? How are they going to respond to the trade war? What are they going to do with Hong Kong? What are they going to do with Taiwan? What are they going to do with this island? What are they going to do with North Korea?”

Philip Tetlock: All those things and it turns out that the 25-year-olds are outperforming the 65-year-olds like me. It doesn’t take a lot of imagination to suppose that the more senior people are going to be decidedly unenthusiastic about this innovation.

Robert Wiblin: Yeah.

Philip Tetlock: This looks like a superficial idea that a bunch of nerdy computer-programmer, neopositivist types might come up with. But this is not something serious people should entertain. So we talked about Tom Friedman in the Superforecasting book as a prototype of a style of journalism that seems, from a forecasting point of view, to put it charitably, superficial, and to put it more harshly, reckless. At any rate, it’s very difficult to figure out what he’s saying.

Philip Tetlock: So I mean, you’re not going to expect him to be very favorable, you’re not gonna expect most op-ed people to be very favorable, because this is perceived as an attack on how they do things. They’re very intelligent people there, they’re very verbal. They’re very knowledgeable. They have a career, they have a lot of reputational capital in the approach that they’ve taken. And the notion that these forecasting tournaments could even partly supplant or complement the expertise they have is dissonant. The notion that they would completely displace them is anathema. But even partial displacement, I think produces resistance.

Philip Tetlock: So, I think that’s why forecasting tournaments are a hard sell, even though the intelligence community, I think, is advancing forecasting tournaments more than any other research sponsor to date. I don’t think they fully embrace them by any means. I think their status is very precarious. It’s quite possible that the intelligence community prediction market and forecasting tournaments could even disappear; it hinges on top management and funding and all sorts of things. So I don’t think it’s strongly incentivized, and I don’t think it’s transformative yet. Do I think it will be? Yes, I do, in the long term. Though of course, in the long term we’ll all be dead.

Philip Tetlock: But it may be beyond my life expectancy. I do see a trajectory toward more thoughtful efforts to understand uncertainty, to integrate more statistical and more narrative-based modes of understanding the world. I think that’s going to continue, though it’s going to be halting. It’s hard for AI to break into this area for the reasons we discussed earlier: the base rates are elusive, and it’s hard for machine learning to get traction on these kinds of questions. But insofar as I believe there’s an arc of history, I guess I’m enough of a Pinker optimist: I think there is a growth of knowledge, a growth of enlightenment, and that we will become more circumspect about what our cognitive limitations are. We will become more aware of when we’re fooling ourselves and fooling other people; we will become more sensitive to demagogues.

Philip Tetlock: I realize making those long-term forecasts in the current environment makes me sound almost delusional. But it’s how I see things. And I would think by 2030 or 2040, the forecast will age pretty well, even though it looks pretty stupid right now.

Robert Wiblin: Well, yeah, maybe, dear listener, you can go into the US intelligence services and preserve the budgets for forecasting and reasonableness and quantitative accuracy. You had this idea of trying to make quantitative predictions out of the claims that people like Thomas Friedman make on op-ed pages, in order to hold them to account for what they say. Has anyone embraced that idea yet?

Philip Tetlock: Oh, we’ve tried that, and there are some people for whom it’s extremely difficult. I wouldn’t say he’s all that much harder than many other pundits, though. When someone says there’s a distinct possibility something’s going to occur, even something really dramatic, like a distinct possibility of a nuclear war on the Korean peninsula, or between India and Pakistan, or a distinct possibility of a euro meltdown, it sounds as though you’ve learned something important. “Distinct possibility” takes on a tinge: oh, something dramatic, a distinct possibility, my goodness, I’m learning something I didn’t previously know.

Philip Tetlock: But “distinct possibility”, on close inspection, when you ask people what it means and what meanings it can be given after the fact, means anything between 20 and 80%. It straddles both sides of maybe. So it’s very difficult to figure out whether someone who said there was a distinct possibility of something happening is well calibrated or terribly calibrated. What is clear, though, is that we resonate to the language, and we don’t resonate to the numbers in the same way. Persuading people that there are important categories of decisions where they should be more willing to go with quantitative estimates is going to be a big uphill struggle. It’s also true in medicine and other fields, by the way, not just in politics: try getting a probability out of your doctor.
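
One way to see why vague verbiage resists scoring: a calibration check needs numbers to bucket, and a phrase that means anywhere from 20% to 80% can’t be bucketed at all. A minimal sketch, with made-up forecasts:

```python
from collections import defaultdict

def calibration_table(forecasts, outcomes):
    """Bucket forecasts to the nearest 10% and compare the stated
    probability with the observed frequency of the event in each bucket.
    A well calibrated forecaster's 70% bucket resolves 'yes' ~70% of the time."""
    buckets = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        buckets[round(f, 1)].append(o)
    return {b: (len(v), sum(v) / len(v)) for b, v in sorted(buckets.items())}

print(calibration_table([0.7, 0.7, 0.7, 0.7, 0.2, 0.2], [1, 1, 1, 0, 0, 0]))
# {0.2: (2, 0.0), 0.7: (4, 0.75)}: (count, observed frequency) per bucket
```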

Robert Wiblin: Yeah, I know. It’s incredible that they’re unwilling to do that even though it’s so important. I assume it’s for legal reasons, or maybe they just don’t feel qualified themselves to give estimates like that. And lawyers as well: they refuse to give you any probability judgment.

Philip Tetlock: Well, yeah, and they obviously have base rates they could be using.

Robert Wiblin: Yeah, totally, a great source of frustration to me.

Philip Tetlock: But they think that you, the consumer, are not sophisticated. Even if they do know the answer, they would argue that you would probably misuse it. And it’s quite possible they don’t know the answer.

Robert Wiblin: Yeah. A concept that I heard recently, if I can verge perhaps on being slightly offensive for a second, is this idea of verbal land. Sometimes you find people who are very good at speaking, very good at using language, but who don’t really have any quantitative training. And sometimes they say things where I imagine that if they went into a quantitative mindset for a minute and started trying to write it out in a more mathematical way, they’d see that it’s gibberish or inconsistent or just can’t possibly be justified.

Robert Wiblin: People like that, who are very good speakers and communicators but don’t have quantitative training in at least some discipline that provides rigor to their internal reasoning, can be incredibly dangerous. People need to get out of verbal land some time and actually use the numbers, throw some numbers in there.

Philip Tetlock: Amen.

Robert Wiblin: Speaking of which, on a previous episode, Spencer Greenberg described how to do Bayesian updating on the fly, using odds ratios and Bayes factors in response to evidence that you observe. When, if ever, do you think it’s sensible to use this kind of explicit Bayesian reasoning? Perhaps when you’re, say, making adjustments away from the outside view, base rates and reference classes?

Philip Tetlock: That’s exactly where.

Robert Wiblin: Okay, yeah. Okay, cool. Do you think superforecasters do that? Do you do that?

Philip Tetlock: Yes, I think they do. And we encourage them to do it in Champs Know. I mean, it’s part of Champs Know. We introduced the concept of a likelihood ratio and we say, this is-

Robert Wiblin: Yeah, that’s how to update?

Philip Tetlock: Well, it’s not that the numbers or the likelihood ratio are sacred, because it’s all guesstimates. But you’re making your guesstimates precise, and that makes it easier to change them, because you know you shouldn’t be anchoring on them. That’s actually one of the other perils of using numbers: not appreciating how noisy the numbers are. And I guess that’s maybe part of the rationale for why doctors and so forth don’t want you to have the numbers: they think you’re going to anchor too much on them. There’s probably some truth to that. They also think that if the doctor says there’s an 80% chance, you’re going to take that as a certainty that you’re safe, right?
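
A minimal sketch of the odds-form update being discussed, the same mechanics Spencer Greenberg described; the prior and likelihood ratio below are illustrative guesstimates:

```python
def bayes_update(prior, likelihood_ratio):
    """Odds-form Bayes' rule: posterior odds = likelihood ratio * prior odds.
    The likelihood ratio, P(evidence | hypothesis) / P(evidence | not),
    is itself a guesstimate, but a precise, revisable one."""
    prior_odds = prior / (1 - prior)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1 + posterior_odds)

# Start at a 20% base rate, then see evidence you judge three times
# likelier if the hypothesis is true:
print(bayes_update(0.20, 3.0))  # ~0.43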

Robert Wiblin: Yeah, as we were discussing earlier. I guess we find-

Philip Tetlock: Then you’ll be very annoyed when you’re not.

Robert Wiblin: Yeah. An economist I really like, Bent Flyvbjerg, I’m not sure quite how to pronounce his name, has done some really great work on the intense and consistent overoptimism in planning deadlines and cost forecasts for megaprojects such as massive bridges and huge buildings-

Philip Tetlock: I think it’s lovely work, yes.

Robert Wiblin: For example, the UK team that looks into this estimated that successful delivery was likely for only 20% of the UK government’s major infrastructure projects. Have you considered using forecasting teams to call bullshit on some of the megaproject overoptimism that you get from, I guess, contractors and government?

Philip Tetlock: I think he has.

Robert Wiblin: He’s taken that up.

Philip Tetlock: I know he’s been in conversation with my coauthor, Dan Gardner. I don’t know if they’re going to do anything together. But it’s a domain ripe for application, yes.

Robert Wiblin: I guess, in a broader sense, I see your work as aiming to make people more rational, more reasonable, sometimes more moderate, and in particular to help big institutions that have a lot of power to do good or harm to make wiser decisions as a collective group and avoid the traps you get there. Is there any other research or training that shares this goal, which you think is interesting or promising, that it might be worth drawing people’s attention to?

Philip Tetlock: Well, in some ways, the huge flourishing of error-and-bias research in judgment and decision-making stimulated by Kahneman and Tversky is core to a lot of this. I remember that at the beginning of Thinking, Fast and Slow, Danny Kahneman says his goal is to improve the quality of watercooler conversation. I thought it was cute, but it was very serious: he wants to enrich the vocabulary we use for judging judgment. And in that sense, my research is very much in the same spirit. We want to enrich the vocabulary for judging judgment. And we want to make people more resistant to what I see as the most dangerous of all the heuristics that Kahneman and Tversky studied, which is the attribute-substitution heuristic; we nicknamed it the bait-and-switch heuristic.

Philip Tetlock: But it’s what you were pointing to earlier. You confront a really difficult question about climate change, or a really difficult question about the economy, or the Persian Gulf, or God knows what. And someone comes along who seems to have a lot of status and is well dressed-

Robert Wiblin: Smooth talker.

Philip Tetlock: -a smooth talker, well connected, well networked, and gives you a story. And you slip into answering another question. You’re no longer trying to answer the question at hand about China or the economy or climate. You’re asking: does this person look like the sort of person who would know the answer to this question? And the answer there is a resounding yes. This person clearly looks like it, and you run with it. You then act as though the answer to the easy question is also the answer to the hard question. You conflate the two in your mind.

Philip Tetlock: I think it’s that conflation we’re trying to fight against. That conflation is an enormously powerful force. It’s not something you can eliminate by simply talking about it here and saying, “Oh, watch out for that”, because you’ll fall for it 10 seconds later. It’s very, very-

Robert Wiblin: It’s how we’re designed to some extent.

Philip Tetlock: We are wired up to look for credibility cues and status cues, and we’re deferential towards status, and we’re deferential toward the correlated markers of status. And that’s just the way we are. And forecasting tournaments that challenge the status hierarchy, you should expect will have a hard time.

Robert Wiblin: Out of curiosity, are there any majors, I’m imagining perhaps, say, engineering, that are associated with being a better forecaster? Is that something that you’ve looked into?

Philip Tetlock: So far, yes, to some degree. We’ve noticed correlations: it helps to have a background in economics, finance, engineering or computer programming. These are positively related … I think those types of people are overrepresented in the superforecasting pool, though there are people from all lines of work. I’m surprised that there aren’t more political scientists or social scientists involved; I wish there were. I’m surprised that it’s as lopsidedly male as it is; I wish there were more women, and a wider range of people, involved in this.

Philip Tetlock: Here’s a rule of thumb: forecasting tournaments appeal more to people who are out of power, who are lower status. If you have high status, if you’ve already got it made, forecasting tournaments look risky and suspect and destabilizing. So they tend to appeal to the younger, the upwardly mobile, the ambitious, more male than female though not exclusively, and to people in STEM disciplines. It appeals to them-

Robert Wiblin: Simply because they’re not in government as often.

Philip Tetlock: Well, there are people in certain branches of the government, in the Intelligence Community, DoD, Treasury, the Federal Reserve, the CBO, who are very open to this. They’re not the majority, but the government’s a big place.

Robert Wiblin: So we’ve got this, what we call a problem profile, on improving decision-making in big institutions as a career track. And one weakness of the article at present is that we don’t have that many success stories other than your work and your various disciples and other research associates. Do you know of any other success stories of people who’ve helped to make government or big organizations more reasonable, that we could highlight as models of a career path that other people might be able to mimic?

Philip Tetlock: I think these things hinge a lot on who is in charge at any given moment. I find it hard to imagine President Trump being all that open to changing his mind in response to forecasting tournaments. How likely would President Obama be to have done that? My guess is a bit more likely. Yeah, maybe quite a bit more likely. Do we think that someone is less of a leader because that leader is willing to change his or her mind in response to that?

Robert Wiblin: It’s weakness!

Philip Tetlock: We didn’t elect the forecasting tournament, we elected you. We expect you to make the decisions. We don’t expect you to delegate them to some deus ex machina here. So I think progress is halting. And it’s very specific to the particular executives in charge.

Robert Wiblin: I guess the technocratic movement in general, the cost-benefit analysis people, the law-and-economics people, are maybe examples of people who’ve tried to make bureaucracies more sensible?

Philip Tetlock: Yes, yes. I mean, I think there are organizations like the Gates Foundation that try very explicitly to base their recommendations on probability estimates, and I think there are parts of the intelligence community that try to do that too. There are lots of organizations on Wall Street that try to do that, and I think some central banks try as well.

Robert Wiblin: Yeah, interesting. Okay. Bridgewater is this hedge fund that has an unusual culture that seems, in some ways, to echo the directness and clarity of forecasting. Do you have a view on them? Are you aware of what their culture is like?

Philip Tetlock: Yeah, I am. In fact, a graduate student who joined Wharton a year or two ago, came from Bridgewater and gave us an interesting tutorial on the conversational norms there.

Robert Wiblin: It’s quite extreme, you can look it up.

Philip Tetlock: Ray Dalio’s explained it in his book, and it is quite well known. Many of the things he does, I think, are essentially good ideas, but they take a real toll on human beings. They require a degree of rigor and transparency and candor that people find depressing. I gather the turnover rate is quite high there.

Robert Wiblin: At least among new people, definitely.

Philip Tetlock: Yeah, yeah. Right. But once you’ve got a self-selected group, it seems to work pretty well. You could say, “Well, that’s true of superforecasters too, isn’t it? You get a lot of attrition there as well.” And yes, the answer is, it’s not everybody’s cup of tea. So yeah, I think the convergence is real. I like the idea of being very explicit about what the goal of the conversation is at the outset, making clear that there are lots of agendas people have when they have conversations, right? It’s not just getting to the truth; it’s looking good, and avoiding getting embarrassed, and-

Robert Wiblin: Or getting people onside, which is a necessary thing sometimes.

Philip Tetlock: Right. And nowadays, with instant outrage culture, people have to be very careful about what they say: you get a word out of place, and somebody’s going to get upset. So there’s a great deal of risk aversion and social posturing that gets in the way of truth seeking. The thing that’s so unusual about forecasting tournaments, and about Bridgewater, is that they ask people to play a pure accuracy game. They only care about getting to the truth faster.

Robert Wiblin: And they seem to have made a lot of money out of it.

Philip Tetlock: And they seem to have been pretty darn successful at it. Yeah.

Robert Wiblin: Yeah, I guess it hadn’t occurred to me actually, until this conversation, that we might be able to imagine Bridgewater as a kind of laboratory of … I mean, the methods they use internally to try to increase accuracy seem very extreme, even shocking to me. And obviously they’re not going to transfer easily to a lot of other organizations. But there might be things in that general direction, or lessons they’ve learned, that are more transferable, and you could view it almost as a research lab in its own right.

Philip Tetlock: Yeah. Very much so. And one of the things I really like about their model, for superforecasting, is that you want to reward people not just for being good forecasters, but for helping other people become good forecasters. So you gain social capital, you gain reputation, not only by having the best Brier score; you get it by helping other people. That’s very valuable, and it can set up a virtuous learning cycle.

Robert Wiblin: Are there any important ways that institutional or collective rationality differs from individual or small group rationality in your view? Are there any implications that has for how we ought to try to improve decision-making in big organizations?

Philip Tetlock: I think the best individual forecasters do often seem to talk to themselves as if they have more than one mind. As if there are … as Walt Whitman famously wrote, “I am large, I contain multitudes.” I’m not sure they contain multitudes, but they have interesting internal conversations with themselves.

Robert Wiblin: I guess possibly organizations can spread that across people, rather than-

Philip Tetlock: So, there’s a term I used in Expert Political Judgment which I borrowed from Harold Bloom, the Shakespeare scholar, a great lover of Shakespeare who was a professor at Yale for many years. He felt that one of the things that made Shakespeare so special is that good judgment in Shakespeare requires mastering the art of self-overhearing. That is, the characters who have it can listen to themselves talk to themselves.

Philip Tetlock: That sounds a bit convoluted. But you listen to yourself talk to yourself, and you decide whether you like what you’re hearing. Do I sound like I’m self-justifying? Or do I sound like a thoughtful person who’s actually trying to grapple with the truth? And I think you want to incentivize people to move in that direction. You’ll be a better contributor to the conversation if you master the art of self-overhearing. You’ll also be better at quantifying uncertainty in these exercises.

Robert Wiblin: What do you think it would look like to live in a world where elites around the world are better at predicting social and political trends? How confident are you that this world would be safer, especially from war and catastrophes?

Philip Tetlock: I think that in a competitive nation-state system where there’s no world government, even intelligent, self-aware leaders will have serious conflicts of interest, and there’s no guarantee of peace and comity. But I think you’re less likely to observe gross miscalculations, either in trade negotiations or nuclear negotiations. I think you’re more likely to see an appreciation of the need to have systems that prevent accidental war and put constraints on cyber and bio warfare competition, as well as nuclear.

Philip Tetlock: So those would be things I think would fall out fairly naturally from intelligent leaders who want to preserve their power, and the influence of their nations, but also want to avoid cataclysms. So in that sense, yes. I think there … I’m not utopian about it. I think we would still live in a very imperfect world. But if we lived in a world in which the top leadership of every country was open to consulting competently technocratically run forecasting tournaments for estimates on key issues, we would, on balance, be better off.

Robert Wiblin: Yeah. One person was recently trying to convince me that this wasn’t the case. The idea is that, inasmuch as you feel like you have a better grasp of what’s going to happen, possibly you become a bit more reckless, more willing to ride the line, because the future seems less unpredictable and risky. But I’m not sure whether that’s actually bad … inasmuch as leaders have an accurate perception of the risks that they’re creating, maybe that isn’t so bad.

Philip Tetlock: Well, yeah, it’s an interesting question. I mean, let’s say for the sake of argument that a leader was more … Take the Trumpian theory of trade: that the United States is better off doing bilateral deals than multilateral deals, because the United States has more bargaining power that way. I think that’s maybe one of his theories. I can’t read his mind, but I’m guessing. And it’s relatively easy for the United States to beat up on Canada or Mexico.

Robert Wiblin: Take them one by one.

Philip Tetlock: Harder on China, but he’s going to try. Easier on Japan and South Korea, because of their dependency on the US. I mean, there are categories of things you could do if you had a purely transactional theory of the world and accurate probability estimates: how far could you push the advantage of your country? In the case of the United States you might be able to push it further in the short term, but you might trigger long-term resentment that would be to your detriment in the long run. That’s, I think, what people are worried about … I think that’s the more thoughtful worry about Trump.

Robert Wiblin: I think part of what’s going wrong there is that he thinks of reducing your own tariffs as a bad thing, whereas I think of it as just an extra benefit: it’s good when they reduce their tariffs, and then it’s even better when you reduce your own tariffs, because you’re just working in your own favor.

Philip Tetlock: Right, but he looks at it more as a power game than a mutual-benefit economic game. So there is a realpolitik view of trade which is quite different from the neoclassical economic view of trade.

Robert Wiblin: Yeah, as I learn every day. If you were going to switch your primary focus away from research and towards getting institutions to leverage the things you’ve already figured out, with the goal of making the world a better place, what do you think that might look like? And I guess, imagine perhaps a slightly different person, the listener out there who’s thinking of taking on this challenge themselves.

Philip Tetlock: I’m not sure I understand you. So you say, “Stop being a professor and become something else. Stop doing research and being a professor.”

Robert Wiblin: Stop trying to learn new lessons and just get people to adopt the lessons that you’ve already learned.

Philip Tetlock: Okay.

Robert Wiblin: Things like that.

Philip Tetlock: So I’m no longer in the knowledge production business, I’m a retailer.

Robert Wiblin: Exactly. Or someone else’s.

Philip Tetlock: I mean, I went a little bit into the retail business with Superforecasting. I’m not the world’s greatest retailer, but-

Robert Wiblin: It was pretty good, pretty successful.

Philip Tetlock: Okay, well, that’s Dan Gardner. But that’s a great question, because I am reaching the age where that’s a very reasonable question to ask me. At a certain age, I think you lose your research edge, and if you have something to say that you think would be useful to the world, you should be doing retail.

Robert Wiblin: Because you have a lot of prestige, a big audience potentially.

Philip Tetlock: It’s worth a shot.

Robert Wiblin: Can you share any research ideas about forecasting, or some other related topic, that seem exciting to you, that you’d like to see someone do but that you don’t think you or any of your direct collaborators are going to get to anytime soon? Possibly some listener could take it on.

Philip Tetlock: I think that we should be linking forecasting tournaments to policy. We have to bridge the rigor-relevance gap, and the best way to do that is by running second-generation forecasting tournaments that incentivize people to generate creative questions as well as accurate answers. And creative questions are the things I think we want to put into Bayesian question clusters, linking short-term and long-term, so we can figure out whether we’re on a historical trajectory toward nuclear war, or strong AI dislocating labor markets, or whatever.

Robert Wiblin: One of the strongest lessons is to take the outside view and use reference classes. Do you think it’s possible that the research overestimates the value of reference classes, because questions where reference classes are available, or easier to construct, are ones where people can just in general be more accurate anyway? Is that something you’ve tried to control for?

Philip Tetlock: When I originally started out in this work, one of the pundits I wanted to recruit was William Safire, who was a famous New York Times columnist, who ran his own forecasting tournament called Office Pool. And he thought my forecasting tournament idea was pretty dumb. But one of the reasons he thought it was dumb is that, we would be asking questions about events he considered unique. He didn’t think they had reference classes and he was a real stickler for language and he thought unique meant unique. He thought unique meant one of a kind, period, full stop.

Philip Tetlock: Now I, on the other hand, I’m a vulgarian, and I see uniqueness as a matter of degree. And there are categories of questions. If anything, I’m not sure there’s anything truly unique under the sun. The universe of precedents is extremely large, and it’s possible to see virtually anything as some combination of base rates. How useful that is, though, is another matter; how difficult it is to identify those base rates is another matter. Sometimes it’s really easy to spot what the base rates are, as with loan applicants; other times, as with civil wars, it gets harder. And then with other kinds of crises, it gets harder still.

Philip Tetlock: So I think the Kahneman insight on the outside view and reference classes is profound. Do I think it sometimes breaks down? Sure. But if you had to bet, that’s a pretty good place to start. Just don’t stop there; keep updating.
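
A minimal sketch of that recipe: anchor on the reference-class base rate, then apply a bounded inside-view nudge. The megaproject numbers below are invented for illustration, loosely echoing the base-rate discussion earlier:

```python
def outside_view_estimate(reference_outcomes, inside_adjustment=0.0):
    """Start from the base rate of a reference class of past outcomes
    (1 = happened, 0 = didn't), then apply a bounded inside-view nudge."""
    base_rate = sum(reference_outcomes) / len(reference_outcomes)
    return min(1.0, max(0.0, base_rate + inside_adjustment))

# Suppose 14 of 100 comparable projects delivered on time and on budget:
# the base rate anchors you at 14%; case-specific optimism might move you
# a few points, not to 80%.
print(outside_view_estimate([1] * 14 + [0] * 86, inside_adjustment=0.05))  # ~0.19
```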

Robert Wiblin: You’ve been a moderately frequent user of Twitter over the last few years. Given the high standards of reasonableness and evenhandedness you promote, how do you find your experiences on social media?

Philip Tetlock: One of my great disappointments on Twitter was how my friend Nassim Taleb behaved. I mean, he and I had written a paper together and we’d known each other for years. And I came onto Twitter after the Superforecasting book came out, and I saw the things he was saying about a lot of people I respected, a long list of people actually: Cass Sunstein and Steve Pinker and Stuart Brant, and later Nate Silver. Just a long list of people; he calls them BS vendors. And a lot of obscenity, a lot of posturing. And when I think of someone as smart as he is, with whom I’ve had, I think, reasonably intelligent conversations, and I look at the way he behaves on Twitter, I think it’s a cognitive sin. He could clearly be raising the collective intelligence of the Twitterverse, and instead he’s lowering it.

Philip Tetlock: So my biggest disappointment was encountering Nassim on Twitter. Otherwise, I’ve found it a pretty good source of information. I mean, you hook into pretty reliable news sources and smart people, and they generate a lot of interesting things; even a few research ideas have come out of it. So I think it’s been fun by and large, with the exception of … Well, not just Nassim, but he’s the person I knew the best, and he was the person who disappointed me the most.

Robert Wiblin: Okay, we’re coming up on time, but just a final question. Have you been making and tracking predictions about your own life? How have you been doing? Has it affected any decisions that you’ve made?

Philip Tetlock: Yeah, yeah, it has. My wife and I think about our mortality; we’re reaching that age where we’re approaching retirement, and we look at resources, and we look at time, at how we want to spend the remainder of it. So it’s a continuous calculation, and we adjust in response to health and changing preferences. So yeah, we do. I mean, I’m 65; how long do I expect to live? Probably into my early 80s, that’s my best guess right now. But that can change.

Robert Wiblin: You look pretty healthy and sharp to me. We’ll see how long it lasts…

Philip Tetlock: And then there’s the issue of dementia, that thing is hanging over all the boomers. But that doesn’t seem to have struck quite yet.

Robert Wiblin: Well, fingers crossed, we’ll be able to do another interview and you’ll be just as sharp in 10 years’ time. It’s been a great pleasure. Thanks so much for coming back on the podcast, Philip.

Philip Tetlock: Always good to talk.

Robert Wiblin: I hope you enjoyed that interview. Just a few reminders before we go. If you want to hear more on these topics check out episode 15, my first interview with Philip.

You can apply to attend EA Global where there are more conversations like this one at eaglobal.org.

If you want to participate in the counterfactual forecasting tournament involving the game Civilization 5, go sign up at 80k.link/​civ.

Click on the link in the show notes or go to 80000hours.org/calibration-training to try out our calibration training app.

We’ll put in a link to that 20-minute blog post summarising the evidence on good forecasting practices for you to read.

The 80,000 Hours Podcast is produced by Keiran Harris.

Thanks for joining – talk to you in a week or two.
