James Snowden argued that expected value calculations based on predictions flung farther into the future, which depend on a greater number of variables, and which rest on less concrete estimates meet a weaker standard of evidence than interventionists are used to. For some causes, effective altruism is predicated on things we can’t just run RCTs on, and depends on predictions of what’s likely to happen. I believe this will be the case for more causes as time passes, and that prediction will become a more common method for finding the greatest opportunities to do good as effective altruism grows more ambitious, more robust, and bigger as a movement. Much of effective altruism, then, will unavoidably depend on arguments from prediction for the foreseeable future.
A good track record of correct predictions, for whatever reference class of work an effective altruist prescribes for the rest of us, is then the closest we can get to testing the value of interventions which can or will only happen in the future, or only once. The more specific a forecaster’s predictions, the more often they turn out accurate, and the more closely they fit the reference class of the effective altruism predictions we care about, the more confidence we can have in that forecaster. What I think is promising is developing or using explicit models of forecasting, which we can test, rather than just relying on the intuitions of individual forecasters, no matter how super they are. That way, more effective altruists can also test or use promising models. I don’t know much about this yet, but the possibility excites me.
I think it will take quite some time for any person or model to build a worthy track record for predictions in the reference class matching its class of domain-specific interventions. However, the value of information could be very high, so I think it’s worth trying. To this end, I think it’s worth more of us using prediction registries, building prediction markets for effective altruism, practicing forecasting to learn and improve, and surveying the academic literature to see if there are strategies or theories for forecasting better. We should also encourage other effective altruists to do the same, especially if they prioritize a more speculative or less concrete cause, and are or claim to be some sort of expert.
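To make “explicit models we can test” slightly more concrete, here’s a minimal sketch of how a prediction registry might score a track record using the Brier score. The forecasters and numbers below are entirely made up for illustration.

```python
# A minimal sketch of scoring a forecaster's track record with the Brier
# score (mean squared error between stated probability and outcome).
# The predictions below are invented purely for illustration.

def brier_score(forecasts):
    """forecasts: list of (probability_assigned, outcome) pairs,
    where outcome is 1 if the event happened and 0 otherwise.
    Lower is better; constant 50% guessing earns 0.25."""
    return sum((p - o) ** 2 for p, o in forecasts) / len(forecasts)

# Hypothetical registry entries: (stated probability, what actually happened)
alice = [(0.9, 1), (0.7, 1), (0.2, 0), (0.6, 0)]
bob   = [(0.5, 1), (0.5, 1), (0.5, 0), (0.5, 0)]

print("Alice:", brier_score(alice))  # 0.125 -> better calibrated here
print("Bob:  ", brier_score(bob))    # 0.25  -> no better than a coin flip
```

The nice thing about a scoring rule like this is that anyone can recompute it from the registry, rather than taking a forecaster’s self-assessment on faith.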
This sounds like a really great idea. I think as a community we tend to make loads of predictions; it seems likely we do this a lot more than other demographics. We do this for fun, as thought experiments, and often as a key area of focus, such as x-risk. It seems like a good idea to track our individual abilities at this sort of predicting, for many reasons: identifying who is particularly good at it, improving over time, and so on. It does make me concerned that we could become hyper-focused on predictions and neglect current causes; getting too caught up in planning and looking forward and forgetting to actually do the thing we say we prioritize.
I also wonder how well near-future prediction ability translates to far-future predictions. In order to test how well you are able to predict things, you predict near-future events or changes. You increase your accuracy at those and assume it translates to the far future. Lots of people then make decisions around your far-future predictions, based on your track record of being an accurate predictor. Perhaps, however, your model of forecasting is actually wildly inaccurate when it comes to long-term predictions. I’m not sure how we could account for this. Thoughts?
To clarify, there is a class of persons known as “superforecasters”. I don’t know the details of the science backing this up, except that their efficacy has indeed been validly measured, so you’ll have to look it up yourself to learn how it happens. What happens, though, is that superforecasters are people who, even though they don’t usually have domain expertise in a particular subject, predict outcomes in that domain with more success than experts in the domain, e.g., economics. I think that comparison might be one layperson forecaster versus one expert, rather than a consensus of experts making the prediction, but I don’t know. I don’t believe there’s been a study comparing the prediction success rates of a consensus of superforecasters vs. a consensus of domain experts on outcomes relevant to their expertise. That would be very interesting. These are rather new results.
Anyway, superforecasters can also beat algorithms which try to learn how to make predictions, which are in turn also better than experts. So, no human or machine yet is better than superforecasters at making lots of types of predictions. In case you’re wondering, no, it’s not just you; that is a ludicrous and stupendous outcome. Like, what? Mind blown. The researchers were surprised too.
From the linked NPR article:
For most of his professional career, Tetlock studied the problems associated with expert decision making. His book Expert Political Judgment is considered a classic, and almost everyone in the business of thinking about judgment speaks of it with unqualified awe.
All of his studies brought Tetlock to at least two important conclusions.
First, if you want people to get better at making predictions, you need to keep score of how accurate their predictions turn out to be, so they have concrete feedback.
But also, if you take a large crowd of different people with access to different information and pool their predictions, you will be in much better shape than if you rely on a single very smart person, or even a small group of very smart people. [emphasis mine]
Takeaways for effective altruist predictions:
Track your predictions. Any effective altruist seeing value in prediction markets takes this as a given.
There are characteristics which make some forecasters better than others, even adjusting for level of practice and calibration. I don’t know what these characteristics are, but I’m guessing it’s some sort of analytic mindset. Maybe effective altruists, in this sense, might also turn out to be great forecasters. That’d be very fortuitous for us. We need to look into this more.
If, like me, you perceive much potential in prediction markets for effective altruism, you’d value a diversity of intellectual perspectives, to increase the chances of hitting the “wisdom of the crowds” effect Tetlock mentions. Now, SydMartin, I know both you and I know what a shadow a lack of diversity casts on effective altruism. I emphasized the last paragraph because just last week you commented on the propensity of effective altruism to be presumptuous and elitist about its own abilities as well. I believe a failure by this community to accurately predict future outcomes would be due more to a lack of intellectual diversity, i.e., everyone hailing from mostly the same university majors (e.g., philosophy, economics, computer science), than to sociopolitical homogeneity within effective altruism. Still, that’s just my pet hypothesis that has yet to pan out in any way.
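To make the “wisdom of the crowds” point a bit more concrete, here’s a rough sketch of one way (not necessarily Tetlock’s method) a crowd’s estimates on the same question could be pooled. All the forecasts below are invented.

```python
# A rough sketch (not Tetlock's actual method) of pooling a crowd's
# probability estimates, which is where intellectual diversity pays off.
# Averaging in log-odds space rather than raw probabilities is one common
# choice; it keeps a confident minority from being washed out entirely.
import math

def pool_log_odds(probs):
    """Average forecasts in log-odds space, then convert back to a probability."""
    clipped = [min(max(p, 0.01), 0.99) for p in probs]  # avoid infinities at 0 or 1
    mean_lo = sum(math.log(p / (1 - p)) for p in clipped) / len(clipped)
    return 1 / (1 + math.exp(-mean_lo))

# Hypothetical forecasts on one question from people with different backgrounds
crowd = [0.6, 0.75, 0.55, 0.9, 0.65]
print(round(sum(crowd) / len(crowd), 3))   # simple average: 0.69
print(round(pool_log_odds(crowd), 3))      # log-odds average: ~0.71
```

The exact pooling rule matters less than the broader point: the pooled number can beat most of the individuals feeding into it, provided they aren’t all reasoning from the same background.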
I also wonder how well near-future prediction ability translates to far-future predictions. In order to test how well you are able to predict things, you predict near-future events or changes. You increase your accuracy at those and assume it translates to the far future. Lots of people then make decisions around your far-future predictions, based on your track record of being an accurate predictor. Perhaps, however, your model of forecasting is actually wildly inaccurate when it comes to long-term predictions. I’m not sure how we could account for this. Thoughts?
I’d be concerned about how much a successful track record of near-term predictions would tell us about potential success with long-term predictions. First of all, for existential risks, I suspect only near-term predictions relating to the field of a single existential risk, such as A.I. risk, should be counted toward expectations for a long-term track record in that field[1]. Even if it’s more complicated than that, I think there is something near-term prediction track records can tell us. If someone’s near-future prediction track record is awful, that at least tells us the team or person in question isn’t great at predictions at all. So we would not want to rely on their predictions further afield either.
It’s like science. We can’t inductively conclude that their correct predictions of the past will continue over some arbitrary timescale, but we can rule out bad predictors from being reliable by process of elimination.
I think prediction markets might apply to all focus areas of effective altruism, though not always to the same extent. Running intervention experiments is difficult or expensive. For example, while GiveDirectly, IPA, and the Poverty Action Lab build on so many millions of dollars of development aid each year already, effective altruism itself has been responsible for injecting the same empiricism into animal activism. Intervention experiments in animal activism have been expensive for organizations like Mercy For Animals, so these experiments aren’t carried out, or refined to find better methods, very often. There’s also difficulty in getting the cooperation of animal activists on randomized controlled trials, as their community isn’t as receptive yet. Further, due both to the low number of volunteers from effective altruism, like Peter Hurford, and to our lack of experience, it’s difficult to get experimental designs right the first time, and in as short a timeframe as, e.g., Animal Charity Evaluators would hope.
However, after a first successful experiment, for whatever value of “success” effective altruism or others assign, other organizations could design experiments using the same paradigm and preregister their plans. Then, an EA prediction registry or market could look at the details of the experiment, or demand more details, and predict the chance it would confirm the hypothesis/goal/whatever. Forecasters could judge the new design on how it deviates from the original template, how closely they expect it to replicate, or how biased they think it will be. If the most reliable forecasters weren’t confident in the experiment, that would inform the rest of us on whether it’s worth funding when organizations ask for money. This way, we can select animal advocacy RCTs or other studies more efficiently when we’re limited in how many we can carry out by a scarcity of resources.
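As a toy illustration of that funding signal, here’s a hypothetical sketch in which forecasters’ estimates are weighted by their past Brier scores and compared against a threshold the funders pick. Every name, number, and the threshold itself are invented; a real registry would want something more careful.

```python
# A hypothetical sketch of the funding signal described above: weight each
# forecaster's estimate that a preregistered study will confirm its hypothesis
# by their past accuracy (here, inverse Brier score), then compare the pooled
# estimate to a threshold the funders choose. All numbers are invented.

def pooled_confidence(forecasts):
    """forecasts: list of (probability, past_brier_score) pairs."""
    weights = [1 / max(brier, 0.01) for _, brier in forecasts]  # better record -> more weight
    total = sum(weights)
    return sum(p * w for (p, _), w in zip(forecasts, weights)) / total

forecasts = [(0.8, 0.10),   # reliable forecaster, fairly confident
             (0.4, 0.30),   # weaker track record, skeptical
             (0.7, 0.15)]   # decent record, moderately confident

FUND_THRESHOLD = 0.6        # arbitrary cutoff for illustration
estimate = pooled_confidence(forecasts)
print(round(estimate, 2), "-> fund" if estimate >= FUND_THRESHOLD else "-> hold off")  # 0.7 -> fund
```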
Of course, this isn’t just for experiments, or animal activism. The great thing about a prediction market anyone can enter is that nobody needs to centrally allocate information to all predictors. They could have expertise, hunches, or whatever else nobody knows about, and as long as they’re confident in their own analysis or information, they’ll bet on it. I was discussing certificate of impact purchases on Facebook yesterday, and Lauren Lee said she might prefer prediction markets that predict the value and success of a project before it’s started, rather than a retrospective evaluation based on impact certificates. I don’t see a reason there shouldn’t be both, though.
Presuming effective altruism becomes bigger and more ambitious in the future, the community will try more policy interventions, research projects, and small- and large-scale interventions we won’t have tested yet. Of course, some experiments won’t need to rely on prediction markets, but there is little reason forecasters couldn’t bet on their success as well to hone their prediction skills.
[1] Yes, this counts as predicting how successful predictions would be. Go meta!