I agree this is quite different from the standard GJ forecasting problem. And that GJ forecasters* are primarily selected for and experienced with forecasting quite different sorts of questions.
But my claim is not "trust them, they are well-calibrated on this". It's more "if your reason for thinking X will happen is a complex multi-stage argument, and a bunch of smart people with no particular reason to be biased, who are also selected for being careful and rational on at least some complicated emotive stuff, spend hours and hours on your argument and come away with a very different opinion on its strength, you probably shouldn't trust the argument much (though this is less clear if the argument depends on technical scientific or mathematical knowledge they lack**)". That is, I am not saying "supers are well-calibrated, so the risk probably is about 1 in 1000". I agree the case for that is not all that strong. I am saying "if the concerned group's credences are based on a multi-step, non-formal argument whose persuasiveness the supers feel very differently about, that is a bad sign for how well-justified those credences are."
Actually, in some ways, the case for AI X-risk work being a good use of money might look better if the supers were obviously well-calibrated on this. A 1 in 1000 chance of an outcome as bad as extinction is likely worth spending some small portion of world GDP on preventing, and AI safety spending so far is a drop in the bucket compared to world GDP. (Yeah, I know technically the D stands for domestic, so "world GDP" can't be quite the right term, but I forget the right one!) Indeed, "AI risk is at least 1 in 1000" is how Greaves and MacAskill justify the "we can make a big difference to the long-term future in expectation" premise in "The Case for Strong Longtermism". (If a 1 in 1000 estimate is relatively robust, I think it is a big mistake to call this "Pascal's Mugging".)
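To make the expected-value arithmetic behind that paragraph concrete, here is a minimal back-of-the-envelope sketch. The 1-in-1000 probability is the figure under discussion; the roughly $100 trillion of annual world output and the deliberately lowball choice to value extinction at only one year of that output are my own illustrative assumptions, not numbers from the study or from Greaves and MacAskill.

```python
# Back-of-the-envelope expected-value sketch (illustrative assumptions only).

p_extinction = 1e-3              # the 1-in-1000 risk estimate discussed above
gross_world_product = 100e12     # assumed: roughly $100 trillion of annual world output
value_of_avoiding_extinction = 1 * gross_world_product  # assumed: value extinction at just
                                                        # one year of output, deliberately
                                                        # lowball

expected_loss = p_extinction * value_of_avoiding_extinction
print(f"Expected loss: ${expected_loss / 1e9:,.0f} billion")  # -> Expected loss: $100 billion
```

Even on that conservative valuation, the expected loss comes out around $100 billion a year of world output, which is why spending that is a "drop in the bucket" relative to world GDP does not look like a Pascal's Mugging, provided the 1 in 1000 figure is robust.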
*(of whom I'm one, as it happens, though I didn't work on this; I did work on the original X-risk forecasting tournament.)
**I am open to argument that this actually is the case here.
Why do you think superforecasters who were selected specifically for assigning a low probability to AI x-risk are well described as "a bunch of smart people with no particular reason to be biased"?
For the avoidance of doubt, I'm not upset that the supers were selected in this way: it's the whole point of the study, was made very clear in the write-up, and was clear to me as a participant. It's just that "your arguments failed to convince randomly selected superforecasters" and "your arguments failed to convince a group of superforecasters who were specifically selected for confidently disagreeing with you" are very different pieces of evidence.
One small clarification: the skeptical group was not all superforecasters. There were two domain experts as well. I was one of them.
I'm sympathetic to David's point here. Even though the skeptic camp was selected for their skepticism, I think we still get some information from the fact that many hours of research and debate didn't move their opinions. I think there are plausible alternative worlds where the skeptics come in with low probabilities (by construction), but update upward by a few points after deeper engagement reveals holes in their early thinking.
Ok, I slightly overstated the point. This time, the supers selected were not a (mostly) random draw from the set of supers. But they were in the original X-risk tournament, and in that case too, they were not persuaded to change their credences via further interaction with the concerned (that is, the X-risk experts). Then, when we took the more skeptical of them and gave them yet more exposure to AI safety arguments, that still failed to move the skeptics. I think that, taken together, these two results show that AI safety arguments are not all that persuasive to the average super. (More precisely, that no amount of exposure to them will persuade supers as a group to the point where their median estimate of X-risk by the century's end rises significantly above 0.75%.)