If I understand the proposed model correctly (I haven’t read it thoroughly, so apologies if not): the model basically assumes that “longtermist interventions” cannot cause accidental harm. That is, it assumes that if a “longtermist intervention” is carried out, the worst-case scenario is that the intervention will end up being neutral (e.g. due to an “exogenous nullifying event”) and thus resources were wasted.
But this means assuming away the following major part of complex cluelessness: due to an abundance of crucial considerations, it is usually extremely hard to judge whether an intervention related to anthropogenic x-risks or meta-EA is net-positive or net-negative. For example, such an intervention may cause accidental harm by:
Drawing attention to dangerous information (e.g. certain exciting approaches for AGI development / virology experimentation).
If a researcher believes they have come up with an impressive insight, they will probably be biased towards publishing it, even if it may draw attention to potentially dangerous information. Their career capital, future compensation, and status may be on the line.
Alexander Berger (co-CEO of OpenPhil) said in an interview:
I think if you have the opposite perspective and think we live in a really vulnerable world — maybe an offense-biased world where it’s much easier to do great harm than to protect against it — I think that increasing attention to anthropogenic risks could be really dangerous in that world. Because I think not very many people, as we discussed, go around thinking about the vast future.
If one in every 1,000 people who go around thinking about the vast future decide, “Wow, I would really hate for there to be a vast future; I would like to end it,” and if it’s just 1,000 times easier to end it than to stop it from being ended, that could be a really, really dangerous recipe where again, everybody’s well intentioned, we’re raising attention to these risks that we should reduce, but the increasing salience of it could have been net negative.
“Patching” a problem and preventing a non-catastrophic, highly visible outcome that would have caused an astronomically beneficial “immune response”.
Nick Bostrom said in a talk (“lightly edited for readability”):
Small and medium scale catastrophe prevention? Also looks good. So global catastrophic risks falling short of existential risk. Again, very difficult to know the sign of that. Here we are bracketing leverage at all, even just knowing whether we would want more or less, if we could get it for free, it’s non-obvious. On the one hand, small-scale catastrophes might create an immune response that makes us better, puts in place better safeguards, and stuff like that, that could protect us from the big stuff. If we’re thinking about medium-scale catastrophes that could cause civilizational collapse, large by ordinary standards but only medium-scale in comparison to existential catastrophes, which are large in this context, again, it is not totally obvious what the sign of that is: there’s a lot more work to be done to try to figure that out. If recovery looks very likely, you might then have guesses as to whether the recovered civilization would be more likely to avoid existential catastrophe having gone through this experience or not.
Causing decision makers to have a false sense of security.
For example, perhaps it is not feasible to solve AI alignment in a competitive way without strong coordination, etc. But researchers are biased towards saying good things about their field, their colleagues, and their (potential) employers.
Causing progress in AI capabilities to accelerate in a certain way.
Causing the competition dynamics among AI labs / states to intensify.
Decreasing the EV of the EA community by exacerbating bad incentives and conflicts of interest, and by reducing coordination.
For example, by creating impact markets.
Causing accidental harm via outreach campaigns or regulation advocacy (e.g. by causing people to get a bad first impression of something important).
Causing a catastrophic leak from a virology lab, or an analogous catastrophe involving an AI lab.
All good points, but Tarsney’s argument doesn’t depend on the assumption that longtermist interventions cannot accidentally increase x-risk. It just depends on the assumption that there’s some way we could spend $1 million that would increase the epistemic probability that humanity survives the next thousand years by at least 2×10^-14.
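For a sense of where a threshold of that order of magnitude could come from, here is a back-of-the-envelope sketch (my own illustrative reconstruction, not Tarsney’s actual derivation; the ~200 lives per $1 million near-term benchmark and the 10^16 expected future lives are assumed numbers, chosen only to show the shape of the comparison):

\[
% Illustrative assumptions, not figures taken from Tarsney's paper:
% V_benchmark ~ 2 x 10^2 lives saved by $1M spent on a near-term benchmark,
% V_future    ~ 10^16 expected future lives at stake.
\Delta p \;\gtrsim\; \frac{V_{\text{benchmark}}}{V_{\text{future}}}
\;\approx\; \frac{2 \times 10^{2}}{10^{16}}
\;=\; 2 \times 10^{-14},
\]

where \(\Delta p\) is the increase in the probability of humanity surviving (and realizing) that future needed for the $1 million longtermist spend to match the near-term benchmark in expectation.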