(I’m putting this as a comment and not an answer to reflect that I have a few tentative thoughts here but they’re not well developed)

A really useful source that explains a Bayesian way of avoiding Pascal’s mugging is this GiveWell post. TL;DR: for situations we know very little about, much of the variation in EV estimates comes from “estimate error”, so we should have very low credence in those estimates. Even if the most likely EV estimate for an action seems very positive, if the variance is extremely high because there is very little evidence behind the estimate, we wouldn’t be very surprised if the actual value turned out to be zero or even negative. The post argues that we should also incorporate a prior over the distribution of impacts we can expect from actions. This basically makes us more skeptical the more outlandish the claim is. As a result, we’re actually less persuaded by an action motivated by an extremely high but unfounded EV estimate than by an action that is equally unfounded but has a less extreme EV estimate, and so falls closer to our prior about what is generally plausible. This seems to avoid Pascal’s mugging. (This was my read of the post; it’s completely possible that I misunderstood something and/or that persuasive critiques of this reasoning exist that I haven’t encountered so far.)
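To make the "more skeptical the more outlandish" point concrete, here is a minimal sketch of the kind of Bayesian adjustment I have in mind. It is my own toy version and not necessarily the exact model in the post. It assumes a normal prior over impact and a normally distributed estimate error, and it reads "equally unfounded" as "the error's standard deviation is as large as the estimate itself". The function name and all the numbers are made up for illustration.

```python
def posterior_mean(prior_mean, prior_sd, estimate, estimate_sd):
    """Posterior mean for a normal prior combined with a noisy normal estimate.

    Each source is weighted by its precision (1 / variance), so a very
    noisy estimate barely moves us away from the prior.
    """
    prior_precision = 1.0 / prior_sd ** 2
    estimate_precision = 1.0 / estimate_sd ** 2
    weighted_sum = prior_precision * prior_mean + estimate_precision * estimate
    return weighted_sum / (prior_precision + estimate_precision)


# Hypothetical prior: most actions have impact near 0, on a scale where 1 is "typical".
PRIOR_MEAN, PRIOR_SD = 0.0, 1.0

# Two equally unfounded claims: the error is as large as the estimate itself (assumption).
modest = posterior_mean(PRIOR_MEAN, PRIOR_SD, estimate=10.0, estimate_sd=10.0)
extreme = posterior_mean(PRIOR_MEAN, PRIOR_SD, estimate=1000.0, estimate_sd=1000.0)

# Same headline estimate as `modest`, but backed by much better evidence.
well_founded = posterior_mean(PRIOR_MEAN, PRIOR_SD, estimate=10.0, estimate_sd=1.0)

print(f"modest, unfounded:  {modest:.4f}")   # about 0.099
print(f"extreme, unfounded: {extreme:.4f}")  # about 0.001
print(f"modest, evidenced:  {well_founded:.4f}")  # 5.0
```

Under these assumptions the claim of 1000 ends up with a lower adjusted value than the claim of 10, which is the anti-mugging behaviour described above, while the well-evidenced claim keeps much more of its headline value.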

I think another point here is whether the very promising but hard-to-verify claims you’re talking about are made with a representative spectrum of an action’s possible outcomes in mind. As a bit of a toy example (I’m not criticizing any actual viewpoint here, this is just hopefully rhetorically illustrative): if you think that improving institutional decision making is really positive, your basic reasoning might look like this. Taking some action to teach decision makers about rationality has a small probability x of reaching a person, who then has a small probability y of being in a position to decide something that has a hugely positive impact z if decided with consideration of rational principles. Therefore the EV of me taking this action is x * y * z = a really big positive number.

This only considers the most positive direction in which the action could unfold, since it assumes that the much bigger remaining probability, 1 - x * y, contains only basically neutral outcomes. It’s at least plausible, however, that some of those outcomes are actually quite bad (say you teach a decision maker an incorrect principle, or you present the idea badly and, through idea inoculation, dissuade someone from becoming more rational, and this leads to some significant negative outcome). The likelihood of doing something bad is probably not that high, but say there’s a chance k that your action leads to a very bad outcome of magnitude m. Then the actual EV is (x * y * z) - (k * m), which might be much lower than if we only considered the positive outcomes. This might suggest that the EV estimates for the types of x-risk mitigation actions you’re expressing some skepticism about could be forgetting to account for the possibility of a negative impact, which could meaningfully lower their EV. That said, people may already be factoring such considerations in and just not making that explicit.
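For what it’s worth, here is the toy calculation above written out as a small sketch. The variable names follow the letters used above; every number is invented purely for illustration and is not an estimate of anything real. It also assumes the good and bad outcomes are mutually exclusive and that everything else is exactly neutral.

```python
def naive_ev(x, y, z):
    """EV that counts only the best-case path: reach someone (x),
    they end up in a decisive position (y), and the impact is z."""
    return x * y * z


def ev_with_downside(x, y, z, k, m):
    """Same as naive_ev, minus a chance k of a bad outcome of magnitude m."""
    return x * y * z - k * m


# Made-up inputs (assumptions, not real estimates).
x, y, z = 0.01, 0.01, 1_000_000  # small probabilities, huge upside
k, m = 0.001, 50_000             # smaller chance of a sizeable harm

upside_only = naive_ev(x, y, z)
adjusted = ev_with_downside(x, y, z, k, m)

print(f"upside-only EV: {upside_only:.1f}")
print(f"EV with downside included: {adjusted:.1f}")
```

With these particular made-up numbers the downside term cancels about half of the upside-only estimate; with a larger k or m the adjusted EV could even go negative.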
