Good post. I have two general themes I’d like to comment on:
Analogies for cause prioritization
Your analysis covers several perspectives on this phenomenon. If we focus on the “actual performance” perspective, this is pretty similar to a multi-armed bandit problem. One pattern that I think is present in strategies for these types of problems is spreading actions out across the different possibilities (explore vs. exploit and all that). It wouldn’t necessarily make sense to commit to one “arm” (or cause) early on, when information is low. This “spreading out” across options is one way of dealing with uncertainty.
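To make the bandit framing concrete, here is a minimal sketch of one standard strategy, epsilon-greedy. It is only an illustration: the function name, the reward model (Gaussian noise with standard deviation 1), and all parameter values are assumptions of mine, not anything from the post.

```python
import random

def epsilon_greedy(true_means, rounds=2000, epsilon=0.1, seed=0):
    """Play a Gaussian bandit with an epsilon-greedy rule.

    true_means are hypothetical payoffs, unknown to the player.
    Returns (pulls per arm, estimated mean per arm).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    pulls = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)
        pulls[arm] += 1
        # Incremental update of the running average for this arm.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
    return pulls, estimates

pulls, estimates = epsilon_greedy([0.2, 0.5, 1.0])
print(pulls, [round(e, 2) for e in estimates])
```

The point of the sketch is just that some share of pulls keeps going to arms that currently look worse, because the estimates behind “looks worse” are themselves uncertain.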
A similar idea comes up in another potential analogy for cause prioritization: financial investing. We can think about optimizing a portfolio and its allocation to achieve good returns relative to risk, rather than trying to pick the single highest-return asset. Thus we get concepts like diversification.
I find this stock-picking analogy helpful for thinking about how “neglectedness” is often treated in practice. I’ve often found myself skeptical of arguments for and from neglectedness, and I feel the way it is applied in practice doesn’t really align with the classic “diminishing returns” conception. I think the way neglectedness is treated in practice ends up being more like how an investor with a high risk tolerance might view a risky asset. Riskier assets are expected to have higher returns, and investors with lower risk tolerance would saturate low-risk/high-return options quickly, leaving riskier investments “neglected”. Thus an investor with high risk tolerance can find good opportunities that would be unappealing to other, less risk-tolerant investors by going to higher-risk assets. I think this captures the spirit of what “neglected” cause areas have often looked like in EA: more speculative, but where some EAs have a strong feeling that they could have outsized impact.
If I can read between the lines a bit, under this analogy EA pivoting more into AI is kind of like an investor who wants higher returns putting more of their portfolio in small-cap growth stocks that are riskier but which the investor thinks will result in higher returns. One downside of this is decreased diversification. Another possible option would be to hold a more diversified portfolio but use leverage.
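As a rough sketch of the leverage option: the arithmetic below compares one concentrated risky asset with an equal-weight portfolio levered up to the same volatility. Every number is a made-up assumption for illustration, and the sketch ignores borrowing costs by assuming you can borrow at the risk-free rate.

```python
import math

def equal_weight_portfolio(mu, sigma, n, rho=0.0):
    """Excess return and volatility of n equally weighted assets that share
    excess return mu, volatility sigma, and pairwise correlation rho."""
    variance = sigma ** 2 * (1.0 / n + (1.0 - 1.0 / n) * rho)
    return mu, math.sqrt(variance)

def lever(mu, sigma, leverage):
    """Leverage scales excess return and volatility by the same factor
    (assumes borrowing at the risk-free rate, no other frictions)."""
    return leverage * mu, leverage * sigma

# Hypothetical concentrated bet: one high-risk, high-return asset.
conc_mu, conc_sigma = 0.08, 0.30

# Hypothetical diversified portfolio of 10 moderately correlated assets.
div_mu, div_sigma = equal_weight_portfolio(mu=0.05, sigma=0.20, n=10, rho=0.2)

# Lever the diversified portfolio until its volatility matches the concentrated bet.
leverage = conc_sigma / div_sigma
lev_mu, lev_sigma = lever(div_mu, div_sigma, leverage)

print(f"concentrated: return={conc_mu:.3f}, vol={conc_sigma:.3f}")
print(f"levered diversified: return={lev_mu:.3f}, vol={lev_sigma:.3f}, leverage={leverage:.2f}x")
```

Whether the levered portfolio actually comes out ahead depends entirely on the assumed returns and correlations, which is exactly where the disagreement would be in the cause prioritization case.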
In-model vs Out-of-model robustness
The problem is not limited to cases with trials and noisy statistics, because the error does not have to arise from random chance. Problems with assumptions, bad guesses, even math errors will equally get you cursed. If anything, I would expect causes that lack empirical experimental data to be more cursed, not less.
I think this gets at a distinction that is worth calling out: in-model vs out-of-model robustness.
In my experience with cost-benefit analysis, both reading EA-related ones and in industry, it is fairly common to propose a “median” scenario and also a “pessimistic” scenario, and provide estimates for these cases. The point is usually that since even the pessimistic scenario looks good, the analysis shows that the proposed intervention is robustly beneficial. This has a two-fold problem:
First, usually the only reason to think that the “pessimistic” scenario is pessimistic is that it uses parameter values that reduce the estimated benefit below the “median” scenario. It’s sometimes unclear why that means the estimate is robustly lower than the actual benefit. This is the in-model robustness.
Although I think this is an issue, it may sometimes be perceived as (or actually be) a somewhat unfair critique. All models are wrong, and we have to use what we have to make estimates. This can result in polarized views of what an estimate shows. For a person who likes the intervention and has a gut feeling it is good, the “median” estimate makes a ton of sense and this seems like a very reasonable approach. For a skeptic, it seems prone to over-estimation for the reasons you highlight in the post. Moving the parameters so that your estimate is 25% lower doesn’t turn garbage into non-garbage.
However, there is another source of error lurking in the background. What about costs that you haven’t included? The potential for the intervention to backfire that isn’t considered in any scenario? The hidden assumption that hasn’t been tested in the “pessimistic” scenario? This is out-of-model robustness.
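A toy sketch of the two kinds of robustness, with entirely hypothetical numbers and a deliberately simplistic model of my own (`net_benefit` and its parameters are illustrative, not taken from any real analysis):

```python
def net_benefit(reach, effect_per_person, value_per_unit, cost, omitted_harm=0.0):
    """Toy cost-benefit model. omitted_harm stands in for anything the
    analyst left out of the model (backfire risk, hidden costs, and so on)."""
    return reach * effect_per_person * value_per_unit - cost - omitted_harm

# "Median" scenario: the analyst's best-guess parameters (all hypothetical).
median = dict(reach=10_000, effect_per_person=0.05, value_per_unit=100.0, cost=20_000.0)

# "Pessimistic" scenario: same model, effect size cut by 25%.
pessimistic = dict(median, effect_per_person=median["effect_per_person"] * 0.75)

print("median:", net_benefit(**median))
print("pessimistic:", net_benefit(**pessimistic))

# Out-of-model check: a harm that neither scenario included.
print("pessimistic + omitted harm:", net_benefit(**pessimistic, omitted_harm=25_000.0))
```

With these numbers, both in-model scenarios come out positive, while the omitted term alone is enough to flip the sign. Shrinking a parameter inside the model says nothing about terms the model never had.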
I think the polarization around in-model robustness causes proponents or fans of an idea or intervention to over-estimate its overall robustness. Even when in-model robustness really is high, they implicitly treat the (perceived) in-model robustness as if it also established out-of-model robustness.
In my view, the whole “rule high stakes in, not out” idea will in practice result in systematically doing this a lot, which I think makes it a bad heuristic for approaching these types of situations. One way to think about this is that it encourages us to focus on specific high-volatility “assets” and thus lacks diversification.
(crossposting my comment from LessWrong)
I’m somewhat confused by what people mean by “strategic” in these discussions. It seems to me like there are aspects of communication that I would call “strategic” but are uncontroversial. As an example, I would suggest the “grown not crafted” idea from IABIED. I think this is an extremely succinct and effective way to communicate the underlying technical idea, but of course expressing the idea in more technical language would also be truthful (arguably even more so). The difference isn’t that the “grown not crafted” way of communicating the idea is more truthful, but (I would infer) that it is predicted to be more helpful in allowing others to understand the idea, even if they aren’t already familiar with the relevant technical knowledge. In other words, it is a more strategic way of communicating the idea.
I don’t think that example is particularly controversial, but I think these discussions are meant to also apply to statements that are highly contentious and subject to adversarial dynamics. When we consider statements that are more in those domains, I think we enter a kind of “messy middle” where the truthfulness of a statement is much more contested and it isn’t necessarily as easy to separate the core content from the communication strategy.
Let’s take the “DEFUND THE POLICE” example from the post. I definitely believe that some people engage in the “window-stretching” as you describe. But I could also imagine proponents of the slogan saying something like this:
“I genuinely believe that current policing practices are so unjust that the police deserve to be defunded. It would be better for policing to be reformed to correct these injustices while still carrying out legitimate policing functions. But saying I want to “defund the police” is genuinely true under the current system. If I only advocated for more centrist reforms, that would actually be misleading about what I believe and trying to fit my beliefs into the Overton window, not the other way around. It’s extremely important to use slogans like “DEFUND THE POLICE” when you actually believe in them because the powerful simply ignore centrist calls for reform. They need to know how strongly we actually feel about this. Calling for measures such as defunding is the only way to have our voices heard”.
I think such a view lives in this messy middle. People who agree are likely to see it as genuine, while critics will be tempted to claim inappropriate use of strategy. The issue of whether the person who holds this view is being “truthful” essentially collapses back into the underlying object-level issues. Those who agree on the object level will also agree on the meta question of whether what the person is doing is acceptable, and the reverse for someone who disagrees.
As I allude to here, I think the best approach is to simply argue over the object-level question, and leave aside the discussions about whether someone is being strategic, how truth-seeking they are, if they are “gaslighting”, etc.