I like this post. Some ideas inspired by it:
If “bias” is pervasive among EA organisations, the most direct implication of this seems to me to be that we shouldn’t take judgements published by EA organisations at face value. That is, if we want to know what is true we should apply some kind of adjustment to their published judgements.
It might also be possible to reduce bias in EA organisations, but that depends on other propositions like how effective debiasing strategies actually are.
A question that arises is “what sort of adjustment should be applied?”. The strategy I can imagine, which seems hard to execute, is: try to anticipate the motivations of EA organisations, particularly those that aren’t “inform everyone accurately about X”, and discount those aspects of their judgements that support these aims.
I imagine that doing this overtly would cause a lot of offence A) because it involves deliberately standing in the way of some of the things that people at EA organisations want and B) because I have seen many people react quite negatively to accusations of the form “you’re just saying W because you want V”.
Considering this issue (how much should we trust EA organisations) and this strategy of trying to make “goals-informed” assessments of their statements, it occurs to me that a question you could ask is “how well has this organisation oriented itself towards truthfulness?”.
I like that this post has set out the sketch of a theory of organisation truthfulness. In particular “In worlds where motivated reasoning is commonplace, we’d expect to see:
Red-teaming will discover errors that systematically slant towards an organization’s desired conclusion.
Deeper, more careful reanalysis of cost-effectiveness or impact analyses usually points towards lower rather than higher impact.”
Presumably, in worlds where motivated reasoning is rare, red-teaming will discover errors that slant towards and away from an organisation’s desired conclusion and deeper, more careful reanalysis of cost-effectiveness points towards lower and higher impact equally often.
I note that you are talking about a collection of organisations while I’m talking about a specific organisation. I think you are thinking about it from “how can we evaluate truth-alignment” and I’m thinking about “what do we want to know about truth-alignment”. Maybe it is only possible to evaluate collections of organisations for truth-alignment. At the same time I think it would clearly be useful to know about the truth-alignment of individual organisations, if we could.
It would be interesting, and I think difficult, to expand this theory in three ways:
To be more specific about what “an organisation’s desired conclusion” is, so we can unambiguously say whether something “slants towards” it
Consider whether there are other indications of truth-misalignment
Consider whether it is possible to offer a quantitative account of (A) the relationship between the degree of truth-misalignment of an organisation and the extent to which we see certain indications like consistent updating in the face of re-analysis and (B) the relationship between an organisation’s truth-misalignment and the manner and magnitude by which we should discount their judgements
To be clear, I’m not saying these things are priorities, just ideas I had and haven’t carefully evaluated.
Thanks for your extensions! Worth pondering more.
I like that this post has set out the sketch of a theory of organisation truthfulness. In particular “In worlds where motivated reasoning is commonplace, we’d expect to see:
Red-teaming will discover errors that systematically slant towards an organization’s desired conclusion.
Deeper, more careful reanalysis of cost-effectiveness or impact analyses usually points towards lower rather than higher impact.”
Presumably, in worlds where motivated reasoning is rare, red-teaming will discover errors that slant towards and away from an organisation’s desired conclusion and deeper, more careful reanalysis of cost-effectiveness points towards lower and higher impact equally often.
I think this is first-order correct (and what my post was trying to get at). Second-order, I think there’s at least one important caveat (which I cut from my post) with just tallying the total number (or importance-weighted number) of errors towards versus away from the desired conclusion as a proxy for motivated reasoning. Namely, you can’t easily differentiate “motivated reasoning” biases from the perfectly innocent traditional optimizer’s curse. Suppose an organization is considering 20 possible interventions and does an initial cost-effectiveness analysis for each of them. If they have a perfectly healthy and unbiased epistemic process, then the top 2 interventions that they select from that list would a) in expectation be better than the other 18 and b) in expectation have more errors slanted towards higher impact than towards lower impact. If they then implement the top 2 interventions and do an impact assessment 1 year later, then I think it’s likely the original errors (not necessarily biases) from the initial assessment will carry through. External red-teamers will then discover that these errors are systematically biased upwards, but at least on first blush “naive optimizer’s curse issues” look importantly different in form, mitigation measures, etc., from motivated reasoning concerns.
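To make the selection effect above concrete, here is a rough simulation sketch. Everything in it is an assumption chosen for illustration rather than something from the post: true impacts and estimation errors are both drawn from standard normal distributions, the errors are unbiased and independent of the true values, and the function and parameter names are hypothetical.

```python
import random
import statistics


def simulate_selection(n_interventions=20, n_selected=2, noise_sd=1.0,
                       n_trials=5000, seed=0):
    """Simulate an unbiased organisation that picks its top-estimated interventions.

    Assumption (illustrative only): true impact ~ Normal(0, 1) and
    estimation error ~ Normal(0, noise_sd), with no motivated reasoning.
    """
    rng = random.Random(seed)
    all_errors = []
    selected_errors = []
    selected_true = []
    unselected_true = []
    for _ in range(n_trials):
        true_values = [rng.gauss(0.0, 1.0) for _ in range(n_interventions)]
        errors = [rng.gauss(0.0, noise_sd) for _ in range(n_interventions)]
        estimates = [t + e for t, e in zip(true_values, errors)]
        # Rank interventions by estimated (not true) impact, best first.
        ranked = sorted(range(n_interventions),
                        key=lambda i: estimates[i], reverse=True)
        top, rest = ranked[:n_selected], ranked[n_selected:]
        all_errors.extend(errors)
        selected_errors.extend(errors[i] for i in top)
        selected_true.extend(true_values[i] for i in top)
        unselected_true.extend(true_values[i] for i in rest)
    return {
        "mean_error_all": statistics.mean(all_errors),
        "mean_error_selected": statistics.mean(selected_errors),
        "mean_true_selected": statistics.mean(selected_true),
        "mean_true_unselected": statistics.mean(unselected_true),
    }


result = simulate_selection()
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the errors across all 20 estimates average out to roughly zero, while the errors on the selected 2 are clearly positive, even though nobody in the simulation wants any particular answer. The selected interventions are also better on average than the rest, which matches both (a) and (b) above.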
I think it’s likely that either formal Bayesian modeling or more qualitative assessments can allow us to differentiate the two hypotheses.
Here’s one possible way to distinguish the two: Under the optimizer’s curse + judgement stickiness scenario, retrospective evaluation should usually take a step towards the truth, though it could be a very small one if judgements are very sticky! Under motivated reasoning, retrospective evaluation should take a step towards the “desired truth” (or some combination of truth and desired truth, if the organisation wants both).
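A toy version of that distinction, as a sketch only. It assumes a simple linear update rule in which a re-evaluation moves a fraction (1 - stickiness) of the way from the old judgement towards some target; the function names, the `stickiness` and `weight_on_desired` parameters, and the numbers are all hypothetical.

```python
def sticky_update(initial, truth, stickiness):
    """Unbiased but sticky re-evaluation: the target is the truth."""
    return initial + (1.0 - stickiness) * (truth - initial)


def motivated_update(initial, truth, desired, stickiness, weight_on_desired):
    """Motivated re-evaluation: the target blends the desired conclusion and the truth."""
    target = weight_on_desired * desired + (1.0 - weight_on_desired) * truth
    return initial + (1.0 - stickiness) * (target - initial)


# Hypothetical numbers: the initial judgement overshoots the truth,
# and the desired conclusion is higher still.
truth, desired, initial = 1.0, 3.0, 2.0

sticky = sticky_update(initial, truth, stickiness=0.8)
motivated = motivated_update(initial, truth, desired,
                             stickiness=0.8, weight_on_desired=1.0)
mixed = motivated_update(initial, truth, desired,
                         stickiness=0.8, weight_on_desired=0.5)

for label, value in [("sticky", sticky), ("motivated", motivated), ("mixed", mixed)]:
    print(f"{label}: judgement={value:.2f}, distance from truth={abs(value - truth):.2f}")
```

With these numbers the sticky update ends up closer to the truth than the initial judgement, the fully motivated update ends up further away, and the mixed case does not move at all. That last case suggests the test is only informative if we have some independent handle on what the desired conclusion is and how much weight it gets.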