Minimal-trust investigations

Holden Karnofsky, 23 Nov 2021 18:02 UTC (163 points)

Red teaming Independent impression Epistemic deference

This piece is about the single activity (“minimal-trust investigations”) that seems to have been most formative for the way I think.

Most of what I believe is based mainly on trusting other people.

For example:

I brush my teeth twice a day, even though I’ve never read a study on the effects of brushing one’s teeth, never tried to see what happens when I don’t brush my teeth, and have no idea what’s in toothpaste. It seems like most reasonable-seeming people think it’s worth brushing your teeth, and that’s about the only reason I do it.
I believe climate change is real and important, and that official forecasts of it are probably reasonably close to the best one can do. I have read a bunch of arguments and counterarguments about this, but ultimately I couldn’t tell you much about how the climatologists’ models actually work, or specifically what is wrong with the various skeptical points people raise.¹ Most of my belief in climate change comes from noticing who is on each side of the argument and how they argue, not what they say. So it comes mostly from deciding whom to trust.

I think it’s completely reasonable to form the vast majority of one’s beliefs based on trust like this. I don’t really think there’s any alternative.

But I also think it’s a good idea to occasionally do a minimal-trust investigation: to suspend my trust in others and dig as deeply into a question as I can. This is not the same as taking a class, or even reading and thinking about both sides of a debate; it is always enormously more work than that. I think the vast majority of people (even within communities that have rationality and critical inquiry as central parts of their identity) have never done one.

Minimal-trust investigation is probably the single activity that’s been most formative for the way I think. I think its value is twofold:

It helps me develop intuitions for what/whom/when/why to trust, in order to approximate the views I would hold if I could understand things myself.
It is a demonstration and reminder of just how much work minimal-trust investigations take, and just how much I have to rely on trust to get by in the world. Without this kind of reminder, it’s easy to casually feel as though I “understand” things based on a few memes or talking points. But the occasional minimal-trust investigation reminds me that memes and talking points are never enough to understand an issue, so my views are necessarily based either on a huge amount of work or on trusting someone.

In this piece, I will:

Give an example of a minimal-trust investigation I’ve done, and list some other types of minimal-trust investigations one could do.
Discuss a bit how I try to get by in a world where nearly all my beliefs ultimately need to come down to trusting someone.

Example minimal-trust investigations

The basic idea of a minimal-trust investigation is suspending one’s trust in others’ judgments and trying to understand the case for and against some claim oneself, ideally to the point where one can (within the narrow slice one has investigated) keep up with experts.² It’s hard to describe it further except by example, so next I will give a detailed one.

Detailed example from GiveWell

I’ll start with the case that long-lasting insecticide-treated nets (LLINs) are a cheap and effective way of preventing malaria. I helped investigate this case in the early years of GiveWell. My discussion will be pretty detailed (but hopefully skimmable), in order to give a tangible sense of the process and twists/turns of a minimal-trust investigation.

Here’s how I’d summarize the broad outline of the case that most moderately-familiar-with-this-topic people would give:³

People sleep under LLINs, which are mosquito nets treated with insecticide (see picture above, taken from here).
The netting can block mosquitoes from biting people while they sleep. The insecticide also deters and kills mosquitoes.
A number of studies show that LLINs reduce malaria cases and death. These studies are rigorous—LLINs were randomly distributed to some people and not others, allowing a clean “experiment.” (The key studies are summarized in a Cochrane review, the gold standard of evidence reviews, concluding that there is a “saving of 5.6 lives each year for every 1000 children protected.”)
LLINs cost a few dollars, so a charity doing LLIN distribution is probably saving lives very cost-effectively.
Perhaps the biggest concern is that people might not be using the LLINs properly, or aren’t using them at all (e.g., perhaps they’re using them for fishing).
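To make the arithmetic behind this simple case explicit, here is a rough Python sketch. The only number taken from the text is the Cochrane figure of 5.6 lives saved per 1000 children protected per year. The cost per child-year and the `relative_usage` adjustment are hypothetical placeholders (not GiveWell's figures), and the assumption that the benefit scales linearly with usage is an illustrative assumption, not a finding.

```python
def child_years_per_life_saved(lives_saved_per_1000_per_year: float = 5.6) -> float:
    """Child-years of net protection needed, on average, to save one life.

    The default of 5.6 is the Cochrane review figure quoted above.
    """
    if lives_saved_per_1000_per_year <= 0:
        raise ValueError("lives_saved_per_1000_per_year must be positive")
    return 1000.0 / lives_saved_per_1000_per_year


def naive_cost_per_life(cost_per_child_year_usd: float,
                        relative_usage: float = 1.0,
                        lives_saved_per_1000_per_year: float = 5.6) -> float:
    """Naive cost per life saved, in the spirit of the simple case above.

    cost_per_child_year_usd: HYPOTHETICAL cost of protecting one child for one year.
    relative_usage: HYPOTHETICAL ratio of real-world net usage to usage in the
        trials. ASSUMPTION: the benefit scales linearly with usage.
    """
    if relative_usage <= 0:
        raise ValueError("relative_usage must be positive")
    child_years = child_years_per_life_saved(lives_saved_per_1000_per_year)
    # Lower usage means more child-years of nets are needed per life saved.
    return cost_per_child_year_usd * child_years / relative_usage


if __name__ == "__main__":
    # Illustrative inputs only; not real cost or usage estimates.
    print(round(child_years_per_life_saved(), 1))                  # 178.6
    print(round(naive_cost_per_life(2.0), 1))                      # 357.1
    print(round(naive_cost_per_life(2.0, relative_usage=0.5), 1))  # 714.3
```

The output matters less than the structure: each concern about how modern distributions differ from the trials acts like a multiplier that the simple case implicitly sets to 1. Only usage is modeled here.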

When I did a minimal-trust investigation, I developed a picture of the situation that is pretty similar to the above, but with some important differences. (Of all the minimal-trust investigations I’ve done, this is among the cases where I learned the least, i.e., where the initial / conventional wisdom picture held up best.)

First, I read the Cochrane review in its entirety and read many of the studies it referenced as well. Some were quite old and hard to track down. I learned that:

The original studies involved very intense measures to make sure people were using their nets properly. In some cases these included daily or weekly visits to check usage. Modern-day LLIN distributions don’t do anything like this. This made me realize that we can’t assume a charity’s LLIN distributions are resulting in proper usage of nets; we need to investigate modern-day LLIN usage separately.
The most recent randomized study was completed in 2001, and there won’t necessarily ever be another one.⁴ In fact, none of the studies were done on LLINs—they were done on nets treated with non-long-lasting insecticide, which had to be re-treated periodically. This made me realize that anything that’s changed since 2001 could change the results observed in the studies. Changes could include how prevalent malaria is in the first place (if it has fallen for other reasons, LLINs might do less good than the studies would imply), how LLIN technology has changed (such as moving to the “long-lasting” approach), and the possibility that mosquitoes have evolved resistance to the insecticides.

This opened up a lot of further investigation, in an attempt to determine whether modern-day LLIN distributions have similar effects to those observed in the studies.

We searched for general data on modern-day usage, on changes in malaria prevalence, and on insecticide resistance. This data was often scattered (so we had to put a lot of work into consolidating everything we could find into a single analysis), and hard to interpret (we couldn’t tell how data had been collected and how reliable it was—for example, a lot of the statistics on usage of nets relied on simply asking people questions about their bednet usage, and it was hard to know whether people might be saying what they thought the interviewers wanted to hear). We generally worked to get the raw data and the full details of how the data was collected to understand how it might be off.
We tried to learn about the ins and outs of how LLINs are designed and how they compare to the kinds of nets that were in the studies. This included things like reviewing product descriptions from the LLIN manufacturers.
We did live visits to modern-day LLIN distributions, observing the distribution process, the LLINs hanging in homes, etc. This was a very imperfect way of learning, since our presence on site was keenly felt by everyone. But we still made observations such as “It seems this distribution process would allow people to get and hoard extra nets if they wanted” and “A lot of nets from a while ago have a lot of holes in them.”
We asked LLIN distribution charities to provide us with whatever data they had on how their LLINs were being used, and whether they were in fact reducing malaria.
- Against Malaria Foundation was most responsive on this point—it was able to share pictures of LLINs being handed out and hung up, for example.
- But at the time, it didn’t have any data on before-and-after malaria cases (or deaths) in the regions it was working in, or on whether LLINs remained in use in the months or years following distribution. (Later on, it added processes for the latter and did some of the former, although malaria case data is noisy and we ultimately weren’t able to make much of it.)
- We’ve observed (from post-distribution data) that it is common for LLINs to have huge holes in them. We believe that the insecticide is actually doing most of the work (and was in the original studies as well), and that simply killing many mosquitoes (often after they bite the sleeper) could be the most important way that LLINs help. I can’t remember how we came to this conclusion.
We spoke with a number of people about our questions and reservations. Some made claims like “LLINs are extremely proven—it’s not just the experimental studies, it’s that we see drops in malaria in every context where they’re handed out.” We looked for data and studies on that point, put a lot of work into understanding them, and came away unconvinced. Among other things, there was at least one case in which people were using malaria “data” that was actually estimates of malaria cases—based on the assumption that malaria would be lower where more LLINs had been distributed. (This means that they were assuming LLINs reduce malaria, then using that assumption to generate numbers, then using those numbers as evidence that LLINs reduce malaria. GiveWell: “So using this model to show that malaria control had an impact may be circular.”)
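The circularity problem can be illustrated with a toy simulation. Everything in it is hypothetical (the model form, the 100-case baseline, the noise level, the function names); it is not the model GiveWell reviewed. If malaria “data” are generated from net coverage under an assumed effect, regressing those numbers on coverage simply hands back the assumed effect, even in a simulated world where the true effect is zero.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def modeled_cases(coverage, baseline, assumed_reduction):
    """HYPOTHETICAL model: cases are *estimated* from net coverage by assuming
    nets cut cases by `assumed_reduction` at full coverage."""
    return baseline * (1 - assumed_reduction * coverage)


def circularity_demo(true_reduction, assumed_reduction,
                     n_regions=200, baseline=100.0, seed=0):
    """Return (slope on modeled cases, slope on simulated real cases)."""
    rng = random.Random(seed)
    coverage = [rng.random() for _ in range(n_regions)]
    # Simulated real cases depend on the TRUE effect plus noise.
    real = [baseline * (1 - true_reduction * c) + rng.gauss(0, 5) for c in coverage]
    # "Data" produced by the model depend only on the ASSUMED effect.
    modeled = [modeled_cases(c, baseline, assumed_reduction) for c in coverage]
    return ols_slope(coverage, modeled), ols_slope(coverage, real)


if __name__ == "__main__":
    # A world where nets do nothing, analyzed with a model that assumes they halve cases.
    modeled_slope, real_slope = circularity_demo(true_reduction=0.0, assumed_reduction=0.5)
    print(f"slope on modeled cases: {modeled_slope:.1f}")  # -50.0 by construction
    print(f"slope on real cases:    {real_slope:.1f}")     # near zero
```

The slope on modeled cases equals `-baseline * assumed_reduction` by construction, whatever `true_reduction` is, which is exactly why such numbers can't serve as evidence for the assumption that produced them.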

My current (now outdated, because it’s based on work I did a while ago) understanding of LLINs has a lot of doubt in it:

I am worried about the possibility that mosquitoes have developed resistance to the insecticides being used. There is some suggestive evidence that resistance is on the rise, and no definitive evidence that LLINs are still effective. Fortunately, LLINs with next-generation insecticides are now in use (and at the time I did this work, these next-generation LLINs were in development).⁵
I think that people are probably using their LLINs as intended around 60-80% of the time, which is comparable to the usage rates from the original studies. This is based both on broad cross-country surveys⁶ and on specific reporting from the Against Malaria Foundation.⁷ Because of this, I think it’s simultaneously the case that (a) a lot of LLINs go unused or misused; (b) LLINs are still probably having roughly the effects we estimate. But I remain nervous that real LLIN usage could be much lower than the data indicates.
- As an aside, I’m pretty underwhelmed by concerns about using LLINs as fishing nets. These concerns are very media-worthy, but I’m more worried about things like “People just never bother to hang up their LLIN,” which I’d guess is a more common issue. The LLIN usage data we use would (if accurate) account for both.
I wish we had better data on malaria case rates by region, so we could understand which regions are most in need of LLINs, and look for suggestive evidence that LLINs are or aren’t working. (GiveWell has recently written about further progress on this.)

But all in all, the case for LLINs holds up pretty well. It’s reasonably close to the simpler case I gave at the top of this section.

For GiveWell, this end result is the exception, not the rule. Most of the time, a minimal-trust investigation of some charitable intervention (reading every study, thinking about how they might mislead, tracking down all the data that bears on the charity’s activities in practice) is far more complicated than the above, and leads to a lot more doubt.

Other examples of minimal-trust investigations

Some other domains I’ve done minimal-trust investigations in:

Medicine, nutrition, quantitative social science (including economics). I’ve grouped these together because a lot of the methods are similar. Somewhat like the above, this has usually consisted of finding recent summaries of research, tracking down and reading all the way through the original studies, thinking of ways the studies might be misleading, and investigating those separately (often hunting down details of the studies that aren’t in the papers).
- I have links to a number of writeups from this kind of research here, although I don’t think reading such pieces is a substitute for doing a minimal-trust investigation oneself.
- My Has Life Gotten Better? series has a pretty minimal-trust spirit. I haven’t always checked the details of how data was collected, but I’ve generally dug down on claims about quality of life until I could get to systematically collected data. In the process, I’ve found a lot of bad arguments floating around.
Analytic philosophy. Here a sort of “minimal-trust investigation” can be done without a huge time investment, because the main “evidence” presented for a view comes down to intuitive arguments and thought experiments that a reader can evaluate themselves. For example, a book like The Conscious Mind more-or-less walks a layperson reader through everything needed to consider its claims. That said, I think it’s best to read multiple philosophers disagreeing with each other about a particular question, and try to form one’s own view of which arguments seem right and what’s wrong with the ones that seem wrong.
Finance and theoretical economics. I’ve occasionally tried to understand some well-known result in theoretical economics by reading through a paper, trying to understand the assumptions needed to generate the result, and working through the math with some examples. I’ve often needed to read other papers and commentary in order to notice assumptions that aren’t flagged by the authors.
Checking attribution. A simple, low-time-commitment sort of minimal-trust investigation: when person A criticizes person B for saying X, I sometimes find the place where person B supposedly said X and read thoroughly, trying to determine whether they’ve been fairly characterized. This doesn’t require having a view on who’s right—only whether person B seems to have meant what person A says they did. Similarly, when someone summarizes a link or quotes a headline, I often follow a trail of links for a while, reading carefully to decide whether the link summary gives an accurate impression.
- I’ve generally been surprised by how often I end up thinking people and links are mischaracterized.
- At this point, I don’t trust claims of the form “person A said X” by default, almost no matter who is making them, and even when a quote is provided (since it’s so often out of context).

And I wish I had time to try out minimal-trust investigations in a number of other domains, such as:

History. It would be interesting to examine some debate about a particular historical event, reviewing all of the primary sources that either side refers to.
Hard sciences. For example, taking some established finding in physics (such as the Schrödinger equation or Maxwell’s equations) and trying to understand how the experimental evidence at the time supported this finding, and what other interpretations could’ve been argued for.
Reference sources and statistics. I’d like to take a major Wikipedia page and check all of its claims myself. Or try to understand as much detail as possible about how some official statistic (US population or GDP, for example) is calculated, where the possible inaccuracies lie, and how much I trust the statistic as a whole.
AI. I’d like to replicate some key experimental finding by building my own model (perhaps incorporating this kind of resource), trying to understand each piece of what’s going on, and seeing what goes differently if I make changes, rather than trusting an existing “recipe” to work. (This same idea could be applied to building other things to see how they work.)

Minimal-trust investigations look different from domain to domain. I generally expect them to involve a combination of “trying to understand or build things from the ground up” and “considering multiple opposing points of view and tracing disagreements back to primary sources, objective evidence, etc.” As stated above, an important property is trying to get all the way to a strong understanding of the topic, so that one can (within the narrow slice one has investigated) keep up with experts.

I don’t think exposure to minimal-trust investigations ~ever comes naturally via formal education or reading a book, though I think it comes naturally as part of some jobs.

Navigating trust

Minimal-trust investigations are extremely time-consuming, and I can’t do them that often. 99% of what I believe is based on trust of some form. But minimal-trust investigation is a useful tool in deciding what/whom/when/why to trust.

Trusting arguments. Doing minimal-trust investigations in some domain helps me develop intuitions about “what sort of thing usually checks out” in that domain. For example, in social sciences, I’ve developed intuitions that:

Selection bias effects are everywhere, and they make it really hard to draw much from non-experimental data. For example, eating vegetables is associated with a lot of positive life outcomes, but my current view is that this is because the sort of people who eat lots of vegetables are also the sort of people who do lots of other “things one is supposed to do.” So people who eat vegetables probably have all kinds of other things going for them. This kind of dynamic seems to be everywhere.
Most claims about medicine or nutrition that are based on biological mechanisms (particular proteins, organs, etc. serving particular functions) are unreliable. Many of the most successful drugs were found by trial-and-error, and their mechanism remained mysterious long after they were found.
Overall, most claims that X is “proven” or “evidence-backed” are overstated. Social science is usually complex and inconclusive. And a single study is almost never determinative.
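The vegetables example can be sketched as a small simulation of confounding. The setup is hypothetical: a hidden trait (labeled `conscientious` purely for illustration) makes people both more likely to eat vegetables and more likely to have good outcomes, while vegetables themselves have no effect at all. The naive observational comparison still shows a sizable gap; assigning vegetable eating at random, as an experiment would, makes the gap disappear.

```python
import random


def outcome_gap(n=10_000, randomize=False, seed=1):
    """Mean outcome of vegetable eaters minus that of non-eaters.

    HYPOTHETICAL setup: a hidden trait drives both vegetable eating and the
    outcome; vegetables themselves have zero causal effect.
    """
    rng = random.Random(seed)
    eaters, non_eaters = [], []
    for _ in range(n):
        conscientious = rng.random() < 0.5  # hidden trait, half the population
        if randomize:
            # Experiment: vegetable eating assigned by coin flip.
            eats_vegetables = rng.random() < 0.5
        else:
            # Observational data: people with the trait self-select into vegetables.
            eats_vegetables = rng.random() < (0.8 if conscientious else 0.2)
        # Outcome depends only on the hidden trait plus noise, not on vegetables.
        outcome = (1.0 if conscientious else 0.0) + rng.gauss(0, 1)
        (eaters if eats_vegetables else non_eaters).append(outcome)
    return sum(eaters) / len(eaters) - sum(non_eaters) / len(non_eaters)


if __name__ == "__main__":
    print(f"observational gap: {outcome_gap():.2f}")               # close to 0.6
    print(f"randomized gap:    {outcome_gap(randomize=True):.2f}")  # close to 0
```

With these made-up parameters, about 80% of vegetable eaters have the trait versus 20% of non-eaters, so the observational gap lands near 0.6 even though the causal effect is zero.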

Trusting people. When trying to understand topic X, I often pick a relatively small part of X to get deep into in a minimal-trust way. I then look for people who seem to be reasoning well about the part(s) of X I understand, and put trust in them on other parts of X. I’ve applied this to hiring and management as well as to forming a picture of which scholars, intellectuals, etc. to trust.

There’s a lot of room for judgment in how to do this well. It’s easy to misunderstand the part of X I’ve gotten deep into, since I lack the level of context an expert would have, and there might be some people who understand X very well overall but don’t happen to have gotten into the weeds in the subset I’m focused on. I usually look for people who seem thoughtful, open-minded and responsive about the parts of X I’ve gotten deep into, rather than agreeing with me per se.

Over time, I’ve developed intuitions about how to decide whom to trust on what. For example, I think the ideal person to trust on topic X is someone who combines (a) obsessive dedication to topic X, with huge amounts of time poured into learning about it; (b) a tendency to do minimal-trust investigations themselves, when it comes to topic X; (c) a tendency to look at any given problem from multiple angles, rather than using a single framework, and hence an interest in basically every school of thought on topic X. (For example, if I’m deciding whom to trust about baseball predictions, I’d prefer someone who voraciously studies advanced baseball statistics and watches a huge number of baseball games, rather than someone who relies on one type of knowledge or the other.)

Conclusion

I think minimal-trust investigations tend to be highly time-consuming, so it’s impractical to rely on them across the board. But I think they are very useful for forming intuitions about what/whom/when/why to trust. And I think the more different domains and styles one gets to try them for, the better. This is the single practice I’ve found most (subjectively) useful for improving my ability to understand the world, and I wish I could do more of it.

Footnotes

¹ I do recall some high-level points that seem compelling, like “No one disagrees that if you just increase the CO₂ concentration of an enclosed area it’ll warm up, and nobody disagrees that CO₂ emissions are rising.” Though I haven’t verified either of those claims beyond noting that they don’t seem to attract much disagreement. And as I wrote this, I was about to add “(that’s how a greenhouse works)” but it’s not. And of course these points alone aren’t enough to believe the temperature is rising—you also need to believe there aren’t a bunch of offsetting factors—and they certainly aren’t enough to believe in official forecasts, which are far more complex.
² I think this distinguishes minimal-trust reasoning from e.g. naive epistemology.
³ This summary is slightly inaccurate, as I’ll discuss below, but I think it is the most common case people would cite who are casually interested in this topic.
⁴ From GiveWell, a quote from the author of the Cochrane review: “To the best of my knowledge there have been no more RCTs with treated nets. There is a very strong consensus that it would not be ethical to do any more. I don’t think any committee in the world would grant permission to do such a trial.” Though I last worked on this in 2012 or so, and the situation may have changed since then.
⁵ More on insecticide resistance at https://www.givewell.org/international/technical/programs/insecticide-treated-nets/insecticide-resistance-malaria-control.
⁶ See https://www.givewell.org/international/technical/programs/insecticide-treated-nets#Usage.
⁷ See https://www.givewell.org/charities/amf#What_proportion_of_targeted_recipients_use_LLINs_over_time.


MichaelPlant, 24 Nov 2021 12:44 UTC (39 points)
I have to say, I rather like putting a name to this concept. I know this wasn’t the upshot of the article, but it immediately struck me, on reading this, that it would be a good idea for the effective altruist community to engage in some minimal trust investigations of each other’s analyses and frame them as such.

I’m worried about there being too much deference and actually not very much criticism of the received wisdom. Part of the issue is that to criticise the views of smart, thoughtful, well-intentioned people in leadership positions might imply either that you don’t trust them (which is rude) or that you’re not smart and well-informed enough to ‘get it’; there are also the normal fears associated with criticising those with greater power.

These issues are somewhat addressed by saying “look, I have a lot of respect for X and assume they are right about lots of things, but I wanted to get to the bottom of this issue myself and not take anything they said for granted. So I did a ‘minimal-trust investigation’. Here’s what I found...”
- DirectedEvolution, 27 Nov 2021 3:21 UTC (12 points)
  I worry that, if adopted, an annoying fraction of people will use this term to mean “I looked at the citations for an article” rather than “I exhaustively looked at the evidence for X from multiple angles over a long period of time.”
  
  An “X-hour investigation” is a more precise claim. Including the references and sources they looked at, and a description of why they chose these, is a complement to saying how much time they’ve spent. In general, I like that this post illustrates what raising one’s research ambitions looks like.
  
  Holden: how many hours, roughly, do you think you spent on some of these minimal-trust investigations? And how many hours would you spend reading a given paper?
  - Holden Karnofsky, 3 Jan 2022 19:55 UTC (3 points)
    I wish I had a better answer, but it varies hugely by topic (and especially by how open-ended the question is). The example I give in the post was an early GiveWell investigation that played out over years, and took at least dozens of hours, maybe hundreds. Something like “checking attribution” can be under an hour. For a short-end “empirical social science” case, I can think of personal medical topics I’ve researched in a handful of hours (especially when I had previously researched similar topics and knew what I was looking for in the abstracts). I also don’t have a good answer to how long I spend on a particular study: I’ve definitely spent double-digit hours on an individual study before (and David Roodman has often gone much deeper, lowering the “trust” factor more than I ever have via things like reproducing someone’s calculations), but these are only for key studies—many studies can quickly be identified as having only small relevance to the question at hand.
    
    I don’t think I’ve defined “minimal-trust investigation” tightly enough to make it a hard term to abuse :) but I think it could be a helpful term nonetheless, including for the purpose Michael Plant proposes.
    - brb243, 18 Jan 2022 15:15 UTC (1 point)
      I would include the productivity of the reviewers and the scope of the investigations as factors alongside the time spent evaluating the evidence. For example, an investigator who analyzes the accuracy of key assumptions 10x faster and incorporates a 10x wider viewpoint can get 100x better conclusions than another reviewer spending the same time.
      I would also conduct an expected value cost-benefit analysis in deciding to what extent minimal-trust investigations’ insights are shared. For example, if publicly outlining the questions about LLIN effectiveness carries a 50% chance that EA loses $1 billion (because it loses appeal to some funders), but also a 10% chance that it gains $2 billion that can be used 3x more cost-effectively, then the investigation should be shared.
      If a better solution exists, such as keeping the LLIN cost-effectiveness as a cool entry point while later motivating people to devise solutions which generate high wellbeing impact across futures, then the LLIN questions can be shared on a medium accessible to more senior people while the impressive numbers are exhibited publicly.
      Then, using the above example, EA can lose $1 billion invested in malaria with 90% likelihood, develop a solution that sustainably addresses the fundamental issues (astronomically greater cost-effectiveness than LLINs because of the scale of the future), and gain $10 billion to find further solutions.
      The question can be: can you keep speaking about systemic change intentions but difficulties with OPP while dropping questions so that the development and scale up of universally beneficial systemic solutions is supported?
MaxRa, 26 Nov 2021 10:07 UTC (22 points)
Maybe a minimal-trust investigation hackathon could be a cool idea. For example a local EA chapter could spend a day digging into some claim together. Or it could be an online co-working investigation event.
Linch, 1 Dec 2021 3:21 UTC (11 points)
I think minimal-trust investigations, red-teaming, and epistemic spot checks form a natural cluster. I’d be interested/excited to see more people draw an ontology of what this cluster looks like, what other approaches are in this cluster, and how people can prioritize between these options.
WilliamKiely🔸, 29 Nov 2021 10:04 UTC (4 points)
“I think the vast majority of people (even within communities that have rationality and critical inquiry as central parts of their identity) have never done one.”

I think most people in such communities have done the low-time-commitment sort of minimal-trust investigations, such as:
- “Checking attribution. A simple, low-time-commitment sort of minimal-trust investigation: when person A criticizes person B for saying X, I sometimes find the place where person B supposedly said X and read thoroughly, trying to determine whether they’ve been fairly characterized. This doesn’t require having a view on who’s right—only whether person B seems to have meant what person A says they did. Similarly, when someone summarizes a link or quotes a headline, I often follow a trail of links for a while, reading carefully to decide whether the link summary gives an accurate impression.”
I do this sort of “checking attribution” minimal-trust investigation frequently and expect many others within the EA and rationality community do too.

I also sometimes dig a bit deeper, e.g. when someone makes a claim about a study rather than a claim about what someone said. (E.g. I remember investigating some claims a guest on the Joe Rogan podcast made about the effects of plant agriculture on animal deaths.)

But in general, I think you are right that it’s quite rare for people to do the high-time-commitment versions of minimal trust investigations.

I can’t think of any examples of times that I’ve put in the enormous amount of work required to do more than a partial high-time-commitment minimal-trust investigation. I ~always stop after a handful of hours (or sometimes a bit longer) because of some combination of (a) it not seeming worth my time (e.g. because I have no training in evaluating studies and so it’s very time consuming for me to do so) and (b) laziness.
- Linch, 1 Dec 2021 3:30 UTC (2 points)
  Yeah I was surprised by that claim too. Here are just two of my comments on incidental side-conversations of a single blog post, on unrelated topics. (Warning: the main topic of that blog post is heavy and full of drama, and may not be worth reading.)
Joaquín Murcia, 4 Jan 2025 12:13 UTC (1 point)
Great article! Useful term coined, rare and valuable real-case scenarios to understand how you do it. Nitpicking: to the last part on “ideal person to trust on topic X” I would add: (d) incentives to be truthful about the topic.
Example: An individual recommending a brand that sponsors him could be unreliable (unless you trust that individual to only choose sponsors with which he aligns).
To me this often leads to (counterintuitively) trusting someone who doesn’t work on topic X but has done such a minimal-trust investigation (e.g. a trusted science influencer, because of (a)(b)(c), giving advice on buying a home).
Seth Ariel Green 🔸, 24 Aug 2023 14:40 UTC (1 point)
Hi there, really enjoying this piece (just discovered it). My grad school advisor often asks: “what evidence would convince a determined skeptic?” and I think that’s broadly in the same vein.
Incidentally, my entry to GiveWell’s Change Our Mind contest does for SMC what you did for LLINs, though I came away much less convinced. I think the core difference between us is that I am, by default, skeptical of pre-replication-crisis research. I think that if you find papers from 20 years ago where the authors themselves say that their designs were underpowered to detect an effect, then the odds of successful replication (contingent on a new team getting all the implementation details right) are disquietingly low.
My beliefs on this were shaped by writing a pretty critical meta-analysis of the ‘contact hypothesis’. Lots of experts said that the salubrious effects of contact on prejudice had been proven beyond reasonable doubt, but when we zoomed in on the very strongest research, we just didn’t see it. Right around then, some political scientists ran some very nice intergroup contact experiments in post-conflict areas, and they found much less encouraging results (one, two).
Basically, I’ve come to believe that most published research findings are false, and I don’t give pre-replication-crisis studies the benefit of the doubt. But, as you say, if no IRB would give LLIN replication its approval, we’re kind of at a dead end.