Why we should be doing more systematic research
One of the big differences between Effective Altruism and other communities is our use of research to inform decisions. However, not all research is created equal. Systematic research can often lead to very different conclusions than more typical research approaches. Some areas of EA have been systematic in their research. Others have a lot to gain from improving in this area. In this article I’ll first give a quick definition of systematic research, then give some examples relevant to EA.
What is systematic research?
The main thing that distinguishes systematic research from non-systematic is planning and thoroughness. An example of unplanned research is when you’re wandering page to page on Wikipedia. You think of a question that interests you or seems relevant, then a fact triggers another question which you then follow up with, etc. An example of planned research is a meta-analysis. Meta-analyses usually state in advance the question they are trying to answer, where they’ll look for answers, what studies will qualify and which won’t, etc. Another way to systematize research is to use spreadsheets to lay out all of the options and criteria you’ll consider in advance. You can see more examples here.
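To make the "lay out your options and criteria in advance" idea concrete, here is a minimal sketch in Python; a plain spreadsheet works just as well. Every option, criterion, weight and score below is a hypothetical placeholder chosen purely for illustration.

```python
import pandas as pd

# Decide the options, criteria and weights BEFORE scoring anything.
options = ["option_a", "option_b", "option_c"]           # hypothetical options
weights = pd.Series({"cost_effectiveness": 0.5,          # hypothetical criteria and weights
                     "evidence_base": 0.3,
                     "neglectedness": 0.2})

# Ratings (1-3 scale) are filled in only after the structure above is fixed.
scores = pd.DataFrame([[3, 2, 1],
                       [2, 3, 2],
                       [1, 1, 3]],
                      index=options, columns=weights.index)

# Weighted total per option, highest first.
print(scores.mul(weights, axis=1).sum(axis=1).sort_values(ascending=False))
```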
The other aspect is thoroughness, which usually follows from planning: if you follow a procedure, you are more likely to do the things you already know you should do. For example, it’s a commonly accepted rationalist virtue to look for disconfirming evidence or to consider more than two options/hypotheses, but it’s very easy to forget to do so if you haven’t explicitly added it to your system. Setting up a plan simply makes you much more likely to follow best practices.
With that defined, below are some examples of where EA could benefit from a more systematic approach.
Places it’s underutilized in the EA movement
Relative to other communities, EA research is on average quite systematic. Charity evaluators in particular tend to be more systematic, to avoid missing promising but less well-known charities. However, there are several other areas where we could stand to be a lot more systematic. I will give three examples. A whole blog post or several could be written on each of these topics, but hopefully even with relatively little detail on each, it will start to become clear that there are areas where EA could be a lot more systematic.
Historical research: A few EA organizations rely on historical evidence to support the claim that what they are doing is high impact, but history is a great example of an area where it’s easy to find support for or against any idea you’d like. For example, if you are making the case that protests have helped social movements succeed, it will be straightforward to find examples showing that is true. However, there will also be many examples where protests have either done nothing or even hurt a movement. A systematic way to do the research might be to compare the top 100 social movements (as decided by an unbiased selection criterion, like a Wikipedia list) on an objective metric of success (e.g. passing a related law, such as abolishing slavery or women getting the right to vote). You could then list all of the different strategies each movement used and pull correlations between the top 10 most used strategies and the success level of the movement. That would dramatically change how seriously one could take a claim that strategy X is better than strategy Y based on historical evidence. Sadly, non-systematic research on this topic generally leads to results very similar to the researchers’ starting hypothesis, which makes it hard to weigh this evidence as a strong update in any direction.
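As a rough sketch of what that correlation step could look like, assuming you had already hand-coded a table of movements, the strategies each used, and an objective success score (the file name and column names below are hypothetical):

```python
import pandas as pd

# Hypothetical dataset: one row per social movement (e.g. taken from a Wikipedia
# list), one 0/1 column per strategy, and a `success` score based on an objective
# criterion such as whether a related law was passed.
movements = pd.read_csv("movements.csv")

strategy_cols = [c for c in movements.columns if c.startswith("strategy_")]

# Restrict to the 10 most commonly used strategies, as suggested above.
top_strategies = movements[strategy_cols].sum().nlargest(10).index

# Correlate use of each strategy with the movements' success scores.
print(movements[top_strategies].corrwith(movements["success"]).sort_values(ascending=False))
```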
Reference classes: A lot of EA debates come down to which reference class you choose, but this is very rarely done in a systematic way. Take, for example, a conversation I had in which another EA and I were trying to determine the likelihood of a breakthrough happening in psychology. The discussion went back and forth with little progress, but it was helped considerably when we did some systematic research with pre-set criteria. Specifically, we looked at how many psychologists have a Wikipedia page (roughly the minimum level of recognition we would expect a breakthrough to produce) and compared this to how many people had a psychology PhD and were working in experimental psychology. Although this reference class was imperfect, it was objective, it looked at both the hits and the misses instead of just the former, and it helped inform our views.
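The underlying arithmetic is just a base rate. With purely made-up placeholder numbers (the real exercise would substitute counts gathered under the pre-set criteria described above):

```python
# Placeholder counts, NOT real figures: the actual analysis would use counts
# gathered under pre-set criteria (e.g. a Wikipedia category listing and an
# estimate of the number of working experimental psychologists).
psychologists_with_wikipedia_page = 1_000
working_experimental_psychologists = 100_000

base_rate = psychologists_with_wikipedia_page / working_experimental_psychologists
print(f"Rough base rate of 'breakthrough-level' notability: {base_rate:.1%}")
```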
Of course, specific reference classes can be hard to agree upon, but even when people do agree on them, the research itself rarely seems to get done, and even an imperfectly agreed-upon reference class can shed light on an issue. For example, systematic research could be done on hits-based giving. There are certainly hits that can be listed, but ideally one would compare the number and size of the hits to the number of attempts and the total resources put into getting hits. This would definitely take time, but it would give a much more reliable result than intuition.
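A sketch of the bookkeeping this would involve, with entirely invented numbers for illustration: count every attempt and its cost, not just the memorable hits.

```python
# All grants and figures below are invented placeholders.
attempts = [
    {"name": "grant_a", "cost": 1.0, "impact": 0.0},    # a miss
    {"name": "grant_b", "cost": 2.0, "impact": 50.0},   # a hit
    {"name": "grant_c", "cost": 0.5, "impact": 0.0},    # another miss
]

total_cost = sum(a["cost"] for a in attempts)
total_impact = sum(a["impact"] for a in attempts)

# Impact per unit of resources across ALL attempts, not just the hits.
print(f"Impact per unit cost: {total_impact / total_cost:.2f}")
```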
Criteria: EA has a pretty common set of criteria that several organizations have adopted: neglectedness, tractability and importance. The broad idea is that these criteria serve as proxies for the harder-to-quantify metric of total good done per dollar spent. However, no systematic evaluation of how these criteria perform, or of what other criteria were considered and how well they fared, has been put forward publicly. For example, what if evidence base correlates more strongly with “good per $” than neglectedness does? What if there are several other criteria that should be used in combination to make a better model (maybe five criteria should be used instead of three)? Of course, the words are sufficiently broad that many criteria you could list would fall under one or another (e.g., maybe evidence base falls under tractability), but in practice, evaluating evidence base vs tractability leads to very different conclusions about the strength of many different cause areas. It seems unlikely that we have picked the best possible criteria for cause selection unless we have rigorously and systematically compared several different options. If our criteria change significantly, it seems very likely this could influence what is considered an EA cause area, so it certainly seems worth the additional time.
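One way such an evaluation could be sketched, assuming a hand-scored table of past cause areas or interventions where each candidate criterion was rated and good per dollar was estimated retrospectively (all file and column names below are hypothetical):

```python
import pandas as pd

# Hypothetical table: one row per past cause area or intervention, scored on each
# candidate criterion, plus a retrospective estimate of good done per dollar.
causes = pd.read_csv("cause_scores.csv")

candidate_criteria = ["neglectedness", "tractability", "importance", "evidence_base"]

# How well does each criterion, on its own, track the outcome we actually care about?
print(causes[candidate_criteria].corrwith(causes["good_per_dollar"]).sort_values(ascending=False))
```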
In summary, systematic research is slower and less fun than the alternative. However, I argue that the benefits of following best practices, such as looking for disconfirming evidence and considering multiple options and hypotheses, more than make up for those costs, particularly for important issues like the examples above.