Hello! I’m Saloni Dattani and I work at Our World in Data.
I wanted to share an article I wrote recently for Our World in Data, where I explain how randomized controlled trials (RCTs) work and why (and when) they matter: http://ourworldindata.org/randomized-controlled-trials
Since RCTs are considered a high-quality source of evidence for our knowledge of the effects of treatments, policies, and interventions, I think it’s important to understand how they work to improve our ability to read scientific literature and to help us make better decisions.
I make two main arguments:
1. That RCTs are a powerful source of evidence because of the procedures that they are expected to follow. For each procedure, I use examples to illustrate how it works and how it can affect the results of studies.
A control group, which gives us the possibility to see a counterfactual (“what might happen otherwise”).
Placebos, which can account for placebo effects. For example, people may feel better from taking a pill because they expect it to make them feel better. More generally, these are changes that occur because of the procedure of the treatment rather than the treatment itself.
Randomization, which ensures that the two groups (control and treatment) have comparable risks (at the beginning of the study) of developing the outcome, and can enable us to attribute differences between them to whether they received the treatment.
Concealment and blinding, which prevent researchers and participants from knowing which group each participant has been allocated to.
Pre-registration, a procedure where researchers declare in advance how they are going to carry out the study. This allows us to see whether they’ve deviated from their plans, for example, because they found results that were disappointing. It also keeps a record of research that goes unpublished in journals because of ‘publication bias.’
Some other key features of RCTs that I mention but don’t detail: experimentation and intention-to-treat analysis.
Importantly, this means that RCTs that do not follow these procedures are less reliable. Other types of studies (apart from RCTs) sometimes follow some of these procedures, which strengthens them as sources of evidence.
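To make the randomization point concrete, here is a minimal simulation sketch. It is not taken from the article: the risk levels, the size of the treatment effect, and the function name `simulate` are all made-up assumptions for illustration. It compares a study where sicker people are more likely to seek treatment with one where a coin flip decides who gets it.

```python
import random
import statistics

def simulate(randomized, n=20000, true_effect=-0.10, seed=0):
    """Return the estimated effect: outcome rate in treated minus control.

    All numbers here are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        # Hypothetical confounder: half of people have a high baseline risk.
        high_risk = rng.random() < 0.5
        baseline = 0.40 if high_risk else 0.20
        if randomized:
            # Coin flip: treatment is independent of baseline risk.
            gets_treatment = rng.random() < 0.5
        else:
            # No randomization: higher-risk people seek treatment more often.
            gets_treatment = rng.random() < (0.8 if high_risk else 0.2)
        # The treatment lowers the probability of the bad outcome.
        p_outcome = baseline + (true_effect if gets_treatment else 0.0)
        outcome = 1 if rng.random() < p_outcome else 0
        (treated if gets_treatment else control).append(outcome)
    return statistics.mean(treated) - statistics.mean(control)

rct_estimate = simulate(randomized=True)
observational_estimate = simulate(randomized=False)
print(f"randomized:     {rct_estimate:+.3f}")
print(f"non-randomized: {observational_estimate:+.3f}")
```

Under these assumptions, the randomized estimate should land near the true effect of -0.10, while the non-randomized comparison should drift towards zero or even above it, because the treated group started out with higher risk. The treatment is the same in both runs; only the way people ended up in each group differs.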
2. That RCTs are particularly useful in some circumstances. I illustrate this with two examples.
We know that smoking has a large causal effect on the risk of lung cancer, without evidence from RCTs. This is because many lines of evidence converge on this conclusion, and other explanations fall short of accounting for the massive association that we see.[1]
In contrast, when scientists looked for treatments for HIV/AIDS, many candidate drugs they expected to work actually failed, while some that worked were unexpected and led to insights about how the virus caused disease.[2]
With this, I argue that RCTs matter when we don’t have enough data from other lines of evidence, when we don’t know how to rule out other explanations, and when research is affected by biases (of the researcher, of participants, and publication bias). A catchier version is that they matter when we don’t know enough, when we’re wrong, and when we see what we want to see.
___
As I explain, I think understanding these ideas is very important because we point to evidence from RCTs when we want to evaluate treatments, policies and interventions.
Hopefully, the examples that I give are intuitive and help you apply these concepts more widely. If the points in the summary above are already familiar to you, hopefully there are some cool charts or examples that are still new to you.
I’m also happy to answer questions or correct errors, if you spot any. You can also contact me at saloni@ourworldindata.org or find me at @salonium on Twitter. Thank you!
[1]
If you are interested in reading more about the smoking/lung cancer debate, I highly recommend these two papers.
The first is a fascinating paper from 1959 that summarises the evidence that existed at the time, and why that led epidemiologists to be confident that the rise in lung cancer was a result of smoking cigarettes.
(It was highly influential at the time, and also the first example of ‘sensitivity analysis’ which is used in epidemiology to find out whether there might be remaining confounding in a study.)
Cornfield, J., Haenszel, W., Hammond, E. C., Lilienfeld, A. M., Shimkin, M. B., & Wynder, E. L. (1959). Smoking and Lung Cancer: Recent Evidence and a Discussion of Some Questions. JNCI: Journal of the National Cancer Institute. Available as a PDF here.
The second is a paper summarising the history of the smoking/lung cancer debate, and the counterclaims made by some famous statisticians (such as Ronald Fisher).
Hill, G., Millar, W., & Connelly, J. (2003). “The Great Debate”: Smoking, Lung Cancer, and Cancer Epidemiology. Canadian Bulletin of Medical History, 20(2), 367-386. Available as a PDF here.
[2]
If you’re interested in the history of the discovery of HIV/AIDS treatments and the role of clinical trials, I strongly recommend this book chapter. It includes discussion about how activists pushed for these trials to be accelerated and de-regulated.
National Research Council (U.S.) (1993). The social impact of AIDS in the United States. 4. Clinical Research and Drug Regulation. National Academy Press. Available in full text here.
One indirect advantage of RCTs is that I’d guess (I’d imagine this has been tested somewhere) that they are easier to understand compared to other causal inference methods. Maybe that makes it easier to pitch to people who aren’t trained in statistics (often policy makers).
Not sure of this though...
Actually I could be incorrect. I think Eva Vivalt has a paper on this (no time to dig up right now).
If you do find it, I’d be interested to read that.
I would guess that it’s difficult for people to intuitively understand precisely why randomization is so useful, although other aspects of RCTs are probably easier to grasp – particularly, the experimental part of giving treatment A to one group and treatment B to another group and following up their outcomes. But overall I think I would agree with you; people need less understanding of confounders and selection bias to read an RCT than they’d need to read an observational study.
I think it’s this paper http://evavivalt.com/wp-content/uploads/Weighing-the-Evidence.pdf. Fwiw, all of Eva’s papers are worth reading!
Sidenote—love your work and WiP (I’m also part of the PS community). Hope to see you on the EAF again!
Oh, I remember reading this paper now! It’s great, thanks for sharing.
And thank you very much :) I will be here more often for sure.
Thanks for writing this—I think it’s accessible, informative, and interesting, which is difficult to pull off when writing about research methods!
I think it’s telling that all the examples of the effectiveness of RCTs in this article come from clinical trials. However, you don’t limit yourself to this domain in the headline or summary of the article (e.g. “How would we know about the effects of a new idea, treatment or policy?”).
Our World in Data is often used by people (including myself) to gather development data. So I think it could be worth adding a caveat that many of the strengths you discuss in the article don’t apply to RCTs conducted on social programs or policies. For example, it’s difficult or impossible to have double-blinding or a placebo group; it’s difficult to randomize effectively due to spillover effects; it’s harder to get a large sample size when you’re studying effects on villages or countries; and generalization is far more difficult (a drug that works for a Brazilian is likely to work for an Indonesian, but a policy that works in Brazil is unlikely to have the same effect in Indonesia).
Hey Stephen, thanks very much!
I completely agree with you on the differences between clinical RCTs and development/public policy RCTs.
Part of the reason for that is that it was originally meant to be a longer piece, with some policy RCT examples, how clustering works, etc. but it was already fairly long, and those were harder to explain concisely. And secondly simply because I have a background in health/medicine, which meant it was easy to draw examples from the field.
Hopefully I signposted this a little by saying that the procedures I mention are those found in medicine / clinical RCTs, but from your comment I think it was probably not enough. I’ll think about this and clarify or add some caveats to the article that make it clearer. Thank you!
Big fan of OWID!
I liked the example in the article you linked about depression diagnosis: after receiving a diagnosis of major depressive disorder, some patients no longer have depression over time, even when they do not receive any treatment. This seems true in medicine more generally. It was at least true in medieval times, when sometimes the best chance someone had of surviving was to have as few medical interventions as possible, and that is still true to an extent today.
Plus there is an effect that if you for example have cancer and you have a consultation about what the best move forward is, whatever expertise area the doctor consulting you has is going to be the likeliest treatment you’ll choose to take, probably related to their highest confidence level in that method working since they know it the best. But there are no doctors that specialise in just doing nothing, or doing something equivalent to that. Some solid RCT work here could shift the balance a bit more and result in healthier people with fewer unnecessary interventions.
Thanks for posting it here and for your work at OWID!
Do you have any thoughts on how to scale RCTs to larger, messier projects? By now, the EA community has more resources at their hands and the results for small RCTs might not scale to larger interventions.
Have you thought of ways in which RCTs could still be leveraged for large-scale interventions or are they just too hard to make work, e.g. on the policy level?
Hey Marius, thank you!
I wish I could answer this better, but I don’t know enough to have a good answer to how to scale policy RCTs, especially since they’re quite different from clinical RCTs (they often can’t administer the treatment in a standardised way, there’s usually no way to blind participants to what they’re receiving, they usually don’t track/measure participants as regularly, etc.) Though those are also factors that make them messier in larger projects.
I’ve read this blog post by Michael Clemens, which I found was a useful summary of two books on the topic: https://cgdev.org/blog/scaling-programs-effectively-two-new-books-potential-pitfalls-and-tools-avoid-them
But I think there are often situations where they can be leveraged for large-scale interventions. A good recent example is this experiment on street lighting and its effect in reducing crime. Some features of the policy make it easier to study at scale: crime data exists at the right scale (you don’t need to track individual participants to find out about crime rates), street lighting is easy to standardise, and you can measure the effects at the level of neighbourhood clusters rather than at the level of individuals. So maybe that’s a good way of thinking about how to scale up RCTs: to find treatments and outcomes that are easier to implement and measure at a large scale.
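As a rough sketch of what randomizing at the level of clusters can look like, here is a small example. It is not the design of the street lighting study; the function names, neighbourhood labels, and crime figures are all hypothetical, and a real analysis would also need to account for clustering when estimating uncertainty.

```python
import random
import statistics

def assign_clusters(cluster_ids, seed=0):
    """Randomly split whole clusters (e.g. neighbourhoods) into two arms."""
    rng = random.Random(seed)
    shuffled = list(cluster_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    # The first half of the shuffled list is treated, the rest are controls.
    return {
        cid: ("treatment" if i < half else "control")
        for i, cid in enumerate(shuffled)
    }

def difference_in_means(assignment, outcome_by_cluster):
    """Compare average cluster-level outcomes: treatment minus control."""
    treated = [outcome_by_cluster[c] for c, arm in assignment.items()
               if arm == "treatment"]
    control = [outcome_by_cluster[c] for c, arm in assignment.items()
               if arm == "control"]
    return statistics.mean(treated) - statistics.mean(control)

# Hypothetical neighbourhoods and made-up crime counts per 1,000 residents.
neighbourhoods = ["A", "B", "C", "D", "E", "F"]
assignment = assign_clusters(neighbourhoods, seed=42)
crime_rate = {"A": 11.0, "B": 9.5, "C": 12.0, "D": 10.0, "E": 8.5, "F": 10.5}
print(assignment)
print(difference_in_means(assignment, crime_rate))
```

The unit that gets randomized and the unit that gets measured are both the neighbourhood, so no individual needs to be tracked. That is the property that makes this kind of design easier to run at scale.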
I enjoyed your post a lot. Lant Pritchett is a prominent critic of using RCTs for large scale social interventions—he might be worth reading.
Thank you very much!
Is there a paper by him you would recommend reading on the topic? I’ve seen this one, which I agree with in parts – with good theory and evidence from other research on which policies work, there’s less need for RCTs, but I think there’s a role for both to answer different questions.
Most of his blog posts for the Center for Global Development are relevant. His recent paper, “Randomizing Development: Method or Madness?”, contains most of his main arguments. He also has a blog called Lantrant where he frequently criticises the use of RCTs in economics. In my view, almost all of his critiques are correct.