Thanks for the post, Greg!
I think there are a couple of points worth mentioning that may ease some of the concerns about “small sample” cases. It’s true, for example, that we almost never have the ability to randomize countries into treatments, and that there simply aren’t enough countries to properly test study designs with a large number of conditions. However, we have other options for cases like those. If we don’t need to stay at the country level, we can zoom in and randomize fairly large groups of people within the country of interest. We can also use techniques like multilevel modeling to look at effects at both levels at the same time. We also have quasi-experimental designs. In smoking cessation campaigns, for instance, researchers have used a pre-post design for each country or region: measure baseline tobacco use there, introduce the campaign, and see how smoking changes afterward.
Now, this is not a perfect design. The country hasn’t been randomly assigned, and there’s no control group. We also face the threat of maturation: what if people in that country would have cut back for some other reason even without our campaign? That said, when we see a positive result in one country, then two, then five, ten, and fifteen different countries, we gain more confidence that the effect is not simply spurious. We can also compare what’s happening in the campaign countries with broadly similar countries that did not get the intervention over the same time period. That isn’t a true control group either, but it can increase our confidence in what we’re seeing.
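To make the comparison-country idea a bit more concrete, here is a minimal difference-in-differences sketch: the change in campaign countries minus the change in comparison countries over the same period. The function name `did_estimate` and all of the prevalence numbers are hypothetical, made up purely for illustration. The approach rests on an assumption we can’t directly test, namely that both groups would have followed parallel trends without the campaign.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences estimate (hypothetical helper).

    Each argument is a list of smoking prevalence values (%), one per
    country. Returns the change in campaign countries minus the change in
    comparison countries, assuming parallel trends absent the campaign.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Made-up prevalence (%) for three campaign and three comparison countries.
treated_pre = [30.0, 28.0, 32.0]
treated_post = [26.0, 25.0, 28.0]
control_pre = [29.0, 27.0, 31.0]
control_post = [28.0, 26.0, 30.0]

naive_pre_post = mean(treated_post) - mean(treated_pre)
effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(f"Naive pre-post change: {naive_pre_post:.2f} percentage points")
print(f"Difference-in-differences: {effect:.2f} percentage points")
```

With these invented numbers, a naive pre-post comparison would credit the campaign with the full drop of about 3.67 points. Subtracting the 1-point drop that happened anyway in the comparison countries (our stand-in for maturation or other background trends) leaves about 2.67 points.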
So with this one counter-example, I’m arguing that we shouldn’t be thinking “RCT or bust.” Reality is simply too messy for that. But we still have a large number of tools at our disposal that can give us very good information. It’s not perfect information, but it’s the best we can do, and we can learn a whole lot from it.
(Another point worth making is that we can use meta-analysis to estimate what the true effect size may be across a number of smaller studies. Each study on its own may be underpowered, but if we have even five or ten of them, we can get a much more precise estimate. This approach can also help us account for things like regional differences and failures of randomization in individual studies.)
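As a rough illustration of why pooling helps, the sketch below combines several small studies using fixed-effect inverse-variance weighting, which is one standard meta-analytic approach (a random-effects model is often more appropriate when studies come from different regions). The function name `fixed_effect_pool` is hypothetical, and the effect sizes and standard errors are invented for illustration only.

```python
import math


def fixed_effect_pool(effects, std_errors):
    """Fixed-effect, inverse-variance weighted pooled estimate (hypothetical helper).

    Each study is weighted by 1 / SE^2, so more precise studies count more.
    Returns (pooled_effect, pooled_standard_error).
    """
    weights = [1.0 / se**2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Five made-up small studies, each fairly imprecise on its own.
effects = [0.30, 0.10, 0.25, 0.05, 0.20]
std_errors = [0.15, 0.15, 0.15, 0.15, 0.15]

for e, se in zip(effects, std_errors):
    print(f"study:  {e:.2f} (95% CI {e - 1.96 * se:.2f} to {e + 1.96 * se:.2f})")

pooled, pooled_se = fixed_effect_pool(effects, std_errors)
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled: {pooled:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Because these five studies are equally precise, the pooled estimate is just their average (0.18), and the pooled standard error is 0.15/√5 ≈ 0.067. The pooled confidence interval is therefore about 2.2 times narrower than any single study’s, which is the “much more precise estimate” that meta-analysis buys us.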
So, as Will mentions, I think we should decide case by case what the strongest possible research design would be. We should also weigh the cost of collecting that best-case evidence against the cost of alternative designs, in order to find the right blend of methodological rigor and real-world practicality.