Reality is often underpowered
Introduction
When I worked as a doctor, we had a lecture by a paediatric haematologist, on a condition called Acute Lymphoblastic Leukaemia. I remember being impressed that very large proportions of patients were being offered trials randomising them between different treatment regimens, currently in clinical equipoise, to establish which had the edge. At the time, one of the areas of interest was, given the disease tended to have a good prognosis, whether one could reduce treatment intensity to reduce the long term side-effects of the treatment whilst not adversely affecting survival.
On a later rotation I worked in adult medicine, and one of the patients admitted to my team had an extremely rare cancer,[1] with a (recognised) incidence of a handful of cases worldwide per year. It happened the world authority on this condition worked as a professor of medicine in London, and she came down to see them. She explained to me that treatment for this disease was almost entirely based on first principles, informed by a smattering of case reports. The disease unfortunately had a bleak prognosis, although she was uncertain whether this was because it was an aggressive cancer to which current medical science has no answer, or whether there was an effective treatment out there if only it could be found.
I aver that many problems EA concerns itself with are closer to the second story than the first: in many cases, sufficient data is not only absent in practice but impossible to obtain in principle. Reality is often underpowered for us to wring the answers from it we desire.
Big units of analysis, small samples
The main driver of this problem for "EA topics" is that the outcomes of interest have units of analysis for which the whole population (let alone any sample from it) is small-n: e.g. outcomes at the level of a whole company, or a whole state, or whole populations. For these big unit of analysis/small sample problems, RCTs face formidable in-principle challenges:
Even if by magic you could get (e.g.) all countries on earth to agree to randomly allocate themselves to policy X or Y, this is merely a sample size of ~200. If you're looking at companies relevant to cage-free campaigns, or administrative regions within a given state, this can easily fall another order of magnitude.
These units of analysis tend to be highly heterogeneous, almost certainly in ways that affect the outcome of interest. Although the key "selling point" of the RCT is that it implicitly controls for all confounders (even ones you don't know about), this statistical control is a (convex) function of sample size, and isn't hugely impressive at ~100 per arm: it is well within the realms of possibility for the randomisation to happen to give arms with unbalanced allocation of any given confounding factor.
"Roughly" (in expectation) balanced intervention arms are unlikely to be good enough in cases where the intervention is expected to have much less effect on the outcome than other factors (e.g. wealth, education, size, whatever), thus an effect size that favours one arm or the other can be alternatively attributed to one of these.
Supplementing this raw randomisation by explicitly controlling for confounders you suspect (cf. block randomisation, propensity matching, etc.) has limited value when you don't know all the factors which plausibly "swamp" the likely intervention effect (i.e. you don't have a good predictive model for the outcome but-for the intervention tested). In any case, these methods tend to trade off against the already scarce resource of sample size.
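To give a rough feel for the balance point above, here is a minimal simulation sketch. It assumes a single confounder that is standard normal across units and a two-arm trial with equal arms; the function name `typical_arm_gap` and every parameter value are illustrative assumptions, not taken from any real study.

```python
import random
import statistics


def typical_arm_gap(n_units, n_trials=1000, seed=0):
    """Mean absolute gap between the two arm means of one confounder.

    Assumption: the confounder is standard normal across units, so the
    result is in units of the confounder's standard deviation.
    """
    rng = random.Random(seed)
    half = n_units // 2
    gaps = []
    for _ in range(n_trials):
        values = [rng.gauss(0.0, 1.0) for _ in range(n_units)]
        rng.shuffle(values)  # random allocation: first half -> arm A, rest -> arm B
        gap = statistics.fmean(values[:half]) - statistics.fmean(values[half:])
        gaps.append(abs(gap))
    return statistics.fmean(gaps)


if __name__ == "__main__":
    # Hypothetical total sample sizes (both arms combined).
    for n in (20, 80, 200):
        print(n, round(typical_arm_gap(n), 3))
```

Under these assumptions the expected gap is about 0.8 * sqrt(2 / n_per_arm) standard deviations, so roughly 0.11 at 100 per arm, and it only halves when the sample quadruples. Whether a gap of that size matters depends on how strongly the confounder drives the outcome relative to the intervention.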
These "small sample" problems aren't peculiar to RCTs, but endemic to all other empirical approaches. The wealth of econometric and quasi-experimental methods (e.g. IVs, regression discontinuity analysis) still runs up against hard data limits, as well as those owed to whatever respects in which they fall short of the "ideal" RCT set-up (e.g. imperfect instrumentation, omitted variable bias, nagging concerns about reverse causation). Qualitative work (case studies, etc.) has the same problems even if other ones (e.g. selection) loom larger.
Value of information and the margin of common-sense
None of this means such work has zero value: big enough effect sizes can still be reliably detected, and even underpowered studies still give us information. But we may learn very little on the margin of common sense. Suppose we are interested in "what makes social movements succeed or fail?" and we retrospectively assess a (somehow) representative sample of social movements. It seems plausible that the big (and plausibly generalisable) hits from this investigation will prove commonsensical (e.g. "Social movements are more likely to grow if members talk to other people about the social movement"), whilst the "new lessons" remain equivocal and uncertain.
We should expect to see this if we believe the distribution of relevant effect sizes is heavy-tailed, with most of the variance in (say) social movement success owed to a small number of factors, with the rest comprised of a large multitude of smaller effects. In such a case, modest increases in information (e.g. from small sample data) may bring even more modest increases in either explaining the outcome or identifying what contributes to it:
Figure: Toy example, where we propose a roughly Pareto distribution of effect size among contributory factors. The largest factors (which nonetheless explain a minority of the variance) may prove to be obvious to the naked eye (blue). Adding in the accessible data may only slightly lower the detection threshold, with modest impacts on identifying further factors (green) and overall accuracy. The great bulk of the variance remains in virtue of a large ensemble of small factors which cannot be identified (red). Note that the detection threshold tends to have diminishing returns with sample size.
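The toy picture can be made concrete with a small numerical sketch. Everything here is an assumption chosen for illustration: factor k has effect size k ** -0.5, factors are independent, each contributes variance equal to its squared effect, and a factor counts as "detected" when its effect size clears a threshold. The function name and the two thresholds are hypothetical.

```python
def variance_share_detected(threshold, n_factors=10_000, exponent=0.5):
    """Share of total variance owed to factors with effect size >= threshold.

    Assumptions: factor k (k = 1, 2, ...) has effect size k ** -exponent,
    factors are independent, and each contributes variance equal to its
    squared effect size.
    """
    effects = [k ** -exponent for k in range(1, n_factors + 1)]
    total = sum(e * e for e in effects)
    detected = sum(e * e for e in effects if e >= threshold)
    return detected / total


if __name__ == "__main__":
    naked_eye = variance_share_detected(0.45)  # factors obvious without data
    with_data = variance_share_detected(0.30)  # threshold lowered by data
    print(f"naked eye: {naked_eye:.2f}, with data: {with_data:.2f}")
```

In this set-up the four largest factors account for roughly a fifth of the variance, and lowering the threshold by a third lifts that to about three tenths. If the threshold scales as 1/sqrt(n), that reduction would take about 2.25 times the sample, and most of the variance would still sit in factors below it.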
The scientific revolution for doing good?
The foregoing should not be read as general scepticism about using data. The triumphs of evidence-based medicine, although not unalloyed, have been substantial, and considerable gains remain on the table (e.g. leveraging routine clinical practice). The "randomista" trend in international development is generally one to celebrate, especially as (I understand) it increasingly aims to isolate factors that have credible external validity. The people who run cluster-randomised, stepped-wedge, and other study designs with big units of analysis are not ignorant of their limitations, and can deploy these judiciously.
But it should temper our enthusiasm about how many insights we can glean by getting some data and doing something sciency to it.[2] The early successes of EA in global health owe a lot to this being one of the easier areas to get crisp, intersubjective and legible answers from a wealth of available data. For many to most other issues, data-driven demonstration of "what really works" will never be possible.
We see that people do better than chance (or better than others) in terms of prediction and strategic judgement. Yet, at least judging by the superforecasters (this writeup by AI Impacts is an excellent overview), how they do it is much more indirectly data-driven: one may have to weigh between several facially-relevant "base rates", adjusting these rates by factors where the coefficient may be estimated by its role in loosely analogous cases, and so forth.[3] Although this process may be informed by statistical and numerical literacy (e.g. decomposition, "fermi-ization"), it seems to me the main action going on "under the hood" is developing a large (and implicit, and mostly illegible) set of gestalts and impressions to determine how to "weigh" relevant data that is nonetheless fairly remote to the question at issue.[4]
Three final EA takeaways:
1. Most who (e.g.) write up a case study or a small-sample analysis tend to be well aware of the limitations of their work. Nonetheless I think it is worth paying more attention to how these bear on overall value of information before one embarks on these pieces of work. Small nuggets of information may not be worth the time to excavate even when the audience are ideal reasoners. As they aren't, one risks them (or yourself) over-weighing their value when considering problems which should demand tricky aggregation of a multitude of data sources.
2. There can be good reasons why expert communities in some areas haven't tried to use data explicitly to answer problems in their field. In these cases, the "calling card" of EA-style analysis of doing this anyway can be less of a disruptive breakthrough and more a stigma of intellectual naivete.
3. In areas where "being driven by the data" isn't a huge advantage, it can be hard to identify an "edge" that the EA community has. There are other candidates: investigating topics neglected by existing work, better aligned incentives, etc. We should be sceptical of stories which boil down to a generalised "EA exceptionalism".
[1] Its name escapes me, although arguably including it would risk deductive disclosure. To play it safe I've obfuscated some details.
[2] And statistics and study design generally prove hard enough that experts often go wrong. Given the EA community's general lack of cultural competence in these areas, I think their (generally amateur) efforts at the same have tended to fare worse.
[3] I take as supportive evidence that a common feature among superforecasters is they read a lot, not just in areas closely relevant to their forecasts, but more broadly across history, politics, etc.
[4] Something analogous happens in other areas of "expert judgement", whereby experts may not be able to explain why they made a given determination. We know that this implicit expert judgement can be outperformed by simple "reasoned rules". My suspicion, however, is it still performs better than chance (or inexpert judgement) when such rules are not available.
Thanks Greg, I really enjoyed this post.
I don't think that this is what you're saying, but I think if someone drew the lesson from your post that, when reality is underpowered, there's no point in doing research into the question, that would be a mistake.
When I look at tiny-n sample sizes for important questions (e.g.: "How have new ideas made major changes to the focus of academic economics?" or "Why have social movements collapsed in the past?"), I generally don't feel at all like I'm trying to get a p<0.05; it feels more like hypothesis generation. So when I find out that Kahneman and Tversky spent 5 years honing the article Prospect Theory into a form that could be published in an economics journal, I think "wow, ok, maybe that's the sort of time investment that we should be thinking of". Or when I see social movements collapse because of in-fighting (e.g. pre-Copenhagen UK climate movement), or romantic disputes between leaders (e.g. Objectivism), then (insofar as we just want to take all the easy wins to mitigate catastrophic risks to the EA community) I know that this risk is something to think about and focus on for EA.
For these sorts of areas, the right approach seems to be granular qualitative research: trying to really understand in depth what happened in some other circumstance, and then thinking through what lessons that entails for the circumstance you're interested in. I think that, as a matter of fact, EA does this quite a lot when relevant. (E.g. Grace on Szilard, or existing EA discussion of previous social movements). So I think this gives us extra reason to push against the idea that "EA-style analysis" = "quant-y RCT-esque analysis" rather than "whatever research methods are most appropriate to the field at hand". But even on qualitative research I think the "EA mindset" can be quite distinctive; certainly I think, for example, that a Bayesian-heavy approach to historical questions, often addressing counterfactual questions, and looking at those issues that are most interesting from an EA perspective (e.g. how modern-day values would be different if Christianity had never taken off), would be really quite different from almost all existing historical research.
Thanks, Will!
I definitely agree we can look at qualitative data for hypothesis generation (after all, n=1 is still an existence proof). But I'd generally recommend breadth-first rather than depth-first if we're trying to adduce considerations.
For many/most sorts of policy decisions, although we may find a case of X (some factor) --> Y (some desirable outcome), we can probably also find cases of ¬X --> Y and X --> ¬Y. E.g., contrasting with what happened with prospect theory, there are also cases where someone happened on an important breakthrough with much less time/effort, or where people over-committed to an intellectual dead-end (naturally, partisans of X or ¬X tend to be good at cultivating sets of case-studies which facially support the claim it leads to Y).
I generally see getting a steer on the correlation of X and Y (so the relative abundance of (¬)X --> (¬)Y across a broad reference class) as more valuable than determining whether in a given case (even one which seems nearby to the problem we're interested in) X really was playing a causal role in driving Y. Problems of selection are formidable, but I take the problems of external validity to tend to be even worse (and worse enough to make the former have a better ratio of insight:resources).
Thus I'd be much more interested to see (e.g.) a wide survey of cases which suggests movements prone to in-fighting tend to be less successful than an in-depth look at how in-fighting caused the destruction of a nearby analogue to the EA community. Ditto the "macro" in macrohistory being at least partly about trying to adduce takeaways across history, as well as trying to divine its big contours.
And although I think work like this is worthwhile to attempt, I think in some instances we may come to learn that reality is so underpowered that there's essentially no point doing research (e.g. maybe large bits of history are just ultra-chaotic, so all we can ever see is noise).
I agree with your points, but from my perspective they somewhat miss the mark.
Specifically, your discussion seems to assume that we have a fixed, exogenously given set of propositions or factors X, Y, …, and that our sole task is to establish relations of correlation and causation between them. In this context, I agree on preferring "wide surveys" etc.
However, in fact, doing research also requires the following tasks:
1. Identify which factors X, Y, … to consider in the first place.
2. Refine the meaning of the considered factors X, Y, … by clarifying their conceptual and hypothesized empirical relationships to other factors.
3. Prioritize which of the myriad possible correlational or causal relationships between the factors X, Y, … to test.
I think that depth can help with these three tasks in ways in which breadth can't.
For instance, in Will's example, my guess is that the main value of considering the history of Objectivism does not come from moving my estimate for the strength of the hypothesis "X = romantic involvement between movement leaders → Y = movement collapses". Rather, the source of value is including "romantic involvement between movement leaders" into the set of factors I'm considering in the first place. Only then am I able to investigate its relation to outcomes of interest, whether by a "wide survey of cases" or otherwise. Moreover, I might only have learned about the potential relevance of "romantic involvement between movement leaders" by looking at some depth into the history of Objectivism. (I know very little about Objectivism, and so don't know if this is true in this instance; it's certainly possible that the issue of romantic involvement between Objectivist leaders is so well known that it would be mentioned in any 5-sentence summary one would encounter during a breadth-first process. But it also seems possible that it's not, and I'm sure I could come up with examples where the interesting factor was buried deeply.)
My model here squares well with your observation that a "common feature among superforecasters is they read a lot", and in fact makes a more specific prediction: I expect that we'd find that superforecasters read a fair amount (say, >10% of their total reading) of deep, small-n case studies, for example historical accounts of a single war, economic policy, or biographies.
[My guess is that my comment is largely just restating Will's points from his above comment in other words.]
(FWIW, I think some generators of my overall model here are:
Frequently experiencing disagreements I have with others, especially around AI timelines and takeoff scenarios, as noticing a thought like "Uh… I just think your overall model of the world lacks depth and detail." rather than "Wait, I've read about 50 similar cases, and only 10 of them are consistent with your claim".
Semantic holism, or at least some of the arguments usually given in its favor.
Some intuitive and fuzzy sense that, in the terminology of this Julia Galef post, being a "Hayekian" has worked better for me than being a "Planner", including for making epistemic progress.
Some intuitive and fuzzy sense of what I've gotten out of "deep" versus "broad" reading. E.g. my sense is that reading Robert Caro's monumental, >1,300-page biography of New York city planner Robert Moses has had a significant impact on my model of how individuals can attain political power, albeit by adding a bunch of detail and drawing my attention to factors I previously wouldn't have considered rather than by providing evidence for any particular hypothesis.)
Very much agree with the key points, which are related to what I wrote here.
My unsatisfying conclusion was that there are three approaches when facing an "appropriately underpowered" question:
1. Don't try to answer these questions empirically; use other approaches.
   If data cannot resolve the problem to the customary "standard" of p<0.05, then use qualitative approaches or theory-driven methods instead.
2. Estimate the effect and show that it is statistically non-significant.
   This will presumably be interpreted as the effect being practically small or insignificant, despite the fact that that isn't how p-values work.
3. Do a Bayesian analysis with comparisons of different prior beliefs to show how the posterior changes.
   This will not alter the fact that there is too little data to convincingly show an answer, and is difficult to explain. Properly uncertain prior beliefs will show that the answer is still uncertain after accounting for the new data, but will perhaps shift the estimated posterior slightly to the right, and narrow the distribution.
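As a rough illustration of the third approach, here is a minimal sketch using the standard conjugate normal update for an unknown mean with known observation noise. The function name, the three priors, and the data (5 observations averaging 1.0, each with standard deviation 5) are all hypothetical values chosen to mimic an underpowered setting.

```python
def normal_posterior(prior_mean, prior_sd, data_mean, data_sd, n):
    """Conjugate normal update for an unknown mean with known data_sd.

    Returns (posterior_mean, posterior_sd).
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / data_sd ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * data_mean)
    return post_mean, post_var ** 0.5


if __name__ == "__main__":
    # Hypothetical priors, given as (mean, sd).
    priors = {"sceptical": (0.0, 1.0), "diffuse": (0.0, 3.0), "optimistic": (1.0, 1.0)}
    for label, (m, s) in priors.items():
        post_mean, post_sd = normal_posterior(m, s, data_mean=1.0, data_sd=5.0, n=5)
        print(f"{label}: prior {m:.1f} +/- {s:.1f} -> posterior {post_mean:.2f} +/- {post_sd:.2f}")
```

With the sceptical prior, the posterior mean moves from 0 to about 0.17 and the standard deviation shrinks from 1.0 to about 0.91: a slight shift to the right and a slight narrowing, with the answer still largely set by the prior.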
I also strongly agree with Will's comment that this doesn't (always) imply that we shouldn't do such work, it just means that we're doing qualitative work, which as he suggests, can be valuable in different ways.
[NB: I talked briefly to Greg in person about this last weekend, but felt it might be valuable to put this up here anyway for the purpose of public discussion / testing my beliefs.]
I have mixed feelings about this.
On the one hand, I really like the framing of reality being underpowered in certain contexts, and I think this post does a good job of explaining why this is often the case. I think the observation that we often have a lot of tacit data about the world that is hard to fit into explicit models but can nevertheless make non-data-driven expert predictions perform better than chance is well-made and well-taken.
Nevertheless, I feel that in a great many cases, non-quantitative, intuitive, first-principles-heavy analyses of the world very often fail; that their rate of failure may often be poorly correlated with their apparent compellingness; that non-quantitative experts overestimate the explanatory power of their work at least as much as (and probably more than) more data-driven analysts; and that a shift towards more explicit, quantitative, data-driven approaches is often among the best ways to distinguish real knowledge about the world from the pseudo-knowledge that I think is rampant in many fields of human enquiry.
As an example: I have several friends who are academic historians, and from time to time we've talked about cliometrics/data-driven approaches to history. The general attitude seems to be "yeah, seems cool if you can do it, but just try that in [my period of study]. There's no way you could build a decent quantitative model with that little data." While I've generally been too tactful to say this to their faces, my response to these sorts of claims has historically been that if the data is too sparse to do meaningful analysis on, it's probably also too sparse to draw any other conclusions more general than a simple existence proof ("this thing happened once"). Or, more pithily, "if you don't have enough data to know things, you should just admit it".
I now suspect this is too strong a stance, but I still think there is some important truth in it. My feeling is that there may well be "good reasons why expert communities in some areas haven't tried to use data explicitly to answer problems in their field", but there are also many bad reasons, among the most common of which is that very few people in that field have strong quantitative skills. I suspect that experts in data-poor fields often lack the epistemic modesty or statistical know-how to admit the consequences of that paucity.
Great post; thanks for this, Greg.
Regarding what effect sizes we can observe using "common sense": this may vary with domain. Historically, people seem to have had better common-sense intuitions about human psychology than about medicine. That made the value of scientific studies greater in the latter domain. (Though I'm less sure of the relative accuracy of contemporary expert intuitions in psychology and medicine.)
Thus the benefits of additional studies in a domain depend both on whether that domain is underpowered, and on how reliable our common-sense intuitions are in that domain.
"Common sense" is obviously infused with insights from past scientific studies. Some findings, like the "System 1-System 2" distinction, become part of common sense (at least in some social circles). (Obviously this is still more true of experts' intuitions.) Since this tendency should increase the accuracy of common sense, it further reduces what we should expect to learn from additional data on a specific question.
Thanks for the post, Greg!
I think there are a couple of things worth mentioning that may allay some of the "small sample" concerns somewhat. It's true, for example, that we generally never have the ability to randomize countries into treatments, and that there simply aren't enough countries to really test study designs that have a large number of conditions. However, we have some other options in our pocket for cases such as those. If we don't really need to stay at the country level, we can do things like zoom in and randomize quite large groups of people within the country of interest. We can also use techniques like multilevel modeling to look at the effects at both levels at the same time. We also have quasi-experimental designs: in smoking cessation campaigns, for instance, researchers have used a pre-post design for each country or region: see what the baseline tobacco use is in that region, introduce your campaign, and see how smoking changes.
Now, this is not a perfect design. The country hasn't been randomly assigned and there's no control group. And we have the threats of maturation as well: what if people in that country would have cut back for another reason even without our campaign? That said, when we see a positive result in one country, and then two, and then five, ten, and fifteen different countries, we gain more confidence that it is not simply spurious. We can also compare what's happening in the campaign countries to somewhat similar countries that did not get the intervention across the same timeframe. This is not a perfect control group. But it can help us to have more confidence in what we're seeing.
So with this one counter-example, I'm basically arguing that we shouldn't be thinking "RCT or bust." Reality is simply too messy. But we still have a large number of tools at our disposal that will give us very good information. It's not perfect information, but it's the best we can do. And we can learn a whole lot from it.
(Another point worth making is that we can use meta-analyses to help determine what the true effect size may be across a number of smaller studies. Each study on its own may be under-powered, but if we have even five or ten of them, we can get a much, much better estimate. This approach can also help control for things like regional differences and failures of randomization.)
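A minimal sketch of the pooling idea, assuming the simplest fixed-effect (inverse-variance) model, in which every study estimates the same underlying effect. The function name and the five study results below are hypothetical. Regional differences of the kind mentioned would call for a random-effects model, which this sketch does not implement.

```python
def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance weighted (fixed-effect) pooled estimate and its SE."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total_weight
    pooled_se = (1.0 / total_weight) ** 0.5
    return pooled, pooled_se


if __name__ == "__main__":
    # Hypothetical effect estimates from five small studies, each with SE 0.20.
    estimates = [0.30, 0.10, 0.25, -0.05, 0.20]
    std_errors = [0.20] * 5
    pooled, pooled_se = pool_fixed_effect(estimates, std_errors)
    print(f"pooled estimate {pooled:.2f}, standard error {pooled_se:.3f}")
```

With equal standard errors the pooled estimate is just the average (0.16 here), and its standard error is the single-study value divided by sqrt(5), about 0.09. The estimate is tighter, though in this made-up example it still would not clear a conventional significance threshold.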
So as Will mentions, I think we should be working on a case-by-case basis to determine what the strongest possible research design would be in each case. We should also weigh the cost of collecting that best-case evidence against the cost of other possible research designs in order to find the right blend of methodological rigor and real-world practicality.
I think this is a great post and makes a really important point. Thanks for posting.
Some very interesting thoughts here. I think your final points are excellent, particularly #2. It does seem that experts in some fields have a hard-won humility about the ability of data to answer the central questions in their fields, and that perhaps we should use this as a sort of prior guideline for distributing future research resources.
I just want to note that I think the focus on sample size here is somewhat misplaced. N = 200 is by no means a crazily small sample size for an RCT, particularly when units are villages, administrative units, etc. As you note, suitably large effect sizes are reliably statistically distinguishable from zero in this context. This is true even with considerably smaller samples, even N = 20! Randomizations even of small samples are relatively unlikely to be unbalanced on confounders, and the p-values yielded by now-common methods like randomization inference express exactly this likelihood. To me (and I mean this exclusively in the context of rigorously designed and executed RCTs), this concern can be addressed by greater attention to the actual size of resulting p-values: our threshold for accepting the non-null finding of a high-variance, small-sample RCT should perhaps be some very much lower value.
It is true that when there is high variance across units, statistically significant effects are necessarily large; this can obviously lead to some misleading results. Your point is well-taken in this context: if, for example, there are only 20 administrative units in country X, and we are able to randomize some educational intervention across units that could plausibly increase graduation rates only by 1%, but the variance in graduation rates across units is 5%, well, we're unlikely to find anything useful. But it remains statistically possible to do so given a strong enough effect!
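The 20-unit example can be put into rough numbers with a normal-approximation power calculation. This is a sketch under stated assumptions: the "5%" is read as a standard deviation of 5 percentage points across units, the arms are equal (10 units each), and the test is a two-sided comparison of means at alpha = 0.05. The function name is hypothetical, and the normal approximation slightly overstates power at such small samples.

```python
from statistics import NormalDist


def approx_power(effect, unit_sd, n_per_arm, alpha=0.05):
    """Normal-approximation power of a two-sided, two-arm comparison of means."""
    std_normal = NormalDist()
    se = unit_sd * (2.0 / n_per_arm) ** 0.5   # standard error of the difference
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = effect / se                        # true effect in standard-error units
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Effects and SD in percentage points; 10 administrative units per arm.
    for effect in (1.0, 5.0, 10.0):
        print(effect, round(approx_power(effect, unit_sd=5.0, n_per_arm=10), 2))
```

On these assumptions a 1-point effect would be detected well under 10% of the time, barely above the 5% false-positive rate, whereas a 10-point effect would be detected almost always. That is the sense in which small samples can work, but only for strong effects.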
I think I would stick to my guns on the sample size point, although I think you would agree with it if I had expressed it better in the OP.
I agree with you that sample sizes of 200 (or 20, or less) can be good enough depending on the context. My core claim is these contexts do not obtain for lots of EA problems: the units vary a lot, the variance is explained by other factors than the one we're interested in, and the variance explained by the intervention/factor of interest will be much smaller (i.e. high variance across units, small effect sizes).
[My intuition driving the confounders point is that balancing these doesn't look feasible if they are sufficiently heavy-tailed (e.g. take all countries starting with A-C, and randomly assign to two arms, these arms will tend to have very large differences in (say) mean GDP), and the implied premise being lots of EA problems will be ones where factors like these are expected to have greater effect than the upper bound on the intervention. I may be off-base.]
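The heavy-tail intuition can be checked with a small simulation. This is a sketch only: it stands in for something like GDP with a log-normal covariate, where the log-scale standard deviation `sigma` controls how heavy the tail is. It uses no real country data, and the function name and parameter values are hypothetical.

```python
import random
import statistics


def median_arm_ratio(sigma, n_units=20, n_trials=2000, seed=0):
    """Median ratio (larger / smaller) of the two arm means of a covariate.

    Assumption: the covariate is log-normal with log-scale standard
    deviation `sigma`; larger sigma means a heavier tail.
    """
    rng = random.Random(seed)
    half = n_units // 2
    ratios = []
    for _ in range(n_trials):
        values = [rng.lognormvariate(0.0, sigma) for _ in range(n_units)]
        rng.shuffle(values)  # random allocation into two equal arms
        mean_a = statistics.fmean(values[:half])
        mean_b = statistics.fmean(values[half:])
        ratios.append(max(mean_a, mean_b) / min(mean_a, mean_b))
    return statistics.median(ratios)


if __name__ == "__main__":
    print("light tail:", round(median_arm_ratio(0.25), 2))
    print("heavy tail:", round(median_arm_ratio(2.0), 2))
```

With a light tail the two arm means typically differ by a few percent; with a heavy tail, one or two extreme units dominate whichever arm they land in, and the typical ratio between arm means is far larger. Randomisation still balances the arms in expectation, but any single draw at this sample size can be badly lopsided.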
Thanks for responding. I've now reread your post (twice) and I feel comfortable in saying that I twisted myself up reading it the first time around. I don't think my comment is directly relevant to the point you're making, and I've retracted it. The point is well-taken, and I think it holds up.
Writing something brief to ensure this gets into the final stage: I recall reading this post, thinking it captured a very helpful insight and regularly recalling the title when I see claims based on weak data. Thanks Greg!
I read an article about using logic to fill in the gaps around sparse or weak data that reminded me of this post. The article is focused on health science, but I think the idea is relevant to development as well.
https://nutritionalrevolution.org/2019/06/30/the-case-for-coming-to-conclusions-based-on-weak-evidence/
Thanks for this, gavintaylor!
This post was awarded an EA Forum Prize; see the prize announcement for more details.
My notes on what I liked about the post, from the announcement:
I recently stumbled onto this article supporting the use of both serendipitous and planned case studies.
https://onlinelibrary.wiley.com/doi/abs/10.1046/j.1365-2753.1998.00011.x
This is related to clinical practice, but again the ideas may be relevant to development. The authors note that case-studies are particularly useful to clinicians who might be in a good position to look for patients fitting into a specific population during their routine practice. I wonder if the same concept could be applied to field staff in development projects. For instance, developmental "case-studies" probably won't generate generalizable results, but they could be helpful in tailoring an RCT-validated intervention to a specific population.
Thanks for a great post, Greg! Loved this quote:
-Joshua