Dan Brown: Understanding the gaps between academia and policymaking

Aca­demic stud­ies don’t always es­ti­mate the pa­ram­e­ters that will be most use­ful to us as we try to un­der­stand the cost-effec­tive­ness of char­i­ties’ in­ter­ven­tions. Even when they do, it may be difficult to figure out how those es­ti­mates ap­ply to a spe­cific char­ity’s pro­gram. Dan Brown, a se­nior fel­low at GiveWell, uses ex­am­ples from his own pro­jects to demon­strate the depth of this challenge — and sug­gests ways to get prac­ti­cal value from aca­demic in­sights.

Below is a transcript of Dan’s talk, which we’ve lightly edited for clarity. You can also watch it on YouTube and read it on effectivealtruism.org.

The Talk

Thanks, Nathan, for the in­tro­duc­tion. As I men­tioned, I’m from GiveWell. We’re a non­profit or­ga­ni­za­tion that searches for the most cost-effec­tive giv­ing op­por­tu­ni­ties in global health and de­vel­op­ment. We di­rect ap­prox­i­mately $150 mil­lion per year to our recom­mended top char­i­ties, and we pub­lish all of our re­search on­line, free of charge, so that any po­ten­tial donor can take a look for them­selves.

In or­der to eval­u­ate the cost-effec­tive­ness of a char­ity’s in­ter­ven­tion, we need to ex­trap­o­late es­ti­mates of the effect of that in­ter­ven­tion from aca­demic stud­ies that were con­ducted in a differ­ent con­text. In this pre­sen­ta­tion, I’m go­ing to dis­cuss some of the challenges that arise when we try to do that. This is known as the prob­lem of ex­ter­nal val­idity or gen­er­al­iz­abil­ity.

I’m by no means the first person to talk about this problem (not even the first at this conference), but before I started working at GiveWell about 18 months ago, I don’t think I really appreciated what it looks like to try to tackle it in practice, or quite how difficult it can be. So over the next 20 minutes, I’m going to run through two case studies from projects that I’ve worked on and [share] the kinds of questions that come up when we’re trying to understand external validity in practice. In doing so, hopefully I’ll also give you a sense for what some of the day-to-day work looks like on the research team at GiveWell.

Case Study 1: GiveDirectly

One of our recom­mended top char­i­ties, GiveDirectly, pro­vides un­con­di­tional cash trans­fers to house­holds in Kenya, Uganda, and Rwanda. And one of the po­ten­tial con­cerns with these cash trans­fers is they may cause a nega­tive spillover effect on the con­sump­tion lev­els of in­di­vi­d­u­als who don’t re­ceive the trans­fers them­selves. So we in­ves­ti­gated this pos­si­bil­ity.

For the pur­poses of this pre­sen­ta­tion, I’m go­ing to fo­cus on across-village spillover effects only. In other words, do peo­ple ex­pe­rience a de­crease in their con­sump­tion in real terms be­cause the village next to them re­ceived cash trans­fers? And I’m go­ing to fo­cus on one of the main mechanisms for these across-village spillover effects on con­sump­tion: a change in prices.

Fol­low­ing a cash trans­fer, there will be an in­crease in de­mand and prices for goods in the treated villages, and that cre­ates an ar­bi­trage op­por­tu­nity. An in­di­vi­d­ual could pur­chase a good in an un­treated village, trans­port it to a treated village, sell it at a higher price, and make a profit. In do­ing so, they’re go­ing to in­crease de­mand for the good in the un­treated village, and so bid the price up there as well. And that’s go­ing to make house­holds in the un­treated villages worse off.

A bit of con­text: Some aca­demic stud­ies have tried to es­ti­mate these spillover effects, of­ten us­ing some­thing that’s known as a ran­dom­ized sat­u­ra­tion de­sign. And that works broadly as fol­lows:

Villages are or­ga­nized into clusters, and for illus­tra­tion, I’ve drawn two clusters on this figure [see above slide]. In some clusters, one-third of villages are treated. Th­ese are known as low-sat­u­ra­tion clusters, like the one on the left, and in other clusters, two-thirds of villages are treated. They are known as high-sat­u­ra­tion clusters, like the one on the right. Which clusters are as­signed to low- and high-sat­u­ra­tion sta­tus is de­ter­mined ran­domly. Then, within each cluster, which villages are as­signed to the treat­ment group and the con­trol group is also de­ter­mined ran­domly. The treated villages here are the villages that are shaded in gray.
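As a rough illustration of the two-stage randomization described above, here is a minimal Python sketch. The function name, the data layout, and the choice to make exactly half of the clusters high-saturation are assumptions made for the example; they are not taken from any particular study.

```python
import random


def assign_saturation_design(clusters, seed=0):
    """Two-stage random assignment (illustrative sketch only).

    clusters: dict mapping a cluster id to a list of village ids.
    Assumption: half of the clusters are made high-saturation.
    """
    rng = random.Random(seed)
    cluster_ids = sorted(clusters)

    # Stage 1: randomly choose which clusters are high-saturation.
    high_ids = set(rng.sample(cluster_ids, len(cluster_ids) // 2))

    design = {}
    for cid in cluster_ids:
        villages = sorted(clusters[cid])
        # High saturation treats two-thirds of villages, low treats one-third.
        share = 2 / 3 if cid in high_ids else 1 / 3
        n_treated = round(share * len(villages))

        # Stage 2: randomly choose the treated villages within the cluster.
        treated = set(rng.sample(villages, n_treated))
        design[cid] = {
            "saturation": "high" if cid in high_ids else "low",
            "treated": treated,
            "control": set(villages) - treated,
        }
    return design


# Hypothetical example: four clusters of six villages each.
clusters = {f"cluster_{c}": [f"c{c}_v{v}" for v in range(6)] for c in range(4)}
design = assign_saturation_design(clusters)
for cid, info in design.items():
    print(cid, info["saturation"], sorted(info["treated"]))
```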

Us­ing this de­sign, we can take the con­sump­tion of a house­hold in an un­treated village in a high-sat­u­ra­tion cluster and com­pare it to the con­sump­tion of a house­hold in an un­treated village in a low-sat­u­ra­tion cluster. The differ­ence in their con­sump­tion is go­ing to [re­veal] the effect of be­ing sur­rounded by an ex­tra one-third of treated villages. And within this de­sign, we can also es­ti­mate spillover effects at differ­ent dis­tances.
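The comparison described here can be sketched as a simple difference in means. This is a simplified illustration with made-up numbers: the field names are hypothetical, and an actual study would typically use a regression with standard errors clustered at the level of randomization rather than a raw difference.

```python
from statistics import mean


def spillover_estimate(households):
    """Difference in mean consumption between untreated-village households
    in high-saturation clusters and those in low-saturation clusters.

    households: list of dicts with hypothetical keys
      'saturation' ('high' or 'low'), 'village_treated' (bool),
      'consumption' (float).
    """
    # Only households in untreated villages identify the spillover effect.
    untreated = [h for h in households if not h["village_treated"]]
    high = [h["consumption"] for h in untreated if h["saturation"] == "high"]
    low = [h["consumption"] for h in untreated if h["saturation"] == "low"]
    return mean(high) - mean(low)


# Made-up data for illustration only.
sample = [
    {"saturation": "high", "village_treated": False, "consumption": 95.0},
    {"saturation": "high", "village_treated": False, "consumption": 97.0},
    {"saturation": "high", "village_treated": True, "consumption": 130.0},
    {"saturation": "low", "village_treated": False, "consumption": 100.0},
    {"saturation": "low", "village_treated": False, "consumption": 102.0},
]
print(spillover_estimate(sample))
```

A negative value would indicate that being surrounded by an extra one-third of treated villages lowers consumption in untreated villages.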

The ques­tion, then, is how do we take an es­ti­mate of a spillover effect from an aca­demic study like this and ap­ply it to a spe­cific char­ity’s pro­gram, like GiveDirectly’s, in prac­tice and in some other con­text? I’m go­ing to dis­cuss two difficul­ties that come up when we try to do this.

Let’s take a hy­po­thet­i­cal aca­demic study. Sup­pose we’ve been able to es­ti­mate the effects of be­ing sur­rounded by an ex­tra one-third of treated villages or hav­ing an ex­tra one-third of treated villages in your re­gion. As be­fore, the treated villages are shaded in gray and the black Xs in­di­cate the house­holds that live in the un­treated villages. The red stars in­di­cate the three mar­ket­places that ex­ist in this re­gion. So fol­low­ing a cash trans­fer, there’ll be an in­crease in prices in the mar­ket in the treated village, which [in the slide above] is la­beled as “Mar­ket 1.”

The extent to which prices arbitrage across markets depends on how easy it is to transport goods between them. Market 2 is very close to Market 1; it’s very easy to transport goods there, and so prices are going to increase in Market 2, and the households that are served by Market 2 are going to experience a decrease in their consumption in real terms. But Market 3 is much farther away from Market 1. It’s much harder to transport goods to Market 3, and so accordingly, prices aren’t going to increase much there, and the households served by that market won’t be affected much by the cash transfers in the treated villages. So in this academic study, we might expect to see a fairly limited spillover effect on the consumption of households in the untreated villages.

But let’s suppose that our charity operates in a different setting, specifically one where markets are more economically integrated. Suppose there’s now a train line that links Market 1 to Market 3 in the charity setting. It’s now much easier to transport goods from Market 1 to Market 3, and so prices will increase accordingly in Market 3, and the households that are served by it are now going to experience a decrease in their consumption in real terms. So in this charity setting, we’d expect to see a considerably larger spillover effect on the consumption of households in the untreated villages than in the academic study setting.

This is ob­vi­ously a very sim­plified ex­am­ple, but the point is that if we want to ex­trap­o­late a spillover effect from an aca­demic study to a spe­cific char­ity’s pro­gram, we need to take into ac­count the de­gree to which mar­kets are eco­nom­i­cally in­te­grated. And if we fail to do that, we’re go­ing to ar­rive at in­ac­cu­rate pre­dic­tions about the size of spillover effects.

Okay, so how do we deal with this in prac­tice? Un­for­tu­nately, amongst the aca­demic stud­ies that we’ve re­viewed, we’ve not seen em­piri­cal ev­i­dence to tell us the ex­tent to which the de­cline of spillover effects with dis­tance de­pends on the ex­tent to which mar­kets are eco­nom­i­cally in­te­grated. So ul­ti­mately, we’re prob­a­bly go­ing to need fur­ther aca­demic study of this.

For the time be­ing, what we’ve done is we have placed a much greater weight on stud­ies that were con­ducted in con­texts that are more like GiveDirectly’s con­text in terms of the de­gree of mar­kets’ in­te­grat­ed­ness. For ex­am­ple, we place very lit­tle weight on a study that was con­ducted in the Philip­pines, be­cause that study cov­ered mar­kets across mul­ti­ple is­lands, and we sus­pect that mar­kets across is­lands are prob­a­bly a lot less eco­nom­i­cally in­te­grated than mar­kets in GiveDirectly’s set­ting.

Let’s go back to the origi­nal hy­po­thet­i­cal study set­ting to dis­cuss a sec­ond challenge that comes up. Let’s as­sume that prices ar­bi­trage as they did origi­nally. Prices in­crease in Mar­ket 2 fol­low­ing the cash trans­fer, but they don’t in­crease much in Mar­ket 3.

Now let’s move to an­other char­ity set­ting, where the spa­tial dis­tri­bu­tion of house­holds in our un­treated villages is very differ­ent. In this figure, we can see that a much higher pro­por­tion of the house­holds in the un­treated villages live close to the treated villages, and are served by Mar­kets 1 and 2, where we know that prices have in­creased. So again, in this char­ity set­ting, we’d prob­a­bly ex­pect to see quite a bit higher spillover effects on the con­sump­tion of house­holds in the un­treated villages com­pared to the aca­demic study set­ting.

So if we want to ex­trap­o­late an es­ti­mate from the aca­demic study to this spe­cific char­ity’s pro­gram, we need to take into ac­count the differ­ences in the spa­tial dis­tri­bu­tion of house­holds be­tween these two differ­ent con­texts. And a failure to do that, again, is go­ing to lead us to quite in­ac­cu­rate pre­dic­tions about the size of spillover effects.

At an even more ba­sic level, we just need to know the lo­ca­tion of the house­holds in the un­treated villages. We need to know how many house­holds are af­fected by spillover effects, be­cause ul­ti­mately we want to ag­gre­gate a to­tal spillover effect across all un­treated house­holds.
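To make the aggregation concrete, here is a minimal sketch that multiplies a per-household spillover effect at each distance by the number of untreated households living at that distance, then sums. The distance bands, effect sizes, and household counts are all invented for illustration; none of them come from GiveWell or from any study.

```python
def total_spillover(effect_by_band, households_by_band):
    """Sum of (per-household effect x number of households) over distance bands."""
    missing = set(households_by_band) - set(effect_by_band)
    if missing:
        raise ValueError(f"no effect estimate for bands: {sorted(missing)}")
    return sum(effect_by_band[band] * n for band, n in households_by_band.items())


# Hypothetical per-household change in consumption by distance from treated villages.
effect_by_band = {"0-2km": -3.0, "2-4km": -1.0, "4-6km": 0.0}

# Two hypothetical settings with the same number of untreated households
# but a different spatial distribution.
study_setting = {"0-2km": 100, "2-4km": 250, "4-6km": 400}
charity_setting = {"0-2km": 400, "2-4km": 250, "4-6km": 100}

print(total_spillover(effect_by_band, study_setting))
print(total_spillover(effect_by_band, charity_setting))
```

With the same per-household effects, the setting in which more untreated households live close to treated villages ends up with a larger total spillover, which is why the household locations matter.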

One difficulty that arises here is that even though the aca­demic stud­ies do col­lect in­for­ma­tion on the lo­ca­tion of house­holds in un­treated villages, un­for­tu­nately this isn’t some­thing that GiveDirectly col­lects. This is fairly un­der­stand­able. We wouldn’t ex­pect a char­ity to spend a lot of ad­di­tional re­sources to col­lect in­for­ma­tion in ar­eas where it doesn’t even im­ple­ment its pro­gram. But it does make it very difficult for us to ex­trap­o­late these aca­demic study es­ti­mates to pre­dict the effect of GiveDirectly’s pro­gram.

We can’t do a lot to deal with this at the mo­ment. In the fu­ture, we’ll try to find some low-cost ways of col­lect­ing in­for­ma­tion on the lo­ca­tion of these house­holds in the sur­round­ing villages. But hope­fully, the big­ger point that you take from this first case study is that if we want to ex­trap­o­late the effect of an aca­demic study to a spe­cific char­ity’s pro­gram, we need to think care­fully about the mechanism that’s driv­ing the effect — and then we need to think about con­tex­tual fac­tors in a spe­cific char­ity set­ting that might af­fect the way that mechanism op­er­ates. And of­ten we go and find ad­di­tional data or in­for­ma­tion from other sources in or­der to make some ad­just­ments to our model.

Case Study 2: For­tify Health

Another ex­am­ple of this comes from a pro­ject that we un­der­took to eval­u­ate For­tify Health’s iron for­tifi­ca­tion pro­gram in In­dia. For­tify Health pro­vides sup­port to mil­lers to for­tify their wheat flour with iron in or­der to re­duce rates of ane­mia.

We found a meta-anal­y­sis of ran­dom­ized con­trol­led tri­als that has speci­fi­cally es­ti­mated the effect of iron for­tifi­ca­tion. And the ques­tion again for us is: how do we take this aca­demic es­ti­mate of the effect of iron for­tifi­ca­tion and use it to pre­dict the effects of For­tify Health’s pro­grams in some other con­text?

The first step is to out­line ex­actly how this pro­gram works. I’ll give you a quick overview. For­tify Health pro­vides an iron for­tifi­cant to mil­lers. This is es­sen­tially a pow­der that con­tains the ad­di­tional iron. They also provide equip­ment and train­ing so that the mil­lers can in­te­grate that for­tifi­cant into their pro­duc­tion pro­cess and put the iron into the wheat flour. Millers will then sell this for­tified wheat flour on the open mar­ket.

Con­sumers pur­chase the wheat flour and use it to pre­pare food items like roti. They’ll then eat the food and in­gest the iron, which needs to be ab­sorbed into the blood be­fore it can have any health effect.

We make ad­just­ments in our cost-effec­tive­ness model for is­sues that come up in the first three steps. So it may be the case that some of the for­tified wheat flour is lost be­cause it goes un­sold on the shelves in the mar­ket, or be­cause some is wasted dur­ing food prepa­ra­tion.

But let’s just zoom in on the last two steps — the point at which con­sumers are ac­tu­ally eat­ing the for­tified wheat flour. Can we take the aca­demic study, the meta-anal­y­sis es­ti­mate of the effect of iron for­tifi­ca­tion, and as­sume it’s a rea­son­able guess for the effects of For­tify Health’s pro­grams, speci­fi­cally?

There are a few rea­sons why it might not be. The first is that the quan­tity of iron in­gested through differ­ent iron for­tifi­ca­tion pro­grams can vary quite a bit. It may be that differ­ent pro­grams add a differ­ent amount of iron to a given quan­tity of food. Or it may be that in differ­ent con­texts, con­sumers eat a differ­ent amount of the for­tified food.

So we went back to each of the 18 aca­demic stud­ies un­der­ly­ing the meta-anal­y­sis, and in each case we backed out the to­tal ad­di­tional amount of iron that con­sumers in­gest as a re­sult of the for­tified food. That re­quired tak­ing in­for­ma­tion on the quan­tity of iron that’s added per 100 grams of the food, the amount of the ad­di­tional for­tified food a per­son eats per day, the num­ber of days per week that they con­sume the for­tified food, and the du­ra­tion of the pro­gram. In this table, we’ve ag­gre­gated things an­nu­ally, but we’ve done this for differ­ent lengths of time as well. And what we found is that the av­er­age par­ti­ci­pant in the aca­demic stud­ies con­sumes an ad­di­tional 1,903 mil­li­grams of iron per year as a re­sult of this for­tified food.
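The per-study calculation described above can be sketched as follows. The function name and the input values are hypothetical and chosen only to show the arithmetic; they are not figures from the meta-analysis, and the 52-week year is an assumption for annualizing.

```python
def additional_iron_mg(iron_mg_per_100g, food_g_per_day, days_per_week, weeks=52):
    """Additional iron ingested (mg) from fortified food over a period.

    iron_mg_per_100g: iron added per 100 grams of food
    food_g_per_day:   grams of the fortified food eaten on a consumption day
    days_per_week:    days per week on which the fortified food is eaten
    weeks:            duration in weeks (52 assumed for an annual figure)
    """
    return iron_mg_per_100g * (food_g_per_day / 100) * days_per_week * weeks


# Hypothetical study: 3 mg of iron per 100 g, 200 g eaten per day, every day.
print(additional_iron_mg(iron_mg_per_100g=3, food_g_per_day=200, days_per_week=7))
```

Repeating this for each study and averaging the results gives the kind of per-participant annual figure quoted in the talk.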

We com­pared this to our best guess for the ad­di­tional iron that’s in­gested through For­tify Health’s pro­gram. And we be­lieve that the av­er­age con­sumer in For­tify Health’s pro­gram will only in­gest about 67% as much ad­di­tional iron through the for­tified food as the av­er­age par­ti­ci­pant in the aca­demic stud­ies. So at first glance, it seems like For­tify Health’s pro­gram is prob­a­bly go­ing to have quite a bit less of an effect than the effects in the aca­demic meta-anal­y­sis.

But what mat­ters is not the amount of iron that you in­gest, but the amount of iron that’s ab­sorbed into your blood. It’s only when it’s ab­sorbed into the blood that it can have an effect on your health. And the amount that you ab­sorb into the blood de­pends on sev­eral ad­di­tional fac­tors.

The first of these is the fortification compound that’s used to deliver the iron into the body. Fortify Health uses something that’s known as sodium iron EDTA [ferric sodium ethylenediaminetetraacetate], but the academic studies use a range of different compounds: things like ferrous sulfate, ferrous fumarate, ferric pyrophosphate, and a lot of other long science words.

So we went back to the aca­demic liter­a­ture and we asked, “Are there any stud­ies that have tried to es­ti­mate the differ­ence in ab­sorp­tion rates across for­tifi­ca­tion com­pounds?” And more speci­fi­cally, we looked at ev­i­dence from what are known as iso­topic stud­ies. Th­ese are stud­ies that provide par­ti­ci­pants with food that’s for­tified us­ing differ­ent com­pounds. They la­bel the iron. When you take a blood draw from par­ti­ci­pants af­ter they have con­sumed the for­tified food, you can trace ex­actly which iron for­tifi­ca­tion com­pound it came from. And us­ing that, you can back out an ab­sorp­tion rate for each for­tifi­ca­tion com­pound.

This table [above] comes from a literature review that was conducted by Bothwell and MacPhail. If you look at the third column in the top row, they estimate that the rate of iron absorption from sodium iron EDTA, which is Fortify Health’s compound, is about 2.3 times the rate of iron absorption from ferrous sulfate, which is one of the compounds that’s used in the academic studies in wheat products. More generally, we found that the rate of iron absorption from sodium iron EDTA is quite a bit greater than the rate of iron absorption from the compounds that are used in the academic studies.

When we make an ad­just­ment for this, in ad­di­tion to the ad­just­ment that we made pre­vi­ously for the differ­ences in quan­tities of iron, we pre­dict that the av­er­age con­sumer in For­tify Health’s pro­gram ac­tu­ally ab­sorbs about 167% as much iron into the blood as the av­er­age par­ti­ci­pant in the aca­demic stud­ies. And so ac­tu­ally, we’d ex­pect For­tify Health’s pro­gram to have quite a bit greater effect than the effect that was es­ti­mated in the meta-anal­y­sis in the aca­demic liter­a­ture.
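The way the two adjustments combine can be sketched with the figures quoted in the talk (67% as much iron ingested, 167% as much iron absorbed). The sketch assumes the adjustments are multiplicative, which is an assumption for illustration rather than a description of GiveWell’s actual model; the implied relative absorption rate is simply backed out from those two figures. Scaling a study effect linearly by absorbed iron is a further simplifying assumption.

```python
# Figures quoted in the talk.
intake_ratio = 0.67    # iron ingested: Fortify Health vs. average study participant
absorbed_ratio = 1.67  # iron absorbed: Fortify Health vs. average study participant

# Assumption: absorbed = ingested x absorption rate, so the ratios multiply.
implied_absorption_ratio = absorbed_ratio / intake_ratio


def adjusted_effect(study_effect, intake_ratio, absorption_ratio):
    """Scale a study effect by relative iron absorbed.

    Assumes the effect is proportional to absorbed iron (a simplification).
    """
    return study_effect * intake_ratio * absorption_ratio


print(round(implied_absorption_ratio, 2))
print(adjusted_effect(1.0, intake_ratio, implied_absorption_ratio))
```

Under these assumptions, the quoted figures imply an average absorption advantage of roughly 2.5 times across the compounds used in the studies, in the same range as the 2.3 times quoted for ferrous sulfate.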

But we’re not done yet. There are at least two ad­di­tional fac­tors that af­fect the ab­sorp­tion of iron. The first is the sub­stances that you con­sume alongside the for­tified food. Some sub­stances will in­hibit the rate of ab­sorp­tion and oth­ers will en­hance the rate of ab­sorp­tion. We’re par­tic­u­larly wor­ried here about tea con­sump­tion. It’s been shown in sev­eral stud­ies that tea sig­nifi­cantly in­hibits the ab­sorp­tion of iron even when it’s de­liv­ered through the sodium iron EDTA com­pound. And sec­ondly, we’re wor­ried about the po­ten­tial differ­ences in baseline iron lev­els be­tween the par­ti­ci­pants in the aca­demic stud­ies and the con­sumers of For­tify Health’s pro­gram.

Take this with a pinch of salt, be­cause we’ve not in­ves­ti­gated it deeply yet, but there’s some sug­ges­tion in the aca­demic liter­a­ture that the rate of iron ab­sorp­tion is greater when your ini­tial baseline iron lev­els are lower. Un­for­tu­nately, we don’t have in­for­ma­tion on the diets of ei­ther the con­sumers of For­tify Health’s pro­gram or the par­ti­ci­pants in the aca­demic stud­ies. And we also don’t have in­for­ma­tion on the baseline iron lev­els of For­tify Health’s con­sumers. So for the time be­ing, we’ve not been able to make ad­di­tional ad­just­ments for these two fac­tors. We’d cer­tainly like to get more in­for­ma­tion on this if we can in the fu­ture, par­tic­u­larly for the con­sumers of For­tify Health’s pro­gram.

But hope­fully the big­ger point that you take from this case study is that if we want to ex­trap­o­late the effects from an aca­demic study — in this case, of iron for­tifi­ca­tion — to a spe­cific char­ity’s pro­gram, we need to think about the mechanism that’s driv­ing the effects of the pro­gram, and then con­sider spe­cific lo­cal con­di­tions that might af­fect how that mechanism op­er­ates in the char­ity set­ting. And then we po­ten­tially need to go and find ad­di­tional data or other aca­demic liter­a­ture in or­der to make some ad­just­ments to our model.

What’s more, this kind of issue arises even when we’re trying to understand the purely biological effects of what seems like a fairly straightforward health intervention like iron fortification. We might expect these concerns and questions to be a lot more difficult if we’re considering a more complicated behavioral intervention, such as one designed to increase educational attainment.

How can we do bet­ter in the fu­ture? Hope­fully, you can see from these two case stud­ies that the is­sues and ques­tions that arise de­pend a lot on the in­ter­ven­tion and the mechanism driv­ing the effect of that in­ter­ven­tion. It’s hard to sug­gest gen­eral pre­scrip­tions. But we think there are at least two ways that we can be in a bet­ter po­si­tion in the fu­ture:

1. We’d like to bet­ter un­der­stand what other policy or­ga­ni­za­tions are do­ing to tackle these ex­ter­nal val­idity ques­tions. We’ve spo­ken to some other or­ga­ni­za­tions and are aware of a few other ap­proaches, such as J-PAL’s. But we’d like to see a few more con­crete ex­am­ples of ex­actly what steps other or­ga­ni­za­tions are tak­ing to make these ex­ter­nal val­idity ad­just­ments in prac­tice — per­haps some­thing like the case stud­ies that have been laid out here, but with a bit more de­tail.
2. We’d like to see aca­demics spend more time dis­cussing the con­tex­tual fac­tors that arose in the study set­tings they’ve worked in. Our un­der­stand­ing is that most aca­demics spend the vast ma­jor­ity of their time fo­cus­ing on rul­ing out threats to in­ter­nal val­idity. That’s un­der­stand­able. I don’t think that [ad­dress­ing some of the ques­tions I’ve raised here] will get your work ac­cepted by a bet­ter pub­li­ca­tion or a higher-im­pact jour­nal. So per­haps we need to think of ways to in­cen­tivize aca­demics to do a bit more [work] on ex­ter­nal val­idity.

But hope­fully, you have more of a feel for the kinds of ques­tions that come up when you’re try­ing to work out what ad­just­ments you need to make for ex­ter­nal val­idity if you’re work­ing for an or­ga­ni­za­tion like GiveWell. And if you have any thoughts on these two ques­tions or any other ques­tions about the pre­sen­ta­tion, I know we’ve got a lit­tle bit of time for Q&A now. Thanks.

Nathan Labenz: I guess [I’ll start with] a sim­ple ques­tion. I don’t think you men­tioned what the ac­tual re­sults are on the spillover, so I was just cu­ri­ous to get a lit­tle bit of in­for­ma­tion about that.

Dan: Yeah. This is tricky. What we wanted to do origi­nally was what I hinted at in the sec­ond slide, which was to work out a spillover effect at each dis­tance, then work out the to­tal num­ber of house­holds that live at each dis­tance, and then add up the spillover effects across all of the house­holds to get the to­tal spillover effect. We wanted to make this ad­just­ment quan­ti­ta­tively.

Unfortunately, because we don’t have the location of the households in the untreated villages, we just can’t do that. So for the time being, we’ve had to do the second-best thing, which is to make a qualitative judgment based on our [interpretation] of the most relevant studies to get the right context. In terms of the actual bottom line, that comes out at a 5% negative adjustment.

That is, as I say, much more of a qual­i­ta­tive, sub­jec­tive guess than what we would like to have when mak­ing these kinds of ad­just­ments. And that was ba­si­cally be­cause some of the stud­ies found a nega­tive effect, some found no effect, and some found a pos­i­tive effect. And we didn’t be­lieve there was strong ev­i­dence, [when con­sid­er­ing] the liter­a­ture as a whole, that there were large nega­tive or pos­i­tive spillover effects. So un­for­tu­nately, we couldn’t make the quan­ti­ta­tive ad­just­ments that we made in the For­tify Health case to the spillovers case. It wasn’t as ex­act as we would’ve liked.

Nathan: So your best guess for the mo­ment is that for ev­ery $1 given to a house­hold, some­body not too far away effec­tively loses $0.05 of pur­chas­ing power?

Dan: I think it’s more that we’ve built in the adjustment by decreasing the total value (or cost-effectiveness) of the program by 5%. I think it’s not quite as explicit as hashing out the exact change in consumption, because again, we don’t even know how many households are affected per treated household. It’s quite tricky.

Nathan: A strik­ing as­pect of your talk is that these is­sues are not ob­vi­ous at all from the be­gin­ning. So when you sit down to at­tack a prob­lem like this, how do you be­gin to iden­tify the is­sues that might cause this lack of trans­fer­abil­ity in the first place?

Dan: I think the first step is to out­line ex­actly how you think the pro­gram works. I guess some peo­ple call it a the­ory of change or what­ever — the causal mechanism. Liter­ally write out each of the steps that you think needs to op­er­ate for the pro­gram to work, and then ad­dress each one in turn. And think crit­i­cally about what might go wrong. Always ap­proach it from the per­spec­tive of “How do I think this [pro­gram] is go­ing to fall down? What about the lo­cal con­text might mean that this step in the mechanism just doesn’t op­er­ate?”

That’s obviously quite hard to do if you’re a desk-based researcher like we are at GiveWell. So something that helps us quite a lot is to engage with the charities themselves. For this case, we engaged very closely with Fortify Health and they were able to raise some of these issues as well. Then, we could go away and try to find some data for some of the issues. So, I think some collaboration with the charities themselves [helps].

But that’s part of the rea­son why I’d like to see aca­demics dis­cuss this more, be­cause they may be in a bet­ter po­si­tion to be aware of the con­tex­tual fac­tors that re­ally mat­ter for the key mechanisms. They’re con­duct­ing the field work on the ground in a way that’s hard for us to do from a desk-based re­search po­si­tion.

Nathan: So this ques­tion will prob­a­bly ap­ply [in vary­ing de­grees] to differ­ent pro­grams, but I’m sure you’ve also thought about just go­ing out and at­tempt­ing to mea­sure these things di­rectly, as op­posed to do­ing a the­o­ret­i­cal ad­just­ment. What pre­vents you from do­ing that? Is that just too costly and in­ten­sive, and would you es­sen­tially be repli­cat­ing the stud­ies? Or would you be able to [fol­low] a dis­tinct pro­cess to mea­sure the re­sults of pro­grams in a nonaca­demic set­ting?

Dan: Yeah. Hope­fully, if you take this sort of ap­proach, you don’t need to repli­cate ev­ery sin­gle study in ev­ery differ­ent con­text in which you want to im­ple­ment a char­ity’s pro­gram. And it’s more a case of think­ing about which step might fall down in a par­tic­u­lar con­text and then maybe mea­sur­ing the in­for­ma­tion that’s di­rectly rele­vant to that step, as out­lined in the study.

I think we don’t do this our­selves be­cause we don’t have the in-house ex­per­tise to do a lot of that data col­lec­tion. But maybe we could out­source some of this work to other or­ga­ni­za­tions. We’d have to price it out us­ing a model of how cost-effec­tive the re­search fund­ing it­self would be, given that it might af­fect the amount of money donated to some of the char­i­ties. But I think it’s some­thing we could definitely con­sider.

Nathan: Okay. Very in­ter­est­ing. We have a bunch of ques­tions com­ing in from the app. Let me see how many we can get to. How do you ac­count for differ­ent de­grees of un­cer­tainty or com­plex­ity in the causal chains across the in­ter­ven­tions that you study? You showed one that had a five-de­gree causal chain. Seem­ingly you would pre­fer things that had two lev­els and dis­fa­vor those that had 10, but how do you think about that?

Dan: We don’t have a rule [stat­ing that] if there are too many steps, it’s too com­pli­cated and we wouldn’t con­sider it. In each case we would just lay out what [the steps] are, and then if we think that there’s one step that’s so un­cer­tain that we re­ally can’t get any in­for­ma­tion on it at all, and we feel that it’s go­ing to have a mas­sive im­pact on the bot­tom line, then we would just write that up and prob­a­bly re­frain from mak­ing any grants. We would just put a post out there say­ing, “We’re very un­sure about this at the mo­ment and we need fur­ther aca­demic study. We need to see this, this, or this be­fore we can take a step for­ward.”

As you say, it’d be nice if ev­ery­thing was just a one-step [pro­cess], but I don’t think we would ig­nore an in­ter­ven­tion be­cause it had quite a few steps and maybe more un­cer­tainty.

Nathan: Which aca­demics would you recom­mend as cur­rently do­ing good work on the ques­tion of ex­ter­nal val­idity?

Dan: Hon­estly, I’m not sure that I have spe­cific names of peo­ple who I think are con­sis­tently do­ing a lot of this. I should have a bet­ter an­swer, but no one’s jump­ing out.

Again, at the mo­ment, it seems to me that in gen­eral, these kinds of ques­tions are com­mented on in a para­graph or two in the dis­cus­sion at the end of a pa­per, and it feels more like a throw­away com­ment than some­thing that peo­ple have put a lot of thought into in aca­demic stud­ies. I’ll dig out a name for you. I’ll try and find some­one.

Nathan: You’ve maybe [cov­ered] this a bit already with my first ques­tion, but how do you de­cide how deeply to in­ves­ti­gate a given link in the causal chain as you’re try­ing to work out the over­all im­pact of a pro­gram?

Dan: With ev­ery­thing we do, it’s very hi­er­ar­chi­cal. We’ll try and do an ini­tial base overview — a very, very quick in­ves­ti­ga­tion. I think this ap­plies to stuff more gen­er­ally than to ex­ter­nal val­idity ad­just­ments at GiveWell. So we’ll try to take a quick view of things and then get a sense of whether we re­ally think it’s go­ing to have a big im­pact on the bot­tom line. The big­ger im­pact we think it’s go­ing to have on the bot­tom line, the more will­ing we are to spend staff time go­ing more deeply into the weeds of the ques­tion. So I think it’s a very step-by-step pro­cess with the amount of time that we spend.

Nathan: Okay. A cou­ple more ques­tions: Let’s say you have a rel­a­tively re­cent and seem­ingly di­rectly ap­pli­ca­ble RCT [ran­dom­ized con­trol­led trial] ver­sus a broad scope of RCTs. Cash trans­fers would be a good ex­am­ple of this; they’ve been stud­ied in a lot of differ­ent con­texts. But maybe you can point to one study that’s pretty close to the con­text of a pro­gram. How do you think about the rel­a­tive weight of that similar study ver­sus the broad base of stud­ies that you might con­sider?

Dan: I think in some cases we ac­tu­ally do try to put con­crete weights on each study. We don’t want to con­fuse a study that’s just hap­pen­ing in the same coun­try, or some­thing like that, [with one that’s highly rele­vant]. If there’s a study con­ducted in a differ­ent coun­try, but we think that the steps in the causal chain are go­ing to op­er­ate ex­actly the same be­cause there’s no real mas­sive differ­ence in the con­tex­tual fac­tors, then we wouldn’t mas­sively down­weight that study sim­ply be­cause it’s con­ducted in a differ­ent con­text.

In terms of hav­ing a lot of differ­ent stud­ies, if each of them has a lit­tle bit of weight, then that’s go­ing to drive some weight away from [a sin­gle] study that we think is very, very di­rectly rele­vant.

But [over­all], I think we just take it on a case-by-case ba­sis and try, in some in­stances, to con­cretely place differ­ent weights on each study in some in­for­mal meta-anal­y­sis. And that in­volves think­ing through each of the steps and what “similar­ity” re­ally means — what spe­cific lo­cal con­di­tions need to be the same for [us to deem] a study as hav­ing a similar con­text.
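One simple way to picture placing weights on each study is a weighted average of study estimates, where each weight reflects a subjective judgment about how similar the study context is to the charity’s. The sketch below uses invented estimates and weights; it illustrates the idea only and is not GiveWell’s actual procedure.

```python
def weighted_estimate(studies):
    """Weighted average of study estimates.

    studies: list of (estimate, weight) pairs; weights are subjective
    relevance judgments and need not sum to 1.
    """
    total_weight = sum(weight for _, weight in studies)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(estimate * weight for estimate, weight in studies) / total_weight


# Hypothetical: one highly relevant study and two less relevant ones.
studies = [(-0.10, 0.6), (0.00, 0.3), (0.05, 0.1)]
print(weighted_estimate(studies))

# Adding many low-weight studies pulls the result away from the single
# highly relevant study, as described above.
diluted = studies + [(0.00, 0.1)] * 10
print(weighted_estimate(diluted))
```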

Nathan: We’re a lit­tle bit over time already, so this will have to be the last ques­tion. When you iden­tify these is­sues that seem to im­ply changes, I as­sume they’re nor­mally in the “less effec­tive” di­rec­tion. You did note one that was mov­ing things to­ward “more effec­tive.” [In cases like that,] do you then go back and talk to the origi­nal study au­thors and com­mu­ni­cate what you found? Has that been a fruit­ful col­lab­o­ra­tion for you?

Dan: That’s not some­thing we’ve done so far. It seems like some­thing that maybe we should do a bit more in fu­ture, but cer­tainly in the pro­jects that I’ve worked on, [we haven’t] so far. So that’s a good piece of ad­vice.

Nathan: Good ques­tion. Well, it is always a les­son in epistemic hu­mil­ity when we talk to folks from GiveWell. You do great work and think very deeply, so thank you for that.
