Economist @ IDinsight (SF). Working in international development but interested in all aspects of EA.
Be nice :)
Hey, thank you for the work you are doing! Here are my thoughts (I’m an economist at IDinsight and work on this type of research):
If you want to understand the impact of your program, I don’t recommend doing an RCT at this stage. This seems like a very small pilot and you won’t have enough power / sample size to detect an effect (see more below). You should only consider running an RCT if and when you plan to scale this up to a sufficient size.
Instead, I advise trying to understand and improve your impact through a small-sample survey plus qualitative research. E.g. when you go to a village, talk to locals to understand their current knowledge, attitudes, and behavior around COVID (what knowledge they lack, what attitudes need to change, what rumors are around etc.) so you can better design your messages. Ideally capture a good representation of different types of people in the community, not just leaders but also relatively marginalized groups; you could do rigorous sampling, but I’m not sure that’s realistic or worthwhile at this stage given the trouble involved. Also ask them what kind of information campaign would engage them, and after you run your program ask how they felt: whether they liked it, whether they found it useful, what they learned, what they’d do differently etc. You can also contact them some time later to ask whether they observe any behavioral change among people in the community (better than asking what they themselves do, due to social desirability bias).
More technical details:
Since you’re doing a clustered RCT (treatment is at the village level, and the outcomes of people within a village are likely positively correlated), you’ll need a larger sample size than if you were doing an individual-level RCT (for the math, see section 4.2 of this, which is generally a great resource for RCT design). You can do a power calculation for a clustered randomized controlled trial, e.g. using Stata’s “power twomeans” command. One parameter that’s missing is the intraclass correlation (correlation among individuals within a treatment unit). However, since your number of clusters is SO small (3 and 3), when I try to do this calculation in Stata with any reasonable assumption, Stata says you cannot have enough power (assuming you want the standard settings: 80% power, 5% significance level etc.). That’s why I recommend not doing an RCT unless you have a program at scale.
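To make the clustering point concrete, here is a minimal Python sketch of the standard design-effect arithmetic. It is not the Stata calculation mentioned above. It uses a normal approximation, reads "3 and 3" as three villages per arm, and the cluster size of 50 and intraclass correlation of 0.05 are purely illustrative assumptions, as are all the function names.

```python
from math import sqrt
from statistics import NormalDist


def design_effect(cluster_size: int, icc: float) -> float:
    """Variance inflation from clustering: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc


def _multiplier(alpha: float, power: float) -> float:
    """Sum of the two-sided critical value and the power quantile."""
    z = NormalDist()
    return z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)


def mde_sd_units(clusters_per_arm: int, cluster_size: int, icc: float,
                 alpha: float = 0.05, power: float = 0.80) -> float:
    """Minimum detectable effect, in outcome standard deviations,
    for two equal arms (normal approximation, hypothetical inputs)."""
    n_per_arm = clusters_per_arm * cluster_size
    variance_of_difference = 2 * design_effect(cluster_size, icc) / n_per_arm
    return _multiplier(alpha, power) * sqrt(variance_of_difference)


def mde_floor(clusters_per_arm: int, icc: float,
              alpha: float = 0.05, power: float = 0.80) -> float:
    """Limit of the MDE as cluster size grows without bound:
    surveying more people per village cannot push it below this."""
    return _multiplier(alpha, power) * sqrt(2 * icc / clusters_per_arm)


if __name__ == "__main__":
    # Illustrative assumptions only: 3 villages per arm, 50 people each, ICC 0.05.
    print("MDE with clustering:   ", round(mde_sd_units(3, 50, 0.05), 3))
    print("MDE ignoring clusters: ", round(mde_sd_units(3, 50, 0.0), 3))
    print("Floor for 3 villages:  ", round(mde_floor(3, 0.05), 3))
```

Under these made-up inputs the detectable effect stays above roughly half a standard deviation no matter how many people are surveyed per village, and a t-based calculation with so few clusters would be more pessimistic still.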
To follow up on Michael’s last point here: Natalia, do you have any interest in collaborating with academics to feed results from your app to a new version of disability weighting? (He mentioned in his other comment that some academics were working on it but stopped.)
I also posted a comment on the other post outlining challenges you need to overcome to generate rigorous measures for disability weights (e.g. low take-up, unrepresentative sample).
Seems like the Gates Foundation which is funding the Global Burden of Disease study should be interested in funding a rigorous study since this is basically improving a component of DALY measures. (You likely need to partner with academics.)
A minor correction: GiveWell uses DALY to measure mortality and morbidity. (Well, for malaria they actually don’t look at the impact of prevention on morbidity, only mortality, since the former is relatively small—see row 22 here.) Maybe what you had in mind is their “moral weights” which they use to convert between life years and income.
Like cole_haus points out below, ESM’s results would enter disability weights (which are used to construct DALYs) to affect how health interventions are prioritized. Currently disability weights involve hypothetical surveys using methods described in cole_haus’ comment, with a major issue being most respondents haven’t experienced those conditions. ESM would correct that.
To use ESM results as inputs into disability weights though you’d want a representative sample. Looking at app users is a first step but you’d want to ideally do representative sampling or at least weighting. Otherwise you only capture people who would use the app. Having a large enough sample so you can break down by medical conditions is also a challenge. (For doing all these things properly, I suggest partnering with academics or at least professional researchers experienced in the relevant statistical analysis etc. Someone mentioned lack of demand from users being a potential issue—perhaps they can be incentivized.)
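As a rough illustration of the weighting idea, here is a minimal post-stratification sketch in Python. The strata, population shares, and scores are invented for the example and do not come from any real app or survey; a real analysis would need many more strata and proper variance estimation.

```python
def poststratified_mean(sample, population_shares):
    """Reweight stratum means by known population shares.

    sample: iterable of (stratum, score) pairs from app users.
    population_shares: dict mapping stratum -> share of the target population.
    """
    totals, counts = {}, {}
    for stratum, score in sample:
        totals[stratum] = totals.get(stratum, 0.0) + score
        counts[stratum] = counts.get(stratum, 0) + 1

    missing = set(population_shares) - set(counts)
    if missing:
        # Weighting cannot fix strata that the app never reaches.
        raise ValueError(f"no respondents in strata: {sorted(missing)}")

    return sum(share * totals[s] / counts[s]
               for s, share in population_shares.items())


if __name__ == "__main__":
    # Hypothetical data: younger users are over-represented among app users.
    app_sample = [("under_40", 8.0), ("under_40", 6.0),
                  ("under_40", 7.0), ("40_plus", 4.0)]
    shares = {"under_40": 0.5, "40_plus": 0.5}
    raw_mean = sum(score for _, score in app_sample) / len(app_sample)
    print("raw mean:     ", raw_mean)
    print("weighted mean:", poststratified_mean(app_sample, shares))
```

The error on empty strata is deliberate: it reflects the point above that weighting only helps for groups that use the app at all.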
Another way to solve the hypothetical bias issue is to look at surveys that include happiness metrics, record other characteristics of respondents, and have nationally representative samples, such as the Gallup World Poll (whose results are used in the World Happiness Report) and the World Values Survey. (Both mentioned here.) The individual-level data can be used to examine the relationship between medical conditions and happiness (this paper uses similar data to look at income and happiness, and this paper looks at the impact of relatives dying on happiness). I believe you can access the individual-level data through some university libraries. Though again there’s the challenge of having a large enough sample size so you can break down by medical conditions, and they probably don’t have detailed information on medical conditions. (Perhaps one advantage of an app is you can track someone over time, e.g. before and after a medical condition occurs, which you won’t be able to do with these surveys if they don’t have a panel.)
Thanks Linch for the post!
A comment is that there are things that one probably doesn’t encounter in the first 10-20 hours that can be hugely useful (at least for me) in thinking about EA (both general and domain specific), e.g. this. (Perhaps that means things like that should work their way into key intro materials...)
In general I wish there were a better compilation of EA materials from intro to advanced levels. For intro materials, perhaps this is good. Beyond that, there is good content from
80,000 Hours career guides, problem profiles, and blog posts (some being domain specific, e.g. AI safety syllabus—not sure if such things exist for other cause areas)
Selected blog posts from EA orgs like GiveWell and Open Phil (there are many, but some are more meta and of general interest to EA, e.g. GiveWell blog post I mentioned above)
Selected blog posts from individual EAs or EA-adjacent people
Selected EA forum and Facebook group posts (there are too many, but perhaps the ones winning the EA forum prize are a good starting point)
David Nash’s monthly summaries of EA-related content (here is one)
It would be great if such a compilation existed (for general EA as well as specific topics / cause areas). It should probably be a living document that gets updated. It should ideally prioritize, going down in some order of importance so people with limited time could work their way through. Of course, selection is inherently subjective.
Perhaps the best way is to refer people to EA forum, newsletter, various blogs etc. But it seems nice to have a list of good articles from the past. Someone could work their way through it e.g. during their commute.
(Really not sure about the marginal value of this. Just thought of it as I keep seeing older posts, which are quite interesting, being referred to in EA forum posts; perhaps if a post were interesting enough I would come across someone citing it sometime, but there are definitely things I found pretty interesting that I could have missed. I’m not confident about the value, but it’s worth thinking about, perhaps as part of our movement building work. Even partial work on this could be valuable: doing the first “20%” that has “80%” of the value, metaphorically.)
Thanks for the post John! Very informative. I know some people thinking of doing another RCT on this and will definitely point them to it.
Also agree that heterogeneities in the actual intervention as well as population under study are major challenges here in generalizing the effects (and they are common in studies on social science interventions which probably lead to lower generalizability than medical trials).
One minor and meta comment on section 2: “How over-optimistic should we expect the evidence to be?” I’m not sure how I feel about having a section on this in a post like yours. It’s totally reasonable as a way to form your prior before examining the literature, but after you do that (motivated by your skepticism based on these reasons) your learning from examining the literature “screens off” the factors that made you skeptical in the first place. (E.g. it may well be that the studies turn out to have super rigorous methodology, even though they are psychological studies conducted by “true believers” etc., and the former should be the main factor influencing your posterior on the impact of meditation, unless the reasons that gave you a skeptical prior make you think they may have fabricated data etc.)
So while what you said in that section is true in terms of forming a prior (before looking at the papers), I would have put it in a less prominent place in this post (perhaps at the end on “what made me particularly skeptical and hence more interested in examining the literature”). (It’s totally fine if readers feel what’s in section 3 mostly “screens off” what’s in section 2, but if not it may unfairly bias their perception against the studies.)
(Digression: in a completely different situation, if one didn’t examine the literature at all but just put out a skeptical prior based on these reasons—I would say that is the correct way of forming a prior, but it feels slightly unfair or irresponsible. But I probably would feel it’s okay if people highly qualify their statement, e.g. “I have a skeptical prior due to X, Y, and Z, but I really haven’t looked at the actual studies” and perhaps even “if I did look, things like A, B, and C would convince me the studies are actually reliable / unreliable”. I’m not sure about this point and curious for others’ thoughts, since this is probably how a lot of people talk about studies that they haven’t fully read on social media.)
Also a minor and concrete point on section 2: the 2nd bullet point “Most outcome metrics are subjective”. Here are some reasons we may or may not think (ex ante) the results may be overestimated.
If there’s a lot of noise in self-reported outcomes alone it actually doesn’t lead to bias (though in a case where the outcome variable is censored, as many psychological outcomes are, and outcomes are bunched near one end, that could happen).
Some relevant sources of bias are
Social desirability bias (respondents saying what they consider is socially desirable, should affect treatment and control respondents equally and apply to other psychological studies looking at the same outcome)
Courtesy bias (applies to treatment respondents, who may feel obligated to report positive impact)
And since these are self-reported outcomes that can’t be verified, 1) people may be less deterred from lying, 2) we will never find out the truth—so the two biases are potentially more severe (compared to a case where outcomes can be verified).
(Please correct me if I’m wrong here!)
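The point above about noise versus censoring in self-reported outcomes can be checked with a small simulation. This is a minimal Python sketch under made-up assumptions (normally distributed latent outcomes, a true effect of 0.5, additive reporting noise, and a scale ceiling at 1.0); none of these numbers come from the meditation studies.

```python
import random
from statistics import fmean


def simulate(n_per_arm=20000, true_effect=0.5, noise_sd=1.0,
             ceiling=1.0, seed=0):
    """Return (estimate with noisy reports, estimate with noisy and capped reports)."""
    rng = random.Random(seed)

    def reported(latent_mean):
        # Latent outcome plus independent reporting noise.
        return [rng.gauss(latent_mean, 1.0) + rng.gauss(0.0, noise_sd)
                for _ in range(n_per_arm)]

    control = reported(0.0)
    treated = reported(true_effect)

    # Pure noise in the outcome: the difference in means stays centered on the truth.
    noisy_estimate = fmean(treated) - fmean(control)

    # Same reports, but the scale tops out at `ceiling` (censoring from above).
    capped_treated = [min(x, ceiling) for x in treated]
    capped_control = [min(x, ceiling) for x in control]
    capped_estimate = fmean(capped_treated) - fmean(capped_control)

    return noisy_estimate, capped_estimate


if __name__ == "__main__":
    noisy, capped = simulate()
    print("true effect:              0.5")
    print("noisy outcome estimate:  ", round(noisy, 3))
    print("noisy + capped estimate: ", round(capped, 3))
```

In this particular setup the capped estimate is pulled toward zero, so noise alone leaves the estimate unbiased while noise plus a ceiling does not. The direction and size of the distortion depend on where the outcomes bunch, so other setups could behave differently.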
Hi Parth, thank you so much for this post, and for the great work you and your fellow EA organizers are doing at Microsoft!
I live in SF, and have been brainstorming with a few EAs re mobilizing EAs in tech companies (in addition to general EA movement building in the city). Will definitely try to learn from your experience and reach out for more questions if that’s ok.
I also wonder if you guys have a broader strategy for EA community building at Microsoft, and/or other EA meetups there (or directing people to EA Seattle)? Also, do you have a way to track your (estimated) impact?
(Also, this is Microsoft specific, but does Bill Gates do any speaking events on global health or effective giving there? Perhaps he stays away to avoid being seen as meddling in the company… If he’s willing to do it I can see it attracting a huge crowd.)
Rob, thank you so much for the work you and AMF are doing!
GiveWell has written here saying they think your monitoring practice could be improved, though they “continue to believe that AMF stands out among bed net organizations, and among charities generally, for its transparency and the quality of its program monitoring.”
I’d first like to applaud that you do have much better transparency and monitoring practices than the typical development NGO. It seems that one reason GiveWell selected AMF rather than other bed net charities as a top charity is due to this (I could be wrong).
However, given their comment, do you feel it is important for AMF to improve its monitoring practices? Or is that not a priority now? Also the post is from 2016 and may be outdated.
(I can understand how it’s difficult to invest more in monitoring given you have so few staff, and work with international partners on the ground and have less control over the process.)
I work at IDinsight, and am always curious how NGOs decide to spend more or less effort on monitoring. On the one hand it’s really important for improving operations and understanding your own impact, but on the other hand it does compete for resources with your core implementation work.
(Context: I’ve been engaging in “RD” research since my econ PhD focusing on development, and in my past 2.5 years working at IDinsight. All views are my own.)
Thanks a lot for the post. I agree that a more hits-based approach to development within EA is needed. GiveWell says they eventually want to look at economic growth, but they’re starting with health policy which is easier to evaluate and it’s unclear how long it will take them to look at policies aiming at increasing growth, so it seems valuable for other EAs to look at it in the meantime.
A few questions / comments (apology for the length):
(Perhaps answers to some questions here will only emerge after you do some more research. I wrote this before looking at other comments to avoid being influenced, and decided to just post it all to reflect the full set of my reactions even though some content overlaps, so feel free to not comment on what you already responded to.)
I’m curious what methodologies you have in mind in assessing donation opportunities on growth.
I’m not sure what methodologies GiveWell is using to assess policy interventions since they haven’t published an intervention or charity report on this—they have given grants to Center for Pesticide Suicide Prevention and JPAL’s Government Partnership Initiative but haven’t published reports as detailed as for their top charities or interventions.
(Slightly less relevant, but in terms of what econ academia will do about it: I was initially pessimistic, as development economists may not like methods that aren’t as “rigorous” as RCTs, since they like to be scientific and not very speculative. But I wonder if this is just because we are currently in a “randomista” paradigm in development econ, and there is a chance that it will shift back to being more macro like before. And I don’t have a great sense of the track record of macroeconomics in shaping policy; clearly it’s a very hard field, but it seems to have had some positive influence.)
What do you think of growth diagnostics? Clearly it’s more macro and has a lower level of certainty and rigor than RCTs, but I wonder 1) what you think of their theory, 2) what the track record has been in applying it, 3) what barriers there are to applying it (e.g. governments being uninterested)? (I’m not very familiar with the specifics; I would appreciate it if you could link some good intro material.)
Apart from knowing what specific policies help increase growth (which we don’t know very well yet), how to get them adopted is a major issue. Apart from knowing what China, India, and the Asian tigers did right, we need to understand why their leaders did the right thing at that time—how much of that is a function of the leaders’ characteristics (which we can’t change) and how much traction outside influence can have. I’m not sure what’s the best way to get them adopted: trying to replicate what economists did to influence China and India (though they did seek out advice unlike many other countries), understanding how governments work and finding effective ways to lobby governments that otherwise wouldn’t be receptive, promoting better institutions and governance (e.g. voter information interventions) to help select better leaders who are more inclined to do helpful reforms (could be political so more caution is needed) etc.
I’m glad you mention other aspects of welfare, and agree that overall “development”, for which GDP per capita is a main indicator / correlate, touches on all of them. It reminds me of what Esther Duflo said in this interview: “I think one should have a healthy respect for growth rates and treat them as useful companions and people that you have to make work for you. I think we should think of growth rate as chief of staff, not something I think we should fall in love with.” Pursuing growth is overall a good bet, but we should always keep in mind what we ultimately care about is a “social transformation” (as your Pritchett quote says) that improves human welfare.
In particular, environment and public health seem very important for welfare in developing countries (e.g. cost of air pollution in China, India, and Sub-Saharan Africa). How to address these issues also deserves attention from EAs (GiveWell is looking into tobacco and lead regulations and may one day look into these; just like for growth, the non-EA development and philanthropy sector has worked on it, which doesn’t mean EA can’t add value). Agree with you that we should think separately about growth and climate change, but I also think if we figure out how to influence governments to adopt growth-friendly policies, it’s important to think about whether one can promote sustainable growth, environmental policies, climate change adaptation etc. with this opportunity.
Also, I strongly recommend you frame your message in a way that’s less antagonistic to the randomista development community in future work (e.g. something other than “against randomista development”). You may think a more controversial title can catch more attention, and some other RCT skeptics have done it (e.g. Lant Pritchett, Angus Deaton), but I don’t think this is the right strategy, and it just makes it harder for people to talk to each other (e.g. I have heard complaints about Pritchett’s rhetoric among the randomista community, which probably makes them less likely to want to give his other ideas a serious look). Clearly you do see “RD” as useful in improving the huge amount of funding and many organizations in the development space and creating a nontrivial amount of positive impact in human welfare (e.g. GiveWell top charities, Evidence Action, some JPAL/IPA partners), and that randomistas are motivated by such impact potential in their work. I’m really glad you point out that we need to invest more in a higher-risk, higher-return approach in our portfolio, in addition to the “safe assets” of “RD”. But I think economics academia and the EA movement are harmed by antagonistic feelings among people holding different opinions who want to achieve fundamentally the same goals. (No one is perfectly rational, so even if an “RD” economist, which many mainstream development economists currently are, tries to be rational, they may at first find your message hard to stomach; we don’t need antagonistic-sounding headlines to make that even harder and create enemies out of people who could become allies. Of course, they do potentially compete for human and monetary resources in the development field, but we don’t need to exacerbate whatever rivalry they already have.)
(One example where growth-friendly policies and “RD” can complement one another: investing in education may be important for long-term growth as a country would want to upgrade from labor-intensive sectors to human capital intensive sectors, and “RD” can help find the answer to what education interventions the government should invest in conditional on trying to improve education. Arguably Singapore etc. did this without advice from “RD”, but “RD” may be able to help with improving education in other developing countries like they already do.)
Overall I am with you in thinking that more research is needed and am very excited that someone in EA is thinking of working on this, including proposing to research the neglectedness and tractability of the field from an EA perspective. (I’ve long felt the lack of a hits-based approach to development in EA and am not sure what can be done about it, as GiveWell, the main EA development research org, is expanding into new territories at a slow-ish rate, which might well be the right choice given their capacity constraint, and Open Phil has largely deferred development research to GiveWell. I would guess some EAs interested in development and some others in the development sector have similar thoughts, but feel unsure or pessimistic about the tractability of more speculative approaches like Banerjee, Duflo, Blattman, Glennerster etc. -- more research is definitely helpful in updating people’s views.)
This is speculative, but I suspect many of the things you mentioned fall in the category of things that seem pretty impactful, potentially on par with EA’s main cause areas (poverty, animals, x-risk), but it doesn’t seem like it makes sense to devote that much EA manpower or resources to it right now—so a small number of EAs who identify one such area can work on it, and it’s great, (and the EA movement should encourage that, with sufficient justification of the impact), but I can see why the EA movement doesn’t put them as a main cause.
(I don’t necessarily agree with all of the ideas you mentioned as belonging to these categories, and I probably don’t know enough about them to do so, though I can see many of them being such an area.)
A digression, but I do wonder if people working on these smaller, niche areas with an EA spirit, (assuming they did make the right call on the impact and it’s just an area that can’t absorb a lot of EA resources) feel sidelined or dismissed by the EA movement. (Might be the case for climate for instance.) And I wonder if this were really the case how the EA movement can be better at encouraging such independent thinking and work.
Not sure if already mentioned but this post by Ben Kuhn is also relevant https://forum.effectivealtruism.org/posts/M9RD8S7fRFhY6mnYN/why-nations-fail-and-the-long-termist-view-of-global-poverty