Great that you’ve looked into this, Akhil! Speaking as someone with a wife and daughter (and a mother, and other female family members, and female friends...), this is close to my heart.
A key problem with all of these interventions is how to assess effectiveness: IPV typically occurs behind closed doors, which makes it hard to know what’s really happening.
Largely because of this, I expect that on further analysis I would end up less positive than you.
While this sounds consistent with a generalised GiveWellian sceptical prior, I say this with some sadness, because I would very much like reducing VAWG to be a high impact cause area.
Also, thank you for asking me for comments before publishing.
---
My main reason for being more pessimistic than you is that your internal and external validity adjustments (source: your model) seem very generous.
For brevity, I’ll focus on Community based social empowerment, since it’s the one you’re most positive about.
You have a 95% internal validity (aka replicability) adjustment and a 90% external validity (aka generalisability) adjustment[1]. I’d consider these numbers high (i.e. prone to lead to generous cost-effectiveness evaluations)[2].
Your model’s 95% internal validity adjustment is the same adjustment that GiveWell uses for bednets. For comparison…
… malaria nets do merit a 95% internal validity adjustment. We have seen plenty of positive evidence for the effectiveness of bednets, and I’m told that there is so much evidence that it’s difficult to get ethics approval for more RCTs because ethics boards argue that it’s unethical to do studies with controls on something that is such a robustly proven intervention.
… cash transfers do merit a 95% internal validity adjustment. They are a robustly effective way of reducing poverty.
… Community Based Social Empowerment does not merit a 95% internal validity adjustment, in my view. Gathering this sort of evidence from surveys is very difficult, and I’d be surprised if the protocols are robust enough to give us the same confidence we have about the effect of malaria nets on mortality (deaths are relatively easy to count).
I also suspect the external validity adjustment is too generous. The intervention relies heavily on cultural context; several GiveWell external adjustments are high too, but human bodies are pretty consistent from one place to the next, whereas cultures vary a lot with geography.
Therefore I predict that:
in 90% of worlds where I (or someone from SoGive) sat down and reviewed this carefully, we would have validity adjustments lower than yours (i.e. lower than 95% and 90%).
in 50% of worlds where I (or someone from SoGive) sat down and reviewed this carefully, we would have validity adjustments substantially lower than yours (i.e. lower than 50%).
In summary, I think there’s a 75% chance that we would conclude with a >2x worse cost-effectiveness than you, and a 25% chance of a >4x worse cost-effectiveness than you, for Community Based Social Empowerment.
This would be unlikely to be at the levels of cost-effectiveness where we would deem the intervention high impact.
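For intuition on how the size of the adjustments drives those multiples: since the model multiplies the benefit by both adjustments, cost-effectiveness scales linearly in each one. Here is a toy calculation (the numbers are illustrative placeholders, not taken from either of our actual models):

```python
# Toy calculation, not either of our actual models: shows how lowering
# the two validity adjustments scales the bottom-line cost-effectiveness.

def adjusted_benefit(raw_benefit, internal_validity, external_validity):
    # The model discounts the raw benefit by both validity adjustments.
    return raw_benefit * internal_validity * external_validity

original = adjusted_benefit(1.0, 0.95, 0.90)  # the 95% / 90% adjustments -> 0.855
revised = adjusted_benefit(1.0, 0.50, 0.50)   # both lowered below 50% -> 0.25

# Ratio of the two: how many times worse the cost-effectiveness looks.
print(original / revised)  # ≈ 3.42, i.e. a bit over 3x worse
```

So the "substantially lower" scenario (both adjustments below 50%) already lands in the 2x–4x range described above.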
I haven’t thought enough about the other interventions, apart from Self-defence (IMPower, as delivered by No Means No). As Matt has alluded to, SoGive has done some work on this topic, and received some information which is not in the public domain. I can’t say too much about this here, but I can discuss it privately and guide you to the relevant researchers. SoGive’s plan is to press for permission to publish on this, and to finalise within the next few months.
---
For clarity, I’ve alluded to SoGive in this comment, but this is not an official SoGive comment. Content written in a SoGive capacity has to go through a certain level of review, which has not happened here, so this is written in a personal capacity.
[1] For those less familiar with these models, they are applied in a straightforward, intuitive way. It’s roughly equivalent to: (Step 1) Calculate the benefit assuming full trust in the evidence; (Step 2) Multiply that benefit by the validity adjustments; (Step 3) Divide by costs.
[2] For those who want data to help them form their own view on whether these adjustments are high or not: at SoGive, we have pulled together a spreadsheet with GiveWell’s internal and external validity adjustments (we’re supposed to also add in SoGive’s own adjustments at the bottom, not just GiveWell’s, but have been less diligent about doing that). It’s meant to be a (not-rigorously-vetted) internal resource, but I’m sharing it here in case it helps. It’s also probably a couple of years out of date now, but from memory I don’t think there have been changes material enough to matter in the last couple of years.
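The three-step model application described above can be sketched in a few lines of code. This is purely illustrative, with made-up inputs (the benefit units, cost, and adjustment values are placeholders, not figures from any real evaluation):

```python
# Illustrative sketch of the three-step procedure described above.
# All inputs are hypothetical placeholders, not real evaluation figures.

def cost_effectiveness(raw_benefit, internal_validity, external_validity, cost):
    # Step 1: raw_benefit is the benefit assuming full trust in the evidence.
    # Step 2: multiply by the internal and external validity adjustments.
    discounted = raw_benefit * internal_validity * external_validity
    # Step 3: divide by cost to get (adjusted) benefit per dollar.
    return discounted / cost

# Hypothetical programme: 1,000 benefit units for a $10,000 cost.
generous = cost_effectiveness(1000, 0.95, 0.90, 10_000)   # ≈ 0.0855
sceptical = cost_effectiveness(1000, 0.50, 0.50, 10_000)  # ≈ 0.025
print(generous, sceptical)
```

Under these made-up numbers, moving both adjustments from (95%, 90%) to (50%, 50%) is what produces a >3x difference in the bottom line.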
---

Hey Sanjay, thanks for your comment.
1. Internal validity- I think it is important to bear in mind that there are a number of high-quality RCTs with low risk of bias- so although this area might not feel as robustly evidenced as bednets or vaccines, it does have a strong evidence base. You can see the cRCTs here: Abramsky et al (2014), Dunkle et al (2020), Leight et al (2020), Wagman et al (2015), Ogum Alangea et al (2020), Le Roux et al (2020), Chatterji et al (2020).
2. On surveys- To summarise, the concern you raise is that the programs aim to reduce the social acceptability of VAWG and mostly use local interviewers to assess the incidence of violence, which may introduce social desirability bias. Although this seems plausible, interviewers were trained, interviews were conducted privately, and validated questionnaires were used. To quote one paper:

> The study was conducted in accordance with WHO guidelines for the safe and ethical collection of data on violence against women [24]. These guidelines seek to minimize reporting biases and risk of harm to both respondents and interviewers. At both baseline and follow-up, interviewers received at least three weeks of training on the ethical and methodological issues surrounding the conduct of a survey relating to IPV and HIV, as well as ongoing support during the course of the survey. Interviewers were all from the local area, and interviewed respondents of the same sex as themselves. Interviews were conducted in private settings, in Luganda or English, and were concluded by providing information on additional support services in the area. At baseline, interviewers conducting the baseline survey were blinded as to the allocation of the intervention. It was not, however, possible to keep follow-up interviewers blinded.

As a result, I think this risk of bias is quite low. Also, I think that, inherently, any impact evaluation of interventions in this space would require surveys.
3. External validity- A valid concern. I have two comments: (1) There are a number of studies in different settings which show positive results, suggesting external validity. (2) Although cultural and social drivers of violence vary, the intervention is co-designed with the community and quite locally tailored, which mitigates some of the concern around external validity.
4. Meta-comment- I think that some of my estimates of the persistence of effects were quite conservative, which may counterbalance the slightly smaller discounts for external and internal validity.
---

Re item 4: it’s fair to note that I haven’t checked how conservative you’ve been on other assumptions, so if I did a replication of your work and it ended up being similar, then I agree that could be a reason.