Congratulations on the pilot!
I just thought I’d flag some initial skepticism around the claim:
“Our estimates indicate that next year, we will become 20 times as cost-effective as cash transfers.”
Overall, I expect it may be difficult for the uninformed reader to know how much they should update based on this post (if at all). But given that you have acknowledged many of these (fairly glaring) design/study limitations in the text itself, I am somewhat surprised the team is still willing to make the extrapolation from 7x to 20x GD within a year. It also requires that the team succeeds in increasing effective outreach by 2 OOMs despite currently having less than 6 months of runway for the organisation.[1]
I also think this pilot should not give the team “a reasonable level of confidence that [the] adaptation of Step-by-Step was effective” insofar as the claim is that charitable dollars here are cost-competitive with top GiveWell charities, or that there is good reason to believe you will be 2x top GiveWell charities next year (though perhaps you just meant from an implementation perspective, not cost-effectiveness). My current view is that while this might be a reasonable place to consider funding for non-EA funders (or e.g. funders specifically interested in mental health or mental health in India), I’d hope that members of the EA community who are looking to maximise impact through their donations in the GHD space would update based on higher evidentiary standards than what has been provided in this post, which IMO indicates little beyond feasibility and acceptability (which is still promising and exciting news, and I don’t want to diminish this!).
I don’t want this to come across as a rebuke of the work the team is trying to do. I am on the record as being excited about more people doing work that uses subjective wellbeing on the margin, and I think this is work worth doing. But I hope the team is mindful that continued overconfident claims in this space may cause people to update negatively and become less likely to fund this work in future, because of totally preventable communication-related decisions rather than because wellbeing approaches are bad or not worth funding in principle.
[1] A very crude BOTEC based only on the increased time needed for the 15min / week calls with 10,000 people indicates something like 17 additional guides doing the 15min calls full time, assuming they do nothing but these calls every day. The increase in human resources needed to scale up to reaching 10,000 people is of course much more intensive than this, even for a heavily WhatsApp-based intervention.
10000 * 0.25 * 6 * 0.27 / 40 / 6 = 16.875
(number reached * call hours per participant per week * weeks * retention / full-time hours per guide per week / weeks)
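The same BOTEC can be written out as a small Python sketch so each factor is explicit. The function and parameter names are illustrative only (they are not from the post), and the inputs are just the numbers in the calculation above.

```python
def guides_needed(participants, call_hours_per_week, program_weeks,
                  retention, guide_hours_per_week):
    """Rough full-time-guide count implied by the weekly calls alone."""
    # Total call hours delivered across the whole program.
    total_call_hours = (participants * call_hours_per_week
                        * program_weeks * retention)
    # Hours one full-time guide can supply over the same program length.
    # Dividing by program_weeks here counts capacity over a single
    # 6-week window, i.e. everyone is treated as enrolled at once.
    hours_per_guide = guide_hours_per_week * program_weeks
    return total_call_hours / hours_per_guide


# 10,000 people, 15 min (0.25 h) per week, 6 weeks, 0.27 retention,
# guides working 40 h per week on nothing but calls.
fte = guides_needed(10_000, 0.25, 6, 0.27, 40)
print(f"{fte:.3f} full-time guides")
```

Changing any single input (for example retention, or the share of a guide's week spent on calls) shows how sensitive the roughly 17-guide figure is to that assumption.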
Hi Bruce! Our minimum target for this year is to help 1K people, so we’d be moving from at least 1K participants this year to 10K next year. Based on our budget projections, it should be feasible to help 10K people on a budget of approximately $300K per year. We believe it is feasible to raise this amount. If I understand correctly, when you say effective outreach, you’re referring to participant acquisition. If we maintain the acquisition cost of $0.96 per participant, it would cost around $10K to acquire 10K participants; spending this amount would be feasible. In addition, we are beginning to build out other recruitment pathways, such as referrals from external organizations and partners, which can bring in many people without additional costs besides some staff time. We’re also optimistic that we’ll start to see organic growth beginning this year.
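As a rough check on the figures in this paragraph, here is a minimal sketch of the arithmetic. The variable names are illustrative, and it assumes the $0.96 acquisition cost holds unchanged at 10K participants, which is an assumption rather than a result.

```python
# Figures quoted above; everything else is derived from them.
participants = 10_000
annual_budget_usd = 300_000
acquisition_cost_per_participant_usd = 0.96  # assumed to hold at scale

# Total spend on acquiring participants (the "around $10K" figure).
acquisition_total_usd = participants * acquisition_cost_per_participant_usd

# Overall budget per participant, and acquisition as a share of budget.
budget_per_participant_usd = annual_budget_usd / participants
acquisition_share = acquisition_total_usd / annual_budget_usd

print(f"Acquisition total: ${acquisition_total_usd:,.0f}")
print(f"Budget per participant: ${budget_per_participant_usd:,.2f}")
print(f"Acquisition share of budget: {acquisition_share:.1%}")
```

On these numbers, acquisition is a small part of the projected budget, so most of the cost would sit elsewhere (for example staff time).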
We’ve used a more conservative estimate for effect size next year. The estimated effect size for the pilot was 0.54 standard deviations but we believe this is an upper bound and don’t expect to maintain it, so we’ve used the WHO’s effect size of 0.48 standard deviations. We chose this effect size because it’s the best evidence we have. It may prove to be an overly optimistic estimate. Maybe the pilot results were a fluke and we won’t even get close.
It is a fair point that the statement about our confidence level might be too high; I’ve revised it to more accurately reflect the meaning I intended to get across (“The pilot outcomes suggest that this program can be implemented successfully on WhatsApp and may indicate that our adaptation of Step-by-Step has been effective, although much more data is required to confirm this”).
I agree with you that the main takeaway should be that the pilot demonstrates acceptability and feasibility, not that it’s a highly cost-effective intervention; there is not enough evidence for this. The purpose of the pilot was always to test acceptability and feasibility, while collecting data on end impacts to generate some early indicators of its effectiveness. On the note of donors: for funders who want to focus their charitable donations on interventions with well-established cost-effectiveness, I would not advise them to support us at this time. An organization in its second year is highly unlikely to be able to meet this bar. Supporting our work would be a prospect for donors who are interested in promising interventions that could potentially be very cost-effective, but that need to grow to a point where there is enough data to confirm this.
I focused a lot of the content of the post on the tentative results we have on the end impact of the intervention because I know that is the primary interest of forum readers and the EA community in general. However, in retrospect, perhaps I should have focused the post mostly on acceptability and feasibility, with a lesser focus on the impact, given that testing A&F was the primary purpose of the pilot.
Thanks for the comments; these are all reasonable points you’ve made. Cheers!