I love this analysis, and I think it highlights how important model choice can be. Both constant and decaying treatment effects seem plausible to me. Instead of choosing only one of the two models (constant effect or decaying effect) and estimating as though this were the truth, a middle option is Bayesian model averaging:
https://journals.sagepub.com/doi/full/10.1177/2515245919898657
Along with the prior over the parameters within each model, you also have a prior over the models themselves (e.g. 50/50 constant vs decaying effect). The data will support some models more than others, so you get a posterior distribution over the models (say 33/67 constant vs decaying effect). The posterior probability of each model is then the weight you give that model when you estimate the treatment effect you’re interested in. It’s a formal way of incorporating model uncertainty into the estimate, and it would allow others to adjust the analysis based on their priors on the correct model (presumably GiveWell would start with a larger prior on constant effects, and you would start with a larger prior on decaying effects).
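As a rough sketch of the averaging step: assuming we already had a marginal likelihood for each model, the weighting works as below. All of the numbers (marginal likelihoods, per-model effect estimates) are purely illustrative, not taken from any real analysis.

```python
# Toy sketch of Bayesian model averaging over two models of a
# treatment effect. All numbers are hypothetical illustrations.

# Prior over the models themselves (e.g. 50/50).
prior = {"constant": 0.5, "decay": 0.5}

# Hypothetical marginal likelihoods p(data | model); in a real
# analysis these come from integrating each model's likelihood
# over its parameter prior.
marginal_lik = {"constant": 0.010, "decay": 0.020}

# Posterior model probabilities via Bayes' rule.
unnorm = {m: prior[m] * marginal_lik[m] for m in prior}
total = sum(unnorm.values())
posterior = {m: w / total for m, w in unnorm.items()}  # roughly 1/3 vs 2/3 here

# Each model's own posterior-mean treatment effect (hypothetical).
effect = {"constant": 1.00, "decay": 0.60}

# The model-averaged estimate weights each model's answer by its
# posterior probability.
bma_effect = sum(posterior[m] * effect[m] for m in prior)
print(bma_effect)
```

Changing the prior over models (the GiveWell-vs-you disagreement) just changes the weights in the last line, which is what makes the analysis easy for others to adjust.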
One (small) reason one might start with a larger prior on the constant effects model is to favor simplicity. In Bayesian model averaging, when researchers don’t assign equal priors to all models, I think the second most common choice is to penalize the model with more parameters in favor of the simpler one, which would mean a larger prior on the constant effects model in this case. I share your prior that decaying effects of economic programs in adults are the norm; I think it’s less clear for early childhood interventions that have positive effects in adulthood. This paper reviews some qualifying interventions (including deworming) - I’d be interested to see whether others have multiple long-run waves and, if so, whether those also show evidence of decaying effects.
https://www.nber.org/system/files/working_papers/w25356/w25356.pdf
I’m unclear on whether this works, since a constant effects model is just a decay model with the decay parameter set to zero. So you’re really setting hyperparameters on the distribution of the decay parameter, which is ordinary Bayesian modelling rather than model averaging.
Thanks for this point - I didn’t think clearly about how the models are nested. I think that means the BMA I describe is equivalent to a single model with a decay parameter (as you say), except that instead of a continuous prior on the decay parameter, the prior is a mixture with a point mass at zero - a spike-and-slab prior. I believe this is one Bayesian method for penalizing model complexity, similar in spirit to the lasso or ridge regression.
https://wesselb.github.io/assets/write-ups/Bruinsma,%20Spike%20and%20Slab%20Priors.pdf
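To make the point-mass idea concrete, here is a toy sketch of drawing from such a spike-and-slab prior on the decay parameter. The 0.5 spike weight and the Exponential(1) slab are illustrative assumptions of mine, not choices from the analysis above.

```python
import random

# Spike-and-slab prior on the decay parameter: with probability
# `spike` the decay is exactly zero (the constant-effects model);
# otherwise it is drawn from a continuous "slab" distribution.
def sample_decay(rng, spike=0.5):
    if rng.random() < spike:
        return 0.0                    # spike: point mass at zero
    return rng.expovariate(1.0)       # slab: Exponential(1), illustrative

rng = random.Random(0)                # fixed seed for reproducibility
draws = [sample_decay(rng) for _ in range(10_000)]
share_zero = sum(d == 0.0 for d in draws) / len(draws)
print(share_zero)  # close to the 0.5 spike weight
```

The spike weight plays the role of the prior model probability from before: the posterior mass left on exactly zero is the posterior weight on the constant-effects model.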
So I now realize that what I proposed could just be seen as putting an explicit penalty on the extra parameter needed in the decay model, where the penalty is the size of the point mass. The motivation for that would be to avoid overfitting, which isn’t how I thought of it originally.
Thank you for sharing this and those links. It would be useful to build a quantitative and qualitative summary of how and when early childhood interventions lead to long-term gains. An intervention can have a positive effect later in life and still show decay (or growth, or a constant effect, or a mix of these). In our case, we are particularly interested in effects on subjective wellbeing rather than income alone.
One (small) reason one might start with a larger prior on the constant effects model is to favor simplicity

I am a bit rusty on Bayesian model comparison, but - translating from my frequentist knowledge - I think the question isn’t so much whether the model is simpler, but how much error adding a parameter reduces. The decay model presumably fits the data better.
Any model with more degrees of freedom will always fit the data (that you have!) better. A decay model nests a constant effects model, because the decay parameter can be zero.
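A toy least-squares fit on made-up numbers illustrates the nesting point: because zero decay is inside the decay model’s parameter space, its in-sample fit can never be worse than the constant model’s. The data points and grid below are purely hypothetical.

```python
import math

# Hypothetical effect estimates at several follow-up waves.
years = [1, 3, 5, 9]
observed = [1.0, 0.8, 0.75, 0.5]

# Decay model: effect(t) = b * exp(-lam * t); lam = 0 recovers the
# constant-effects model.
def sse(b, lam):
    return sum((obs - b * math.exp(-lam * t)) ** 2
               for t, obs in zip(years, observed))

def best_b(lam):
    # For a fixed decay rate, the least-squares b has a closed form.
    x = [math.exp(-lam * t) for t in years]
    return sum(o * xi for o, xi in zip(observed, x)) / sum(xi * xi for xi in x)

sse_constant = sse(best_b(0.0), 0.0)   # lam fixed at zero
sse_decay = min(sse(best_b(lam / 100), lam / 100) for lam in range(0, 101))

# The decay model's search includes lam = 0, so in-sample it can
# only do at least as well as the constant model.
print(sse_decay <= sse_constant)  # True
```

This is exactly why in-sample fit alone can’t decide between the models, and why some complexity penalty (a marginal likelihood, a point-mass prior, or an information criterion) is needed.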