Like last year, we ran a full model with all interactions, and used backwards selection to select predictors.
(Repost of my comment from the response to 2018:)
Presuming "backwards selection" means stepwise backward elimination, this is not a great approach to model generation. See e.g. this from Frank Harrell: in essence, stepwise tends to be a recipe for overfitting, so the models it generates tend to have inflated goodness-of-fit measures (e.g. R²), overestimated coefficient estimates, and p-values that are very hard to interpret (given the implicit multiple testing in the prior 'steps'). These problems are compounded by generating a large number of new variables (all the interaction terms) for stepwise to play with.
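To make the overfitting concrete, here is a minimal simulation (my own illustration, not from the survey analysis): backward elimination applied to predictors that are pure noise still settles on a model whose in-sample fit looks respectable, while its out-of-sample fit collapses. I use scikit-learn's SequentialFeatureSelector as a stand-in for classic p-value-based stepwise; the data and settings are invented for the demonstration.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_obs, n_pred = 200, 40
X = rng.normal(size=(n_obs, n_pred))   # 40 candidate predictors...
y = rng.normal(size=n_obs)             # ...with no real relationship to y

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Backward elimination down to 10 predictors (a stand-in for p-value stepwise).
selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=10, direction="backward", cv=2
)
selector.fit(X_train, y_train)
kept = selector.get_support()

# Refit on the "selected" noise variables and compare in- vs out-of-sample fit.
model = LinearRegression().fit(X_train[:, kept], y_train)
print("in-sample R^2:    ", round(model.score(X_train[:, kept], y_train), 3))
print("out-of-sample R^2:", round(model.score(X_test[:, kept], y_test), 3))
```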
Some improvements would be:
1. Select the variables by your own judgement, and report that model. If you make any post-hoc additions (e.g. on suspecting an interaction term), report these with the rider that they are post-hoc assessments.
2. Keep a hold-out dataset to test your model against, however you choose to generate the model. (Cross-validation is an imperfect substitute.)
3. Use ridge, lasso, elastic net, or other penalized approaches to variable selection (see the sketch after this list).
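As a minimal sketch of point 3, combined with the hold-out set from point 2: the lasso end of the elastic-net penalty shrinks uninformative coefficients exactly to zero, so selection and estimation happen in a single penalized fit. The data and parameter values below are invented for illustration; scikit-learn's ElasticNetCV picks the penalty strength by cross-validation.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import train_test_split

# Synthetic data: 30 candidate predictors, only 5 of which matter.
X, y = make_regression(n_samples=300, n_features=30, n_informative=5,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# l1_ratio=1.0 is the lasso; values in (0, 1) mix in a ridge penalty.
enet = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], cv=5).fit(X_train, y_train)

kept = np.flatnonzero(enet.coef_)  # predictors with nonzero coefficients
print("predictors retained:", kept)
print("hold-out R^2:", round(enet.score(X_test, y_test), 3))
```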
In this year's post (or, better yet, see the dynamic document here), our predictive models use elastic-net and random-forest approaches with validation: k-fold cross-validation for tuning on the training data, with predictive power and model performance measured on set-aside testing data.
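A sketch of that validation scheme (my reconstruction in scikit-learn, not the authors' actual code; the data and tuning grids are invented): each model is tuned by k-fold cross-validation on the training split only, and the test data are touched once, to report final performance.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_regression(n_samples=400, n_features=20, noise=15.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=1)

models = {
    "elastic net": GridSearchCV(
        ElasticNet(max_iter=10_000),
        {"alpha": [0.01, 0.1, 1.0], "l1_ratio": [0.1, 0.5, 0.9]},
        cv=5,
    ),
    "random forest": GridSearchCV(
        RandomForestRegressor(random_state=1),
        {"n_estimators": [200], "max_depth": [None, 5, 10]},
        cv=5,
    ),
}

for name, search in models.items():
    search.fit(X_train, y_train)       # tuning sees only the training data
    r2 = search.score(X_test, y_test)  # one honest look at the test set
    print(f"{name}: best params {search.best_params_}, test R^2 = {r2:.3f}")
```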
I'm involved with doing this analysis this year, and I hope we can go in this direction. Perhaps not in the first iteration, but as we refine it.
Coming soon...