Like last year, we ran a full model with all interactions, and used backwards selection to select predictors.
(Repost of my comment from the response to 2018:)
Presuming "backwards selection" means stepwise backward elimination, this is not a great approach to model generation. See e.g. this from Frank Harrell: in essence, stepwise tends to be a recipe for overfitting, so the models it generates tend to have inflated goodness-of-fit measures (e.g. R²), overestimated coefficient estimates, and p-values that are very hard to interpret (given the implicit multiple testing in the prior 'steps'). These problems are compounded by generating a large number of new variables (all the interaction terms) for stepwise to play with.
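To make the overfitting concrete, here is a minimal simulation (my own illustration, not from the survey analysis): backward elimination applied to predictors that are pure noise still settles on a model whose in-sample fit looks respectable, while its out-of-sample fit collapses. I use scikit-learn's SequentialFeatureSelector as a stand-in for classic p-value-based stepwise; the data and settings are invented for the demonstration.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_obs, n_pred = 200, 40
X = rng.normal(size=(n_obs, n_pred))   # 40 candidate predictors...
y = rng.normal(size=n_obs)             # ...with no real relationship to y

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Backward elimination down to 10 predictors (a stand-in for p-value stepwise).
selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=10, direction="backward", cv=2
)
selector.fit(X_train, y_train)
kept = selector.get_support()

# Refit on the "selected" noise variables and compare in- vs out-of-sample fit.
model = LinearRegression().fit(X_train[:, kept], y_train)
print("in-sample R^2:    ", round(model.score(X_train[:, kept], y_train), 3))
print("out-of-sample R^2:", round(model.score(X_test[:, kept], y_test), 3))
```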
Some improvements would be:
1. Select the variables by your own judgement, and report that model. If you make any post-hoc additions (e.g. on suspecting an interaction term), report these with the rider that they are post-hoc assessments.
2. Keep a hold-out dataset to test your model against, however you choose to generate the model. (Cross-validation is an imperfect substitute.)
3. Use ridge, lasso, elastic net, or other penalized approaches to variable selection (see the sketch after this list).
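As a minimal sketch of point 3, combined with the hold-out set from point 2: the lasso end of the elastic-net penalty shrinks uninformative coefficients exactly to zero, so selection and estimation happen in a single penalized fit. The data and parameter values below are invented for illustration; scikit-learn's ElasticNetCV picks the penalty strength by cross-validation.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import train_test_split

# Synthetic data: 30 candidate predictors, only 5 of which matter.
X, y = make_regression(n_samples=300, n_features=30, n_informative=5,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# l1_ratio=1.0 is the lasso; values in (0, 1) mix in a ridge penalty.
enet = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], cv=5).fit(X_train, y_train)

kept = np.flatnonzero(enet.coef_)  # predictors with nonzero coefficients
print("predictors retained:", kept)
print("hold-out R^2:", round(enet.score(X_test, y_test), 3))
```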
In this year's post (or, better yet, see the dynamic document here), our predictive models use elastic-net and random-forest approaches with validation: k-fold cross-validation for tuning on the training data, with predictive power and model performance measured on set-aside testing data.
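A sketch of that validation scheme (my reconstruction in scikit-learn, not the authors' actual code; the data and tuning grids are invented): each model is tuned by k-fold cross-validation on the training split only, and the test data are touched once, to report final performance.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_regression(n_samples=400, n_features=20, noise=15.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=1)

models = {
    "elastic net": GridSearchCV(
        ElasticNet(max_iter=10_000),
        {"alpha": [0.01, 0.1, 1.0], "l1_ratio": [0.1, 0.5, 0.9]},
        cv=5,
    ),
    "random forest": GridSearchCV(
        RandomForestRegressor(random_state=1),
        {"n_estimators": [200], "max_depth": [None, 5, 10]},
        cv=5,
    ),
}

for name, search in models.items():
    search.fit(X_train, y_train)       # tuning sees only the training data
    r2 = search.score(X_test, y_test)  # one honest look at the test set
    print(f"{name}: best params {search.best_params_}, test R^2 = {r2:.3f}")
```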
I'm involved with doing this analysis this year, and I hope we can go in this direction. Perhaps not in the first iteration, but as we refine it.
Coming soon...