Did you run additional robustness checks? I like to see a multiverse analysis, a.k.a. a specification curve (see here). This involves running all combinations of control variables, since the order in which the controls are added matters and the authors could have selected only the significant ones. (See also.)
(epistemic status: possibly dumb question by someone learning causal inference)
Shouldn’t you test only combinations of controls that are valid conditioning strategies for a plausible causal DAG?
Showing these various specifications, including, as you point out, the wonky or strange non-final ones, is a sort of “sanity check”.
(Or, alternatively, the authors are “selling” their work, trying to show a consistent story by incrementally presenting increasingly complex specifications, carefully embedded in a performative narrative.)
So say you’re measuring the effect of education on wages: you might start with the raw, simplest regression of wage on years of schooling. Then, as the first of many steps, you might show a specification that adds controls for gender or parental SES (even though this isn’t the actual favored specification).
This gives a sanity check.
For example, you expect the effect of education to go down once you add parental socioeconomic status. So if it goes up, something is probably very wrong. At the same time, the estimated coefficients on the controls are informative in their own right.
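To make that concrete, here’s a toy sketch (plain NumPy least squares, fully synthetic data; the variable names and effect sizes are invented for illustration, not taken from any paper). Parental SES is built in as a confounder of schooling, so adding it as a control should pull the schooling coefficient down:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Synthetic data: parental SES raises both schooling and wages,
# so the raw schooling coefficient is confounded upward.
ses = rng.normal(size=n)
female = rng.integers(0, 2, size=n)
school = 12 + 0.8 * ses + rng.normal(size=n)
log_wage = 1.0 + 0.10 * school + 0.30 * ses - 0.15 * female + rng.normal(scale=0.5, size=n)

def ols_coefs(y, cols):
    """OLS coefficients; entry [1] is the schooling coefficient."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

b_raw = ols_coefs(log_wage, [school])[1]                # no controls
b_ses = ols_coefs(log_wage, [school, ses])[1]           # + parental SES
b_full = ols_coefs(log_wage, [school, ses, female])[1]  # + gender

print(f"raw: {b_raw:.3f}  +SES: {b_ses:.3f}  +SES+gender: {b_full:.3f}")
# Sanity check: raw should exceed +SES, since SES confounds schooling upward.
```

If `b_ses` came out *larger* than `b_raw` here, that would be the “something is probably very wrong” signal described above.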
In the end, most empirical economists are skeptical of any single specification, or really of any entire paper (in some cases comically, ruthlessly skeptical, exceeding the most negative comment made by this account), so showing these specifications is valued in this culture.
The parent comment here is recommending all “combinations of control variables”. That’s different from the normal aesthetic in most empirical econ papers, and it might not be ideal in some situations. I haven’t heard of this before[1]. Also, no one I know has ever used the term “multiverse analysis”; that really gives the data, specification, and model too much credit.
Also, in OLS and most variants, the order in which the regressors enter doesn’t matter. I think maybe he’s talking about something else, but it’s not clear from the paper. I find this confusing and not useful to most readers.
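For what it’s worth, the order-invariance point is easy to check numerically. This sketch (NumPy, synthetic data; nothing here comes from the paper) fits the same set of regressors in two different column orders and confirms each variable gets the identical coefficient either way; what changes results is which controls are included, not their order:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 3))                      # three regressors
y = X @ np.array([0.5, -1.0, 2.0]) + rng.normal(size=n)

def fit(cols):
    """OLS coefficients for the columns of X, entered in `cols` order."""
    Z = np.column_stack([np.ones(n), X[:, cols]])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return dict(zip(cols, beta[1:]))             # column index -> coefficient

a = fit([0, 1, 2])
b = fit([2, 0, 1])
# Same regressor set, different entry order: identical estimates per variable.
assert all(np.isclose(a[k], b[k]) for k in (0, 1, 2))
```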
However, my PhD is from Nipissing Technical College of Agriculture, so if some Harvard dude comes in, they are probably right.
See the Gelbach paper linked above.
Yes, but there are often many plausible sets of control variables that (hopefully) get you conditional independence. I find it easier to plot everything, with the understanding that some specifications are better than others.
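As a toy sketch of what “plotting everything” means (Python, synthetic data; the effect sizes are made up for illustration): fit one regression per subset of candidate controls and collect the coefficient of interest from each. Sorting those coefficients is the backbone of a specification curve:

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n = 1_000

# Synthetic data: treatment x, three candidate controls; only c[:, 0]
# confounds x, so specifications omitting it should be biased upward.
c = rng.normal(size=(n, 3))
x = 0.5 * c[:, 0] + rng.normal(size=n)
y = 1.0 * x + 0.8 * c[:, 0] - 0.3 * c[:, 1] + rng.normal(size=n)

def treatment_coef(control_idx):
    """OLS coefficient on x, controlling for the given columns of c."""
    Z = np.column_stack([np.ones(n), x, c[:, list(control_idx)]])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return beta[1]

# One regression per subset of controls: the 'multiverse' of specifications.
curve = {
    subset: treatment_coef(subset)
    for r in range(4)
    for subset in itertools.combinations(range(3), r)
}
for subset, coef in sorted(curve.items(), key=lambda kv: kv[1]):
    print(subset, round(coef, 3))
```

Plotted, the sorted coefficients show at a glance which specifications (here, those conditioning on the confounder) cluster around the true effect and which ones drift.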