Estimating long-term treatment effects without long-term outcome data (David Rhys Bernard, Jojo Lee and Victor Yaneng Wang)


Summary

The surrogate index method allows policymakers to estimate long-run treatment effects before long-run outcomes are observable. We meta-analyse this approach over nine long-run RCTs in development economics, comparing surrogate estimates to estimates from actual long-run RCT outcomes. We introduce the M-lasso algorithm for constructing the surrogate approach’s first-stage predictive model and compare its performance with other surrogate estimation methods. Across methods, we find a negative bias in surrogate estimates. For the M-lasso method, in particular, we investigate reasons for this bias and quantify significant precision gains. This provides evidence that the surrogate index method incurs a bias-variance trade-off.

Introduction

The long-term effects of treatments and policies are important in many different fields. In medicine, one may want to estimate the effect of a surgery on life expectancy; in economics, the effect of a conditional cash transfer during childhood on adult income. One way to measure these effects would be to run a randomised controlled trial (RCT) and then wait to observe the long-run outcomes. However, the results would be observed too late to inform policy decisions made today.

A prominent solution to this issue is the surrogate index, a method for estimating long-run effects without long-run outcome data, originally proposed by Athey, Chetty, Imbens, and Kang (2019). Our paper contributes to the evolving literature on this method by examining its empirical performance in a wide range of RCT contexts. We also build on the discourse initiated by LaLonde (1986) on the bias of non-experimental methods, extending the set of estimators studied to those focused on long-term effects. Our findings and recommendations aim to guide practitioners intending to use the surrogate index method, thereby aiding in the development of effective long-term treatment strategies.

We test the surrogate approach on data from nine RCTs in development economics. We select these RCTs because they are long-running and have sufficiently large sample sizes.

In each RCT, we first produce the standard, unbiased experimental estimate of the average treatment effect by regressing long-term outcomes on treatment status. Next, we reanalyse the data using the surrogate index approach. If the surrogate estimate is close to the unbiased experimental estimate, then the surrogate index method is working well. We run meta-analyses on the difference between these estimates to understand how well the surrogate index method performs under different conditions.
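The comparison described above can be sketched in a few lines of Python. This is a minimal illustration on simulated data, not the paper's implementation: the data-generating process, the two surrogates, the sample sizes, and the helper names `simulate` and `diff_in_means` are all assumptions made for the example, and plain OLS stands in for the first-stage prediction method (the paper compares several, including the M-lasso).

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, randomised):
    """Hypothetical data: treatment shifts two surrogates, which drive the long-run outcome."""
    # In the observational sample nobody is treated; in the RCT treatment is a coin flip.
    w = rng.integers(0, 2, size=n) if randomised else np.zeros(n, dtype=int)
    s = rng.normal(size=(n, 2)) + np.outer(w, [0.5, 0.3])   # short-run surrogates
    y = s @ np.array([0.6, 0.4]) + rng.normal(size=n)       # long-run outcome
    return w, s, y

def diff_in_means(outcome, w):
    """Equivalent to the OLS coefficient from regressing the outcome on a treatment dummy."""
    return outcome[w == 1].mean() - outcome[w == 0].mean()

# Experimental sample (treatment, surrogates, long-run outcome) and a separate
# observational sample (surrogates and long-run outcome only).
w_exp, s_exp, y_exp = simulate(5000, randomised=True)
_, s_obs, y_obs = simulate(5000, randomised=False)

# Benchmark: experimental estimate using the actual long-run outcomes.
ate_experimental = diff_in_means(y_exp, w_exp)

# First stage: predict the long-run outcome from surrogates in the observational data.
X_obs = np.column_stack([np.ones(len(s_obs)), s_obs])
beta, *_ = np.linalg.lstsq(X_obs, y_obs, rcond=None)

# Second stage: build the surrogate index in the experimental sample and
# estimate the treatment effect on it, without touching y_exp.
X_exp = np.column_stack([np.ones(len(s_exp)), s_exp])
surrogate_index = X_exp @ beta
ate_surrogate = diff_in_means(surrogate_index, w_exp)

print(f"experimental: {ate_experimental:.3f}, surrogate: {ate_surrogate:.3f}")
```

In this simulation the treatment affects the outcome only through the two observed surrogates, so the two estimates should be close. Dropping one surrogate from the first stage is a simple way to see how missing surrogates can pull the surrogate estimate away from the experimental benchmark.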

We test many different implementations of the surrogate index estimator, varying (1) the set of surrogates used, (2) the first-stage prediction method used, and (3) the observational dataset used to construct the surrogate index. Notably, we introduce a new estimator called the M-lasso, which is specifically designed for use with the surrogate method.

When meta-analysing our results, we find that the surrogate index method is consistently negatively biased and underestimates positive long-term treatment effects by 0.05 standard deviations on average. This is the case regardless of which estimation method we use. We suggest that this is due to missing surrogates, as well as bias in the first-stage predictive model of the surrogate procedure.

While it is important to understand this negative bias as a potential shortcoming of the surrogate approach, we do not think it should necessarily dissuade researchers from using the method altogether. Instead, one could interpret surrogate estimates as a reasonable lower bound on the true long-term treatment effect. Furthermore, there is often no better alternative for estimating the true effect.

We also study potential determinants of the surrogate bias for the M-lasso estimator. In particular, we find suggestive evidence that M-lasso bias is smaller for simpler interventions. However, we do not find that this bias depends on the predictive accuracy of the first-stage model in the observational dataset. Our evidence is also inconclusive about how bias is affected by longer time horizons between the surrogates and the outcomes.

We further show that, despite the potential bias from using the surrogate index method, it results in significant precision gains, with standard errors on average 52% of the size of those from the long-term RCT estimates. Hence, even if researchers had access to long-term outcomes, they might still choose to use the surrogate index, depending on their willingness to trade off bias and variance.
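One way to make this trade-off concrete is to compare mean squared errors, where MSE = bias² + standard error². The sketch below is illustrative only: using MSE as the criterion is an assumption, the RCT standard error of 0.10 is a hypothetical value, and the function names are invented for the example. The 0.52 ratio and the 0.05 standard deviation bias are the averages reported above.

```python
def mse(bias, se):
    """Mean squared error of an estimator with the given bias and standard error."""
    return bias ** 2 + se ** 2

def max_tolerable_bias(se_rct, se_ratio=0.52):
    """Largest absolute bias at which the surrogate estimate still has MSE
    no greater than an unbiased RCT estimate with standard error se_rct."""
    return se_rct * (1 - se_ratio ** 2) ** 0.5

se_rct = 0.10                      # hypothetical long-term RCT standard error
se_surrogate = 0.52 * se_rct       # average ratio reported in the text
bias_surrogate = -0.05             # average bias reported in the text, in SD units

print("RCT MSE:      ", mse(0.0, se_rct))
print("Surrogate MSE:", mse(bias_surrogate, se_surrogate))
print("Break-even |bias|:", max_tolerable_bias(se_rct))
```

Under these assumed numbers the surrogate estimate has the lower MSE, but the ranking reverses once the RCT standard error is small relative to the bias, which is why the choice depends on the setting and on how the researcher weighs bias against variance.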

The rest of this paper proceeds as follows. Section 2 discusses related literature. Section 3 summarises the econometric theory behind the surrogate index approach, and Section 4 describes the data we use in more detail. Section 5 explains the methods we use to produce comparable long-term RCT and surrogate index estimates. Section 6 presents results of the meta-analysis over nine RCTs for different implementations of the surrogate index. In it, we empirically characterise the bias and standard errors of the surrogate method, and examine which surrogates are selected by the M-lasso. Finally, Section 7 concludes.

Read the rest of the paper