Excited to see your work progressing Michael!
I thought it might be useful to highlight a couple of questions I personally find interesting and didn’t see on your research agenda. I don’t think these are the most important questions, but I haven’t seen them discussed before and they seem relevant to your work.
Writing this quickly so sorry if any of it’s unclear. Not necessarily expecting an answer in the short term; just wanted to flag the questions.
(1) How should self-reporting bias affect our best guess of the effect size of therapy-based interventions on life satisfaction (proxied through e.g. depression diagnostics)?
My understanding is that at least some of the effect size for antidepressants is due to placebo (although I understand there’s a big debate over how much).
If we assume that (i) at least some of this placebo effect is due to self-reporting bias (rather than a “real” placebo effect that genuinely makes people happier), and (ii) it’s impossible to properly blind therapeutic interventions, how should this affect our best guess of the effect size of therapy relative to what’s reported in various meta-analyses? Are observer-rating scales a good way to overcome this problem?
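To make the worry concrete, here’s a toy decomposition with made-up numbers; none of these values are estimates from the literature, it’s just to show how unblinded self-reports could inflate a measured effect:

```python
# Toy decomposition of a reported effect size from an unblinded therapy trial.
# All numbers are invented for illustration, not estimates from any study.

reported_effect = 0.50   # standardized effect size as measured by self-report
true_effect = 0.25       # genuine improvement in wellbeing (unobservable directly)
real_placebo = 0.15      # placebo response that genuinely makes people happier

# Whatever is left over is the part attributable to self-reporting bias.
reporting_bias = reported_effect - true_effect - real_placebo

# Without blinding, self-report scales can't separate reporting_bias from the
# other two components, so meta-analyses of unblinded trials would overstate
# the genuine effect (true_effect + real_placebo) by this amount.
print(f"Implied self-reporting bias: {reporting_bias:.2f}")
```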
(2) How much do external validity concerns matter for directly comparing interventions on the basis of effect on life satisfaction?
If my model is: [intervention] → increased consumption → increased life satisfaction.
And let’s say I believe the first step has high external validity but the second step has very low external validity.
That would imply that directly measuring the effect of [intervention] on life satisfaction would have very low external validity.
It might also imply that a better heuristic for predicting the effect of future similar interventions on life satisfaction would be:
(i) Directly measure the effect of [intervention] on consumption
(ii) Use the average effect of increased consumption on life satisfaction from previous research to estimate the ultimate effect on life satisfaction.
In other words: when the link between certain outcomes and ultimate impact differs between settings in a way that’s ex ante unpredictable, it may be better to proxy future impact of similar interventions through extrapolation of outcomes, rather than directly measuring impact.
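As a minimal sketch of what that heuristic looks like in practice (every number and unit below is an invented placeholder, not a result from any study):

```python
# Two-step extrapolation: measure the intervention -> consumption link directly,
# then borrow the consumption -> life satisfaction link from prior research.
# All values are illustrative assumptions.

# Step (i): effect of the intervention on consumption, from its own RCT.
rct_effect_on_consumption = 0.40   # e.g. change in log-consumption

# Step (ii): pooled effect of consumption on life satisfaction from prior research.
ls_per_unit_consumption = 0.30     # LS points per unit of log-consumption

# Extrapolated prediction for a future, similar intervention:
predicted_ls_effect = rct_effect_on_consumption * ls_per_unit_consumption
print(f"Predicted effect on life satisfaction: {predicted_ls_effect:.2f}")
```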
What evidence currently exists around the external validity of the links between outcomes and ultimate impact (i.e. life satisfaction)?
Hello James,
Thanks for these.
I remember we discussed (1) a while back but I’m afraid I don’t really remember the details anymore. To check, what exactly is the bias you have in mind: that people generally inflate their self-report scores when they are being given treatment? Are there one or more studies you can point me to so I can read up on this, or is this a hypothetical concern?
I don’t think I understand what you’re getting at with (2): are you asking what we should infer if some intervention increases consumption but doesn’t increase self-reported life satisfaction in one scenario, S, but does in others? That sounds like a normal case where we get contradictory evidence. Let me know if I’ve missed something here.
>What evidence currently exists around the external validity of the links between outcomes and ultimate impact (i.e. life satisfaction)?
I’m not sure what you mean by this. Are you asking what the evidence is on the causes and correlates of life satisfaction? Dolan et al. (2008) have a much-cited paper on this.
On (1)
>people generally inflate their self-report scores when they are being given treatment?
Yup, that’s what I meant.
>Are there one or more studies you can point me to so I can read up on this, or is this a hypothetical concern?
I’m afraid I don’t know the literature on blinding very well, but here are a couple of pointers:
(i) StrongMinds notes “social desirability bias” as a major limitation of their Phase Two impact evaluation, and suggests collecting objective measures to supplement their analysis:
“Develop the means to negate this bias, either by determining a corrective percentage factor to apply or using some other innovative means, such as utilizing saliva cortisol stress testing. By testing the stress levels of depressed participants (proxy for depression), StrongMinds could theoretically determine whether they are being truthful when they indicate in their PHQ-9 responses that they are not depressed.” https://strongminds.org/wp-content/uploads/2013/07/StrongMinds-Phase-Two-Impact-Evaluation-Report-July-2015-FINAL.pdf
(ii) GiveWell’s discussion of the difference between blinded and non-blinded trials on water quality interventions when outcomes were self-reported [I work for GiveWell but didn’t have any role in that work and everything I post on this forum is in a personal capacity unless otherwise noted]
https://blog.givewell.org/2016/05/03/reservations-water-quality-interventions/
On (2)
May be best to just chat about this in person but I’ll try to put it another way.
Say a single RCT of a cash transfer program in a particular region of Kenya doubled people’s consumption for a year, but had no apparent effect on life satisfaction. What should we believe about the likely effect of a future cash transfer program on life satisfaction? (Taking it as an assumption, for the moment, that the wider evidence suggests increases in consumption generally lead to increases in life satisfaction.)
Possibility 1: there’s something about cash transfer programs which means they don’t increase life satisfaction as much as other ways to increase consumption.
Possibility 2: this result was a fluke of context; there was something about that region at that time which meant increases in consumption didn’t translate to increases in reported life satisfaction, and we wouldn’t expect that to be true elsewhere (given the wider evidence base).
If Possibility 2 is true, then it would be more accurate to predict the effect of a future cash transfer program on life satisfaction by using the RCT effect of cash on consumption, and then extrapolating from the wider evidence base to the likely effect on life satisfaction. If Possibility 1 is true, then we should simply take the measured effect from the RCT on life satisfaction as our prediction.
One way of distinguishing between possibility 1 and possibility 2 would be to look at the inter-study variance in the effects of similar programs on life satisfaction. If there’s high variance, that should update us to possibility 2. If there’s low variance, that should update us to possibility 1.
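One crude way to operationalize that, under the simplifying assumption that the spread in prior effects reflects context rather than sampling noise (all effect sizes below are invented purely for illustration):

```python
import numpy as np

# LS effects of similar consumption-increasing programs in other contexts.
# Scenario A has low inter-study variance; Scenario B has high variance.
scenario_a = np.array([0.11, 0.12, 0.13, 0.12, 0.11])
scenario_b = np.array([0.30, -0.05, 0.20, 0.02, 0.13])

new_effect = 0.0  # the hypothetical Kenya RCT's null result on life satisfaction

def context_z(prior_effects, new_effect):
    """How surprising is the new result, given normal context-to-context spread?"""
    mu = prior_effects.mean()
    tau = prior_effects.std(ddof=1)  # crude stand-in for inter-study SD
    return (new_effect - mu) / tau

# Large |z| under low variance: context alone can't explain the null, which
# leans towards Possibility 1 (take the direct measurement seriously).
print(f"Scenario A: z = {context_z(scenario_a, new_effect):.1f}")

# Small |z| under high variance: the null sits within the normal spread of
# contexts, which leans towards Possibility 2 (predict via extrapolation).
print(f"Scenario B: z = {context_z(scenario_b, new_effect):.1f}")
```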
I haven’t seen this problem discussed before (although I haven’t looked very hard). It seems interesting and important to me.