[This is a more well-thought-out version of the argument I made on Twitter yesterday.]
the best I can tell is in Phase I we had a ~7.5 vs ~5.1 PHQ-9 reduction from “being surveyed” vs “being part of the group” and in Phase II we had ~5.1 vs ~4.5 PHQ-9 reduction from “being surveyed” vs “being part of the group”.
I think the Phase II numbers were not meant to be interpreted quite that way. For context, this is the line chart of scores over time for Phase I, and this is the corresponding chart for Phase II. We can see that in the Phase II chart, the difference between the control and treatment groups is much larger than in the Phase I chart. Eyeballing, it looks like the difference between the control and treatment groups in Phase II eventually reaches ~10 points, not 4.5.
The quote from the Phase II report in your post says:
[...] members in the treatment intervention group, on average, had a 4.5 point reduction in their total PHQ-9 Raw Score over the intervention period, as compared to the control populations. Further, [… t]he PHQ-9 Raw Score decreased on average by 0.86 points for a participant for every two groups she attended.
What this seems to be saying is that they fit a linear regression model to a non-linear trend, and the regression says that PHQ-9 scores decreased by 4.5 points in the treatment group, plus 0.86 points for every 2 sessions attended. So, for example, someone in the treatment group who attended 12 sessions (as 91% of women in the treatment group did) would see a 4.5 + 6*0.86 = 9.66 point larger drop than someone in the control group who attended 0 sessions.
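To make the arithmetic explicit, here is a minimal sketch of that reading of the Phase II regression. It assumes the model is simply additive (a fixed treatment-group term plus a per-session term) and that control participants are coded as attending 0 sessions. The function and constant names are mine, not the report's.

```python
# Coefficients as quoted from the Phase II report.
GROUP_EFFECT = 4.5               # PHQ-9 points associated with being in the treatment group
EFFECT_PER_TWO_SESSIONS = 0.86   # additional PHQ-9 points per two sessions attended


def implied_gap_vs_control(sessions_attended: int) -> float:
    """Implied treatment-control gap in PHQ-9 reduction.

    Assumes an additive linear model and a control participant
    who is coded as attending 0 sessions (my reading, not confirmed).
    """
    return GROUP_EFFECT + (sessions_attended / 2) * EFFECT_PER_TWO_SESSIONS


for sessions in (0, 6, 12):
    print(f"{sessions:2d} sessions: {implied_gap_vs_control(sessions):.2f} points")
```

Under these assumptions, the 12-session case gives the 9.66-point figure above, which is close to the ~10-point gap eyeballed from the Phase II chart.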
A bit confusingly, the Phase I report described the result with the same kind of linear regression model:
Furthermore, the analysis determined that depressed female patients who completed the GIPT intervention, on average, experienced a 5.1 point reduction in their total PHQ-9 Raw Score over the entire 16-week intervention period, compared to the control group. Additionally, for each visit, these women experienced an average 0.63 reduction in their PHQ-9 Raw Score for depression.
But for Phase I, the effect associated with being in the treatment group controlling for sessions attended (5.1 points) is what matches the treatment-control gap eyeballed from the Phase I line chart.
It looks like there are differences between Phase I and Phase II regarding how the control group was handled. In the Phase I line chart, there are several PHQ-9 datapoints for the control group; in the Phase II chart there are only two, one at the beginning and one at the end. It looks like in Phase I, women in the control group took the PHQ-9 weekly, and this was counted as a “visit” in the regression model. In contrast, in Phase II, only the treatment group had visits that were counted that way (except perhaps for the beginning and end of the trial).
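A rough sketch of why the visit coding matters, using the coefficients quoted from the two reports. The visit counts are assumptions on my part: 16 weekly visits for both arms in Phase I (from the 16-week period and the guess that control surveys counted as visits), and 12 sessions versus 0 counted visits in Phase II. The helper name is hypothetical.

```python
def gap_between_arms(group_effect: float, per_visit_effect: float,
                     treatment_visits: int, control_visits: int) -> float:
    """Treatment-control gap implied by an additive linear model.

    The per-visit term only contributes to the gap when the two
    arms differ in how many visits were counted.
    """
    return group_effect + per_visit_effect * (treatment_visits - control_visits)


# Phase I: assume control surveys were counted as visits, so both arms have 16.
phase1_gap = gap_between_arms(5.1, 0.63, treatment_visits=16, control_visits=16)

# Phase II: 0.86 points per two sessions is 0.43 per session; control has 0 counted visits.
phase2_gap = gap_between_arms(4.5, 0.86 / 2, treatment_visits=12, control_visits=0)

print(f"Phase I implied gap:  {phase1_gap:.2f}")
print(f"Phase II implied gap: {phase2_gap:.2f}")
```

If the visit counts are equal across arms, the per-visit term cancels and the gap is just the group coefficient, which would explain why 5.1 matches the Phase I chart while 4.5 does not match the Phase II chart.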
So I think it makes more sense to say that Phase II ended up finding a ~10-point gap between the treatment and control groups, and Phase I a 5.1-point gap, but with the obvious caveat that the difference between the two phases appears to be due to Phase II control group members not being surveyed as often. It doesn’t seem like you can answer the question “how much of the effect is due to the treatment, and how much due to being surveyed multiple times?” using Phase II data.
Yes, I agree with this. I'm editing the post to make this correction.