[based on an internally run study of 250 uses] Mind Ease reduces anxiety by 51% on average, and helps people feel better 80% of the time.
Extraordinary claims like this (and it's not the only one; e.g. "very likely" to help myself or people I know who suffer from anxiety elsewhere in the post, "And for anxiety [discovering which interventions work best] is what we've done", "45% reduction in negative feelings" in the app itself) demand much fuller and more rigorous description and justification, e.g. (and cf. PICO):
(Population): How are you recruiting the users? Mturk? Positly? Convenience sample from sharing the link? Are they paid for participation? Are they "people validated (somehow) as having an anxiety disorder" or (as I guess) "people interested in reducing their anxiety / having something to help when they are particularly anxious"?
(Population): Are the "250 uses" 250 individuals each using Mind Ease once? If not, what's the distribution of duplicates?
(Intervention): Does "250 uses" include everyone who fired up the app, or only those who "finished" the exercise (and presumably filled out the post-exposure assessment)?
(Comparator): Is this a pre-post result? Or is this vs. the sham control mentioned later? (If so, what is the effect size on the sham control?)
(Outcome): If pre-post, is the post-exposure assessment immediately subsequent to the intervention?
(Outcome): "reduces anxiety by 51%" on what metric? (Playing with the app suggests 5-level Likert scales?)
(Outcome): Ditto "feels better" (measured how?)
(Outcome): Effect size (51% from what to what?) Inferential stats on the same (SE/CI, etc.)
There are also natural external validity worries. If (as I think it is) the objective is "immediate symptomatic relief", results are inevitably confounded by anxiety being a symptom that is often transient (or at least fluctuating in intensity), and one with high rates of placebo response. An app which does literally nothing but wait a couple of days before assessing (symptomatic) anxiety again will probably show great reductions in self-reported anxiety on pre-post, as people will be preferentially selected to use the app when feeling particularly anxious, and severity will tend to regress. This effect could apply over much shorter intervals (e.g. those required to perform a recommended exercise).
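To illustrate how large this artefact can be on its own, here is a toy simulation (entirely made-up numbers, nothing to do with Mind Ease's data): each person's anxiety fluctuates around a personal baseline, people only open a "do nothing" app when they happen to feel particularly anxious, and the app simply remeasures a little later.

```python
import random

random.seed(0)

# Toy model (illustrative assumptions only): momentary anxiety fluctuates
# around a personal baseline on a 0-10 scale.
def anxiety(baseline):
    return min(10, max(0, random.gauss(baseline, 2)))

pre, post = [], []
for _ in range(10_000):
    baseline = random.uniform(2, 6)
    now = anxiety(baseline)
    if now >= 7:                        # only open the app when feeling particularly anxious
        pre.append(now)
        post.append(anxiety(baseline))  # the "do nothing" app just remeasures later

mean_pre = sum(pre) / len(pre)
mean_post = sum(post) / len(post)
print(f"mean pre: {mean_pre:.1f}, mean post: {mean_post:.1f}, "
      f"apparent reduction: {100 * (1 - mean_post / mean_pre):.0f}%")
# The app does nothing, yet pre-post shows a large apparent reduction, purely
# from selecting on high anxiety plus regression to the mean.
```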
(Aside: An interesting validity test would be using GAD-7 for pre-post assessment. As all the items on GAD-7 are "how often do you get X over the last 2 weeks", significant reduction in this metric immediately after the intervention should raise alarm.)
In candour (and with regret) this write-up raises a lot of red flags to me. There is a large relevant literature which this post does not demonstrate command of. For example, there's a small hill of descriptive epidemiology papers on the prevalence of anxiety as a symptom or of anxiety disorders, including large population samples for GAD-7, which look like better routes to prevalence estimates than conducting a 300-person survey (and if you do run this survey, finding that 73% of your sample scores >5 on GAD-7, when population studies (e.g.) give means and medians of ~2-3 and proportions >5 of ~25%, prompts obvious questions).
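To put rough numbers on that discrepancy (back-of-envelope arithmetic only, using the figures quoted above):

```python
import math

# Rough check of how far the reported sample proportion sits from the
# population figure quoted above (illustrative arithmetic only).
n = 300              # survey size mentioned in the post
p_population = 0.25  # ~25% scoring >5 on GAD-7 in population samples
p_sample = 0.73      # ~73% reported in the survey

expected = n * p_population   # ~75 of 300 expected above 5
observed = n * p_sample       # ~219 of 300 reported above 5
se = math.sqrt(p_population * (1 - p_population) / n)
z = (p_sample - p_population) / se
print(f"expected ~{expected:.0f} of {n}; observed ~{observed:.0f}; z ~ {z:.0f}")
# ~19 standard errors from the population figure: far too large to be
# sampling noise alone.
```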
Likewise there are well-understood pitfalls in conducting research (some of them particularly acute for intervention studies, and even more so for intervention studies on mental health), which the "marketing copy" style of presentation (heavy on exuberant confidence, light on how this is substantiated) gives little reassurance were in fact avoided. I appreciate "writing for an interested lay audience" (i.e. this one) demands a different style than writing to cater to academic scepticism. Yet the latter should be satisfied (either here or in a linked write-up), especially when attempting pioneering work in this area and claiming "extraordinarily good" results. We'd be cautious in accepting this from outside sources; we should mete out similar measure to projects developed "in house".
I hope subsequent work proves my worries unfounded.
Hey Gregory,
Thanks for the in-depth response.
As I'm sure you are aware, this post had the goal of making people in the EA community aware of what we are working on, and why we are working on it, rather than attempting to provide rigorous proof of the effectiveness of our interventions.
One important thing to note is that we're not aiming to treat long-term anxiety, but rather to treat the acute symptoms of anxiety to help people feel better quickly at moments when they need it. We measure anxiety immediately before the intervention, then the intervention runs, then we measure anxiety again (using three Likert-scale questions asked both immediately before and immediately after the intervention). At this point we have run studies testing many techniques for quickly reducing acute anxiety, so we know that some work much better than others.
I've updated the post with some edits and extra footnotes in response to your feedback, and here are some point-by-point responses:
How are you recruiting the users? Mturk? Positly?
We recruit paid participants for our studies via Positly.com (which pulls from Mechanical Turk, automatically applying extra quality measures and providing us with extra researcher-focussed features). Depending on the goals for a study we sometimes recruit broadly (from anyone who wants to participate), and other times specifically seek to recruit people with high levels of anxiety.
Are the "250 uses" 250 individuals each using Mind Ease once? If not, what's the distribution of duplicates?
This data is from 49 paid study participants, who each used the app about 5 times on average over a period of about 5 days (at whatever times they chose).
This particular study targeted users who experience at least some anxiety.
Does "250 uses" include everyone who fired up the app, or only those who "finished" the exercise (and presumably filled out the post-exposure assessment)?
It's based only on the people who completed an intervention (i.e. where we had both a pre and a post measurement).
Is this a pre-post result? Or is this vs. the sham control mentioned later? (If so, what is the effect size on the sham control?)
This is a pre-post result. In one of our earlier studies we found the effectiveness of the interventions to be about 2x-2.5x that of the control (13-17 "points" of pre-post mood change versus about 7 for the control). We've changed a lot about our methodology and interventions since then, though, and don't yet have measurements for the control with the new changes.
If pre-post, is the post-exposure assessment immediately subsequent to the intervention?
Yes. Our goal is to have the user be much calmer by the time they finish the intervention than they were when they started.
"reduces anxiety by 51%" on what metric? (Playing with the app suggests 5-level Likert scales?)
Using the negative feelings (not any positive feelings) reported on 3 Likert-scale questions. People who reported no negative feelings at the beginning of the intervention are excluded from the analysis, since there are no reported negative feelings that the intervention could reduce.
Ditto "feels better" (measured how?)
The 80% success rate refers to the proportion of uses in which a user's negative feelings were reduced by any amount.
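For concreteness, here is a rough sketch of the kind of calculation involved, using toy numbers and a simplified aggregation (for illustration only):

```python
# Each use: (pre, post) = summed negative-feeling scores across the three
# Likert questions, measured immediately before and after the intervention.
# Toy numbers for illustration only.
uses = [(9, 4), (6, 3), (12, 5), (3, 3), (0, 0), (8, 10), (5, 1)]

# Exclude uses with no negative feelings at baseline (nothing to reduce).
analysable = [(pre, post) for pre, post in uses if pre > 0]

# "Reduces anxiety by X% on average": here, the mean per-use percentage
# reduction (aggregation choice assumed for this sketch).
reductions = [(pre - post) / pre for pre, post in analysable]
mean_reduction = sum(reductions) / len(reductions)

# "Helps people feel better Y% of the time": share of uses with any reduction.
felt_better = sum(1 for pre, post in analysable if post < pre) / len(analysable)

print(f"mean reduction: {mean_reduction:.0%}, felt better: {felt_better:.0%}")
```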
And thank you for telling me your honest reaction; your feedback has helped improve the post.