Great overview! I’m Eddie, long-time EA and co-creator of the mental health app Clarity https://apps.apple.com/us/app/clarity-cbt-thought-diary/id1010391170
I pretty much agree with your broad points. Some quick thoughts on each section from an “insider” point of view:
Short-term impacts of treatment: The problem with these studies is that they treat “apps” as a single category, but apps 1. differ drastically from one another in quality and 2. are constantly improving. The best apps are probably 10-100x better than the worst mental health apps. The functionality of the best apps today is also very different from that of the best apps from a few years ago, when these studies were run; in the future, they’ll probably be even more different. I don’t think we’re close to peak mental health app effectiveness.
Self-guided app adherence: Apple has actually made benchmarks available! I can grab them for you. For health and fitness apps that rely on a subscription business model, here are the 25th / 50th / 75th percentile retention rates:
Day 1: ~16% / ~25% / ~34%
Day 7: ~3% / ~7% / ~13%
Day 28: ~0.75% / ~2.5% / ~6%
Note: The exact percentages vary by a little bit depending on which week is selected.
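For modelling, those three medians sit close to a power-law decay r(t) = a·t^(−b). A quick sketch of fitting one (the ~25% / ~7% / ~2.5% medians are taken from the benchmarks above; the power-law functional form itself is just an assumption, not something Apple publishes):

```python
import math

# Median Day 1 / 7 / 28 retention from the benchmarks quoted above
days = [1, 7, 28]
medians = [0.25, 0.07, 0.025]

# Fit r(t) = a * t**-b by least squares in log-log space
xs = [math.log(t) for t in days]
ys = [math.log(r) for r in medians]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b = -sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = math.exp(my + b * mx)

def fitted(t):
    return a * t ** -b

for t, r in zip(days, medians):
    print(f"Day {t:2d}: observed {r:.3f}, fitted {fitted(t):.3f}")
```

The fit lands within a few tenths of a percentage point of all three medians, which is convenient for extrapolating adherence beyond Day 28 in a CEA.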
User acquisition costs: I can confirm there are huge differences in cost per install between US iOS (in the range of $1-2 per install) and low-income-country Android ($0.02 - $0.10). Of course, ads aren’t the only way to get app downloads. Organic app downloads cost $0.00!
Development costs: If you were to put us on your comparison, we’d be at 2 employees (my wife and I) @ ~8,000,000 downloads across all of our apps. I can confirm that service and hosting costs are basically negligible.
How to improve: I agree that marketing to low-CPI regions could be a great impact opportunity. I’m most excited about improving drop-off rates / effect sizes (two sides of the same coin). I can absolutely imagine a world where a self-guided mental health app is 10x more effective than the best one available now.
Again, great overview! Let me know if I can be helpful!
Eddie, thank you (I’m a long-time fan!)
Short-term impacts: Mmm, this has made me realise I wasn’t explicit about the assumptions I made there—I should either make that effect size bound a bit wider or model it as an exponential (or possibly a beta). I think this CEA is best interpreted as ‘if you built an evidence-based product, what would its cost-effectiveness be?’ but even that should probably have a wider bound. And there’s the new update in Linardon et al. (2024) that will be worth incorporating.
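For what it’s worth, the exponential-vs-beta choice mostly matters in the tails. A toy Monte Carlo comparison (all parameters here are illustrative, chosen only so that both priors have mean ~0.3 SD):

```python
import random

random.seed(0)
N = 100_000

# Two candidate priors for the per-user effect size (in SD units).
# expovariate takes a rate, so rate 1/0.3 gives mean 0.3;
# beta(1.5, 3.5) has mean 1.5 / (1.5 + 3.5) = 0.3.
exp_draws = sorted(random.expovariate(1 / 0.3) for _ in range(N))
beta_draws = sorted(random.betavariate(1.5, 3.5) for _ in range(N))

for name, draws in [("exponential(mean 0.3)", exp_draws), ("beta(1.5, 3.5)", beta_draws)]:
    mean = sum(draws) / N
    p95 = draws[int(0.95 * N)]
    print(f"{name:>21}: mean {mean:.2f}, 95th percentile {p95:.2f}")
```

The practical difference: the exponential puts nonzero mass on effect sizes above 1 SD, while the beta is bounded on [0, 1], so the upside tail of the CEA behaves quite differently between the two.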
Adherence: Thank you! That roughly tracks with the decay curves from SimilarWeb, which is good validation. Although you raise a good point—decay probably depends a lot on whether you’re feature-gating after a trial period or not. Do you have a ballpark for the ratio of installs to DAU?
CPI: Those are lower CPIs than the estimates I had; good to know! Are those on Facebook, TikTok, or elsewhere? I was also assuming organic traffic is negligible after the first hundred thousand or so, but do you still see an effect there?
Dev costs: Lovely! Having worked in industry, I definitely have the sense that there are good incentive reasons why headcounts might be unnecessarily bloated 🙃
Opportunities: I won’t ask what your roadmap looks like, but it’s very promising that you have this hunch. In my own experience as a user, I can definitely concur.
I’ll mull for a bit and update the OP with some adjustments. I might also shoot you a DM with some curiosity questions later. Thank you again! 😍
Ratio of Installs to DAU: Hmm, that’s an interesting metric... The way I think about retention is like a layered cake, kind of like the baumkuchen I just ate for breakfast, but linear instead of round. Anyway, there’s time on the X axis and users on the Y axis. For any given day, there’s a sizeable layer of cream at the top: the Day 0 users. Right below that, a smaller layer of Day 1 users, and so on, until there are hundreds of layers of users from older daily cohorts. You can track each daily cohort through time: it starts big and then shrinks rapidly, following the retention curves, until it eventually flatlines at some point (ideally above 0).
So you could look at overall installs to DAU, but that gives an advantage to new apps because they don’t have a lot of old installs from years-old cohorts that have left. Or you could compare daily installs to DAU, but that gives an advantage to old apps because they’ll have a lot of users from old cohorts.
A better metric could be the DAU/MAU ratio, which measures, out of all your monthly active users, how many use the app every day. Here ~25% would be exceptional, with an average of probably around 10%. But that’s also biased by how many new users you’re bringing in each day.
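The layer-cake picture above can be sketched numerically: DAU on any day is the sum, over every past daily cohort, of that cohort’s installs times retention at the cohort’s age. Everything below (the install rate, the retention curve, the 1% long-run floor) is invented for illustration, not taken from any real app:

```python
def retention(age_days):
    """Toy retention curve: 100% on Day 0, decaying toward a 1% long-run floor."""
    if age_days == 0:
        return 1.0
    return max(0.01, 0.25 * age_days ** -0.7)

DAILY_INSTALLS = 1000  # constant install rate, for simplicity

def dau_on_day(t):
    """Sum the retained users from every cohort installed on days 0..t."""
    return sum(DAILY_INSTALLS * retention(t - d) for d in range(t + 1))

for age in (30, 365, 5 * 365):
    dau = dau_on_day(age - 1)
    installs = DAILY_INSTALLS * age
    print(f"App age {age:4d}d: DAU ~ {dau:6.0f}, lifetime installs/DAU ~ {installs / dau:4.0f}")
```

This reproduces the bias described above: with an identical install rate and identical retention, the lifetime installs-to-DAU ratio worsens purely because the app gets older.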
By the way, the only peer group benchmarks that Apple provides are Conversion Rate, Proceeds per Paying User, Crash Rate, and Day 1/7/28 Retention: https://developer.apple.com/app-store/peer-group-benchmarks/ . But they might be announcing more in March thanks to the EU’s DMA: https://developer.apple.com/support/dma-and-apps-in-the-eu/#app-analytics
CPI: Yes, those numbers are from Facebook / Instagram / TikTok ads.
Organic traffic is also a function of time. Say, for example, you’re bringing in 1,000 organic users a day. After a year, that’s 365k users; after 5 years, that’s 1.8M users. Of course, the app still has to remain good to continue getting organic downloads, and since the definition of good is always improving, the app would need to be consistently updated.
I’d estimate around 2/3 of our lifetime installs are organic, but it really depends on the app. I speculate that Daylio might be closer to 100% organic while Breeze is probably closer to 0%.
Hmm, good points. Getting installs/DAU wrong could meaningfully affect the numbers; longer-term retention per install is probably a better way of accounting for it. It was unclear to me whether to model retention as having a zero or a nonzero limiting value, which would change some of the calculations.
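The zero-vs-nonzero limiting value really does dominate the long-horizon numbers. A toy comparison of expected active days per install under two retention curves that agree early on but differ in the limit (both curves and all parameters are made up for illustration):

```python
def r_zero_limit(t):
    """Toy power-law retention that decays to 0 in the limit."""
    return 1.0 if t == 0 else 0.25 * t ** -0.7

def r_floor(t):
    """Same curve, except 1% of installs stick around indefinitely."""
    return max(0.01, r_zero_limit(t))

for horizon in (90, 365, 5 * 365):
    zero = sum(r_zero_limit(t) for t in range(horizon))
    floored = sum(r_floor(t) for t in range(horizon))
    print(f"{horizon:4d}-day horizon: {zero:5.1f} vs {floored:5.1f} expected active days per install")
```

With these particular parameters the two models are indistinguishable over the first ~3 months, yet differ by more than 2x over five years, so the assumption matters most for the long-run side of the CEA.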
Improving organic install rate would be promising if you could get it above 50%, I think (your apps sound very effective!). I suspect a lot of that is, as you say, about consistently building a good user experience and continuing to add value. (I see a lot of Daylio users complaining about the lack of updates & the increased ad load.)