Using a mobile app to measure well-being for cause prioritization research
In order to pursue the highest-impact causes, it is essential to know how best to define and measure quality of life, and to investigate the factors that influence it. While wealth and disease burden have long been the focus of economic research, policy analysis, and charity evaluation, recent decades have seen growing academic interest in more direct measurements of quality of life, such as subjective well-being: people’s evaluations of their own lives and moods, and of the quality of their experience. Past research on this subject using mood-tracking smartphone apps has yielded widely cited studies, but they have not been replicated, and the potential for well-being research opened up by the near-ubiquitous adoption of smartphones has not yet been fully leveraged. With the aim of employing machine learning to improve public understanding of the nature and causes of human well-being and flourishing, as well as aiding cause prioritization research, I have developed a mood-tracking mobile application that allows users to effortlessly keep track of their emotions and circumstances, and of the interactions between them.
Most people value doing good for others and improving the world. In the United States alone, $300 billion is spent on philanthropy annually, and even those of us who don’t directly engage in philanthropy still care about causes we believe will do good, such as improving global healthcare and education, eradicating poverty, mitigating climate change, and increasing economic growth.
But an important tenet of effective altruism is recognizing that mere goodwill is not enough to make a difference. The best strategies for addressing a problem are likely to be far more effective than the rest. Studies have shown that most large social programs in the U.S. have weak or no effects, and that the best healthcare interventions are many times more effective than typical ones.
Cause prioritization goes one meta-level up from traditional effective altruism research, and, as described in this wiki page, “looks at broad causes (e.g. migration, global warming, global health, life extension) in order to compare them, instead of examining individual charities within each cause (as has been traditional).” Paul Christiano provides arguments for cause prioritization in this document, and Katja Grace has conducted her own investigation on this issue, which I recommend reading.
To grasp the importance of cause prioritization research, it helps to keep in mind that disagreements within the EA community over which causes are top priorities are unlikely to stem from differences in values (we’re all trying to make the world a better place, after all), but rather from disagreements about how to do so, as well as from a healthy, and necessary, community-wide tendency to explore different ideas. In the end, you don’t want to become permanently emotionally attached to any single avenue for doing good; you want to remember that the goal is making the world a better place, and to pursue whichever cause seems best for you to pursue right now. I think most people here know that, but it’s good to remind ourselves of it regardless.
Moreover, a compelling reason for carrying out cause prioritization research is that its value has no intrinsic expiration date. Information and knowledge can remain useful for as long as any substrate they are instantiated on exists; barring a catastrophe analogous to the burning of the Library of Alexandria (which would be much harder to bring about today, given how geographically distributed web servers are), we can expect knowledge to keep delivering value for as long as civilization exists.
But how do we measure good? How do we measure well-being? In order to decide which causes to prioritize, we need a satisfactory way of comparing their effectiveness, one that can be applied across different and unrelated domains.
Essentially, what I am looking for is something analogous to QALYs (quality-adjusted life years). QALYs are widely used in economic evaluations to calculate the cost-effectiveness of different medical interventions, and they directly influence healthcare spending by the governments of the United Kingdom, the Netherlands, Germany, Australia, Canada, and New Zealand. They are used not only to compare the efficacy of different methods of preventing a given disease, but also to identify which diseases to target in order to most effectively improve a population’s aggregate health. They also inform the analyses by GiveWell and 80,000 Hours of the effectiveness of different charities and causes.
But QALYs obviously don’t capture everything we care about when selecting an altruistic cause; they weren’t developed for that purpose. Another major drawback is that, despite their popularity, they do a poor job of measuring the one thing they are intended to measure: quality of life. It is not widely known, but QALYs are not based on the actual suffering experienced by patients in different health conditions, but rather on surveys of public preference (see: NICE should value real experiences over hypothetical opinions; Valuing Health: A Brief Report on Subjective Well-Being versus Preferences).
A measurement of well-being that I think better encompasses what we care about and is thus better suited for cause prioritization research is something like subjective well-being (SWB). It describes how people experience the quality of their lives and includes both emotional reactions and cognitive judgments. It’s closely related (although not identical) to what we commonly call happiness.
Unfortunately, however, as of today SWB hasn’t enjoyed nearly as much influence in public policy and charity evaluation as QALYs have, and there has been a negligible amount of research on how societal interventions causally affect happiness. It is, however, a growing field in economics, and more and more people are beginning to recognize its relevance. (What we do know about subjective well-being, though, yields a very different picture of which causes are most effective than the one we have right now; my research on that will be the topic of a future post.)
(Recently, Michael Plant wrote on this forum on the reasons to use SWB to compare different causes and charities. His review is great, and I highly recommend reading it, as I will avoid going into too much depth here on topics that he covered much better than I could have.)
I mostly agree with Michael Plant, but whereas he recommends using a life satisfaction question to assess how much happiness different outcomes produce, I think measures of real-time, experienced happiness track well-being more closely, and are far less costly to research today than they were in the past.
Indeed, the most effective way of measuring SWB, often dubbed the “gold standard,” is the experience sampling method (ESM), devised by Larson and Csikszentmihalyi in 1983. In this method, subjects carry a notepad and wear a digital wristwatch, and are instructed to answer a short survey about what they are feeling and doing whenever the wristwatch beeps, at unpredictable points in time. This gives it stellar ecological validity: you measure well-being at an exact point in time and place, free from the memory biases that cloud retrospective life satisfaction judgments. Unfortunately, it was a costly and labor-intensive research method for participants and researchers alike, and so was usually conducted with small groups of participants over short periods of time.
Later on, researchers began using handheld (palmtop) computers for this kind of research. This had the advantage that survey answers no longer needed to be manually entered into a computer by the researchers at the end of the study, and both the survey and the beeps could reside in a single device. But it was still an awkward, intrusive, and costly method, requiring people to carry electronic devices everywhere they went and input data into them at random times.
Thankfully, nowadays people do that voluntarily!
Smartphone adoption has grown quickly since they first appeared around a decade ago, well after the majority of ESM studies were conducted. Today, smartphones are available to a wide demographic: there are tens of thousands of models on the market, with prices spanning more than an order of magnitude; around the world, 3.7 billion people are active mobile internet users, and nearly half of all internet traffic comes from mobile devices. Moreover, it is reasonable to expect smartphone penetration to keep growing: millions of companies depend on smartphones to distribute their products, so many actors have an interest in seeing them reach more people around the globe.
And smartphones make the once-costly, once-intrusive experience sampling methodology trivial. Reaching out to people at randomly chosen points in time, as they go about their normal lives, and asking how they are feeling, with minimal interference, is as easy as ensuring they have an internet connection and the proper software installed on their phone. That is nearly costless, and it has greater ecological validity than any other method for measuring subjective well-being devised so far.
But the extent to which smartphones have been leveraged for subjective well-being research is still rather limited. The studies that have achieved a substantial sample size using the experience sampling method through smartphones are Killingsworth & Gilbert (2010), which gathered data from over 5,000 users of the app TrackYourHappiness, and Bryson & MacKerron (2016), which used data from tens of thousands of users of a similar app, Mappiness.
Those are already good sample sizes, much larger than anything earlier ESM studies, with their awkward notepads and handheld computers, could have achieved. But it would be fairly straightforward to do even better: Mappiness was geographically restricted to the U.K., and TrackYourHappiness is available only for iOS and only in English. Much larger sample sizes could likely be achieved merely by translating the apps and making them available across more countries and devices, yet that hasn’t happened.
Furthermore, software can greatly enrich the subjective well-being literature not only by enabling ecologically valid research with very large sample sizes, but also by making it possible to combine a wide range of variables describing a person’s environment and circumstances with their self-reported well-being. For example, we can easily join weather and location details, as well as biometric data (such as heart rate, exercise habits, and sleep patterns) from wearables like the Apple Watch, Fitbit, and Oura Ring, with the standard questions used in experience sampling studies, and see how they all correlate.
2. The How
I have developed an iOS app, SmartMood, as an attempt to take full advantage of the opportunity to understand well-being that widespread smartphone adoption provides, and, hopefully, to develop a common currency with which to compare the effectiveness of different altruistic causes. It will be released on the App Store later this month, but you can send me your email if you would like to download the beta version.
At its core, the app does what other experience sampling apps, such as TrackYourHappiness and Mappiness, have done. It sends the user notifications at random times during the day and asks a variety of questions about how they’re feeling and what they’re doing, answered through radio buttons, checkboxes, sliders, or text boxes. It also presents correlations between mood and the other variables in a series of charts that are automatically created and updated as the user uses the app.
Traditional ESM research had subjects fill out reports about ten times a day over a short period (such as a week), but I opted to send fewer notifications, between zero and five per day, and retain users for a longer period of time. I did make sure the notifications are random and unpredictable, as they should be in this kind of research.
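Since the app’s source isn’t shown here, the sampling logic can be sketched in Python; the function name, waking-hour window, and minimum gap are illustrative assumptions, not SmartMood’s actual parameters.

```python
import random

def schedule_notifications(n_max=5, start_hour=9, end_hour=22, min_gap_min=60):
    """Draw a random number of prompts (0..n_max) at uniformly random times
    within waking hours, rejecting draws that land too close together so
    prompts don't cluster. Times are in minutes since midnight."""
    n = random.randint(0, n_max)
    times = []
    attempts = 0
    while len(times) < n and attempts < 1000:
        t = random.uniform(start_hour * 60, end_hour * 60)
        if all(abs(t - u) >= min_gap_min for u in times):
            times.append(t)
        attempts += 1
    return sorted(times)
```

Because each day’s count and times are drawn afresh, the user cannot anticipate when the next prompt will arrive, which is the property random sampling needs.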
The main question meant to measure subjective well-being is “How good are you feeling right now?”, which is quite similar to the questions asked in Killingsworth’s and Bryson’s research. Alongside it, users can select, from a list, the emotions that describe how they are feeling, allowing positive affect and negative affect scores to be calculated. The app also asks an array of questions about the user’s circumstances, such as where they are, what they are doing, whom they are interacting with, whether they have exercised that day, how productive they have been, etc.
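As a rough sketch of how such affect scores might be computed from selected emotions (the valence lists and scoring rule here are illustrative, not the app’s actual ones):

```python
# Illustrative valence lists; a real emotion list would be longer.
POSITIVE = {"happy", "excited", "calm", "grateful", "proud"}
NEGATIVE = {"sad", "anxious", "angry", "tired", "lonely"}

def affect_scores(selected_emotions):
    """Return (positive affect, negative affect) as the fraction of each
    valence list the user selected in this report."""
    selected = set(selected_emotions)
    pa = len(selected & POSITIVE) / len(POSITIVE)
    na = len(selected & NEGATIVE) / len(NEGATIVE)
    return pa, na
```

Keeping positive and negative affect as separate scores, rather than a single net score, follows the standard practice in the SWB literature, where the two are known to vary somewhat independently.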
Users can also track any other variable they choose through the surveys. For example, if I choose to track my meditation habits, meditation-related questions will be added to the pool of possible questions, and the app will automatically calculate how meditation relates to my mood and create the corresponding charts.
When designing the surveys, I had to confront a few issues known in the literature to distort well-being measurements. To avoid the focusing illusion, I made sure that outcome variables are always asked about before predictor variables: questions about how users are feeling, how productive they have been, and which emotions best characterize their mood always come before questions about whether they are interacting with anyone, where they are, what activity they are engaged in, etc.
Moreover, most of the questions in each survey are random and unpredictable. When users answer how they are feeling, they don’t know what, if anything, they will be asked next. Although the research questions are all drawn from a common pool, and are repeated often enough to yield useful information about what affects mood, they are not repeated often enough to be predictable.
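A minimal sketch of this survey-assembly scheme, with hypothetical question pools (the app’s actual pools and sampling scheme may differ):

```python
import random

# Hypothetical question pools for illustration.
OUTCOME_QUESTIONS = [
    "How good are you feeling right now?",
    "Which emotions describe your mood?",
    "How productive have you been?",
]
PREDICTOR_POOL = [
    "Where are you?",
    "What are you doing?",
    "Who are you interacting with?",
    "Did you exercise today?",
]

def build_survey(n_predictors=2):
    """Outcome questions always come first (guarding against the focusing
    illusion); predictors are an unpredictable random subset of the pool."""
    return OUTCOME_QUESTIONS + random.sample(PREDICTOR_POOL, n_predictors)
```

Fixing the outcome questions at the front while randomizing the rest gives each report a consistent mood measurement without letting users learn which circumstance questions to expect.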
Another issue I had to tackle is ecological validity. Much of what makes the experience sampling method valuable is that it is supposed to sample representative moments of people’s day-to-day lives. That assumption fails if important variables affect the user’s response rate, for example, if users only answer when they are feeling good, or when they are bored. This is much less of a problem with smartphones than it was in the days of notepads and digital wristwatches, since apps integrate more seamlessly into daily life. But to address the issue properly, I also track metadata, such as the time elapsed between a notification and the user opening the app, the interval between reports, etc.
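One simple way such metadata could flag a response-rate bias is to compare mean mood across quick and delayed responses. This sketch assumes reports are dicts with `mood` and `latency_s` fields, which is an illustrative data shape, not the app’s actual schema:

```python
def latency_bias_gap(reports, threshold_s=600):
    """Difference in mean mood between quick and delayed responses.
    Each report holds 'mood' and 'latency_s' (seconds between the
    notification and the user opening the app). A large gap suggests that
    response timing correlates with mood, threatening representativeness."""
    quick = [r["mood"] for r in reports if r["latency_s"] <= threshold_s]
    slow = [r["mood"] for r in reports if r["latency_s"] > threshold_s]
    if not quick or not slow:
        return None  # not enough data to compare the two groups
    return sum(quick) / len(quick) - sum(slow) / len(slow)
```

A gap near zero is reassuring; a large gap means the sampled moments may over-represent certain mood states, and analyses should account for that.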
With this design, I hope to measure well-being in a satisfactorily valid way. Let me know if there are other issues I should address.
Most of the data the app collects is what people explicitly report through the surveys, but users can also choose to share their location and Apple Health data. Location data allows well-being to be correlated with aspects of people’s surroundings, such as whether they are indoors or outdoors, or in a suburb, urban area, or city, as well as with weather variables such as amount of sunshine, day length, temperature, and humidity. Apple Health, in turn, gives access to data about sleep patterns, exercise habits, heart rate, and more. The app automatically builds charts from these variables when they are available.
3. Future Directions
As of today, SmartMood is only available in English, and for iOS 11+ devices. Needless to say, given the project’s ambitions, I plan on making it available for a wider array of people, which means developing an Android app and translating the text, as well as making user data accessible through the web. Those efforts are currently underway.
I plan on making SmartMood usable as a research framework for psychologists and psychiatrists. The way it would work is that researchers, within their normal SmartMood account, would create a research project, specify the variables that would be tracked (which could either be default ones tracked by SmartMood, or variables they themselves create), write a consent form, and receive a code through which study subjects would be able to enroll.
Researchers would have a dashboard with statistics and graphs built from real-time subject data, and would be able to download the raw data at any time in a variety of formats. The data would be stored on my own server unless requested otherwise.
Currently I lack experience and knowledge in machine learning; however, it is a skill I will need in order to fulfill my aims, and I plan to acquire it. Machine learning could improve the user experience by predicting mood states in advance, and perhaps even by offering conservative, data-backed suggestions for alleviating low mood.
At the moment, the data processing SmartMood does is limited to calculating a few correlations and effect sizes and presenting them to the user as a complement to graphs. However, in the future, I plan on applying more sophisticated statistical techniques and machine learning, both to enhance the user experience and for research purposes.
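The kinds of statistics mentioned, correlations and effect sizes, can be sketched in a few lines of Python; this is a generic illustration of the standard formulas, not SmartMood’s actual code:

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length series, e.g. daily
    mood scores and hours of sleep."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

def cohens_d(group_a, group_b):
    """Cohen's d for a binary circumstance, e.g. mood on days the user
    exercised (group_a) versus days they did not (group_b), using the
    pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_sd = (((na - 1) * stdev(group_a) ** 2 + (nb - 1) * stdev(group_b) ** 2)
                 / (na + nb - 2)) ** 0.5
    return (mean(group_a) - mean(group_b)) / pooled_sd
```

Effect sizes like Cohen’s d are more informative to show users than raw correlations alone, since they express how much a circumstance is associated with mood in units of the mood scale’s own variability.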
The research project will be rendered fairly pointless if no one uses the app, so I’ve put some thought into how to acquire users, and will certainly think more about it in the future.
Relevantly, Michael Plant spent two years attempting to launch an app very similar in intention to SmartMood, but failed, and documented his experience in a blog/forum post (which, amusingly enough, I first stumbled upon the day after I started programming my app; it wasn’t enough to stop me). He attributes the failure partly to lack of user interest:
“The main problem with Hippo [the name of his app] as a happiness tracker was that, fundamentally, it wasn’t that useful to people. Sounds obvious now, but took us ages to work out that was the problem. People seemed to like the idea, but didn’t find it useful and, where they said they found it useful, still didn’t stick with it.”
I think he was too quick to reach that conclusion, however; significant progress in user acquisition can be made by giving the problem thorough consideration. A quick search for “mood tracking” in the App Store and Google Play Store yields several results, some of them paid (and, honestly, none of them as cool as my app). There is clearly public demand for apps of this kind.
Moreover, as I have discussed above, valuable research has already been published with data from tens of thousands of users of mood-tracking apps, so we know it is doable. And it could easily have been made more extensive still by offering the apps in more countries, languages, and devices.
Perhaps those more popular apps simply invested heavily in marketing, UI/UX design, and attention to their target users’ needs. My own knowledge of those areas is still limited, but I plan to learn more, and I will likely keep researching and improving my methods even after the app is released on the App Store.
As a head start on user acquisition, I will probably run Facebook ads in developing countries, where the cost per app install can fall below $0.50 (although it has been rising each year). Allocating $100/month, which I can afford, would yield about 2,400 installs per year (without counting organic growth), nearly half the sample size of Matt Killingsworth’s research paper, which sounds pretty encouraging to me.
Furthermore, I recognize that user retention should not be expected to happen naturally without any effort on my part either. Successful apps seem to put a great deal of work into it, through methods like sending users personalized emails and notifications, and reminding them of what they’ve posted in the past.
4. Closing remarks
I am very open to suggestions. If there is anything you disagree with and think I should do better, feel free to contact me. I have intentionally made the app flexible enough to easily adapt to questions being added, removed, or edited, without valuable data being lost and without users having to update their software.
A few people have referred to SmartMood as a startup, but it’s just a personal project of mine. I considered incorporating once, but later decided that it made more sense not to do so. It will be 100% free, generate no revenue, and be something I work on in my spare time.
I hope that the app will not only be useful from a research standpoint, but that it will bring real-time value to its users as well. People are pretty bad at knowing how different circumstances impact their mood. Moreover, as I argue in Life can be better than you think, people often have no idea about how their lives could be improved and assume that much more about their lives is fixed than is actually warranted. My hope is that, with minimal time investment, people will be able to uncover hidden relationships between their happiness and their habits that had never even crossed their minds before, adjust their lives accordingly, and become happier for doing so. I quite literally jump in excitement when I imagine that I could help someone’s life in such a way.
I will be very glad if it generates something valuable to users, and even more so if I succeed at facilitating cause prioritization research. I do not deem it astonishingly unlikely that I will accomplish those things, but even if these efforts fail, that’s okay with me, and I will make sure to transparently document the project, as Michael Plant did, so that others can learn from it.
References and further reading
Bryson, A., & MacKerron, G. (2016). Are you happy while you work?. The Economic Journal, 127(599), 106-125.
Csikszentmihalyi, M., & Larson, R. (2014). Validity and reliability of the experience-sampling method. In Flow and the foundations of positive psychology (pp. 35-54). Springer, Dordrecht.
Devlin, N. J., & Lorgelly, P. K. (2017). QALYs as a measure of value in cancer. Journal of Cancer Policy, 11, 19-25.
Dolan, P., & Kahneman, D. (2008). Interpretations of utility and their implications for the valuation of health. The economic journal, 118(525), 215-234.
Dolan P. (2009). NICE should value real experiences over hypothetical opinions. Nature;462:35.
Dolan, P., & Metcalfe, R. (2012). Valuing health: a brief report on subjective well-being versus preferences. Medical decision making, 32(4), 578-582.
Grace, K. (2014, September 5). Cause overview: cause prioritization. [Blog post]. Retrieved from https://80000hours.org/2014/09/cause-prioritization-summary/
Grace, K. (2017). Cause prioritization research. 80000 hours.
Helliwell, J., Layard, R., & Sachs, J. (2013). World Happiness Report 2013. United Nations Sustainable Development Solutions Network.
Killingsworth, M. A., & Gilbert, D. T. (2010). A wandering mind is an unhappy mind. Science, 330(6006), 932-932.
Kimhy, D., Delespaul, P., Corcoran, C., Ahn, H., Yale, S., & Malaspina, D. (2006). Computerized experience sampling method (ESMc): assessing feasibility and validity among individuals with schizophrenia. Journal of psychiatric research, 40(3), 221-230.
Kahneman, D., & Riis, J. (2005). Living, and thinking about it: Two perspectives on life. The science of well-being, 1.
Kahneman, D., & Sugden, R. (2005). Experienced utility as a standard of policy evaluation. Environmental and resource economics, 32(1), 161-181.
Kahneman, D., Krueger, A. B., Schkade, D., Schwarz, N., & Stone, A. A. (2006). Would you be happier if you were richer? A focusing illusion. Science, 312(5782), 1908-1910.
Kahneman, D., & Krueger, A. B. (2006). Developments in the measurement of subjective well-being. Journal of Economic perspectives, 20(1), 3-24.
Klarman, H. E., & Rosenthal, G. D. (1968). Cost effectiveness analysis applied to the treatment of chronic renal disease. Medical care, 6(1), 48-54.
Larson, R., & Csikszentmihalyi, M. (1983). The Experience Sampling Method. New Directions for Methodology of Social & Behavioral Science, 15, 41-56.
MacAskill, W. (2015). Doing good better: Effective altruism and a radical new way to make a difference. Guardian Faber Publishing.
Organisation for Economic Co-operation and Development (OECD). (2013). OECD guidelines on measuring subjective well-being.
Scollon, C. N., Prieto, C. K., & Diener, E. (2003). Experience sampling: promises and pitfalls, strengths and weaknesses. Journal of Happiness Studies, 4(1), 5-34.
Simler, K., & Hanson, R. (2017). The Elephant in the Brain: Hidden Motives in Everyday Life. Oxford University Press.
Sudhof, M., Goméz Emilsson, A., Maas, A. L., & Potts, C. (2014, August). Sentiment expression conditioned by affective transitions and social forces. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 1136-1145). ACM.
Urry, H. L., Nitschke, J. B., Dolski, I., Jackson, D. C., Dalton, K. M., Mueller, C. J., … & Davidson, R. J. (2004). Making a life worth living: Neural correlates of well-being. Psychological science, 15(6), 367-372.
Van Praag, B. M. (1991). Ordinal and cardinal utility: an integration of the two dimensions of the welfare concept. Journal of econometrics, 50(1-2), 69-89.
Van Reenen, M. & Janssen, B. (2015). EQ-5D-5L User Guide: Basic information on how to use the EQ-5D-5L instrument. (Version 2.1). EQ-5D.
Wilson, T. D., & Gilbert, D. T. (2005). Affective forecasting: Knowing what to want. Current Directions in Psychological Science, 14(3), 131-134.