This is a very interesting analysis. Just in case you’re not already aware, EA has a Happier Lives Institute that’s specifically focused on research questions involving happiness, and Oxford University has the Wellbeing Research Centre. I know Andrew Oswald’s been researching socioeconomic factors behind national happiness measures for at least a couple of decades.
By way of critique, I agree education is a particularly surprising omission from the Gallup survey (particularly when they chose to include a measure of individual generosity), and I suspect you’re right that they over-weight GDP per capita (and that GNI is more appropriate), though I don’t think they claimed their factors were definitive. It appears your variables actually overlap quite a bit with theirs: for example, the binary question “Are you satisfied or dissatisfied with your freedom to choose what you do with your life?” encompasses LGBT rights and female empowerment to a significant extent, and general political freedoms to a lesser one. But there’s definite benefit in disaggregating GDP/GNI and its positive side effects, like being able to afford clean water and safe cooking, and in some very logical additions like unemployment. I’m also curious whether your “social support” variable was similar to Gallup’s (e.g. answers to a question specifically about personal networks) or whether it encompassed other things (e.g. the welfare state)?
The challenge with throwing up to 926 noisy and often highly correlated variables at a relatively small panel data set is that some of the improvements in fit are likely to be spurious, particularly when it leads to odd conclusions like only HIV infections in one particular demographic affecting overall wellbeing. I’m not sure the best-performing variables in statistical testing are actually those with the strongest causal relationship.
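To illustrate the worry, here’s a toy sketch (pure synthetic noise, nothing to do with your actual data): with hundreds of candidate predictors and a modest sample, a fair number of completely uninformative variables will clear a naive significance threshold by chance alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_vars = 150, 500  # hypothetical sizes, same order as the setup discussed

y = rng.standard_normal(n_obs)            # outcome: pure noise
X = rng.standard_normal((n_obs, n_vars))  # predictors: pure noise, independent of y

# Pearson correlation of each column with y
yc = (y - y.mean()) / y.std()
Xc = (X - X.mean(axis=0)) / X.std(axis=0)
r = (Xc * yc[:, None]).mean(axis=0)

crit = 1.96 / np.sqrt(n_obs)  # rough two-sided 5% threshold for r under the null
n_sig = int((np.abs(r) > crit).sum())
print(f"'significant' noise variables at p<0.05: {n_sig} of {n_vars}")
```

With 500 pure-noise candidates you expect roughly 25 of them to look “significant” at the 5% level, which is why fit improvements from a wide search need extra scrutiny.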
I’d be particularly cautious of drawing policy conclusions from it, like the idea that introducing public campaign financing could be a cheap happiness win, because I’m not sure there’s any causal relationship with that one at all. I thought it might be a simple case of most countries scoring zero and a small number of particularly well-functioning democracies having positive scores (which I’d still say was likely to be correlation, not causation), but actually looking at the original data for the latest year, it’s a continuous series of weighted scores that … make very little sense (this surprised me, since V-Dem is a meticulously researched and respected dataset, but they don’t use this index in any of their composite indices...). Sure, there are a few well-functioning democracies at the top and an absolute monarchy at the bottom, but in between it’s just bizarre: Switzerland has one of the worst scores, North Korea is in the middle (I suppose the Dear Leader does depend on state funding for his political power...), Cuba does pretty well compared with the UK, etc. Why this very noisy series fits the regression better than dozens of other presumably more representative ones I don’t know, but I don’t think it’s anything to do with political races in North Korea being fairer!
Hi David, thank you for digging in so deep! Let me respond with the same amount of effort.
First, I was not aware of the EA happiness institute (great!), but I do know about the Oxford centre, because it’s run by one of the authors of the 2023 WHR. I haven’t approached them yet because I wanted to kick the tires as hard as possible on the research first, but I welcome critiques of that strategy.
Re: the report and its claims (and please interpret all of my frustration as directed very much at the report itself, not you!): While they do not *say* their model is definitive, they unambiguously *act* like it is. For god’s sake, they’re publishing something they call “The World Happiness Report,” and they’ve been doing so for a decade with exactly the same model. As a researcher I find this particularly galling, because it just feels disingenuous. The data I use has been available for every year of the report’s publication, and yet the 2023 report starts with a table of contents, a brief introduction — and a full-page, full-color, adorable heart-filled cartoon that presents GDP as a central factor in satisfaction! Listed, measured, and implied to be first. Whereas my measurably more accurate model finds that GNI (not even GDP!) explains only 2.5% of model variation and is the *eleventh* most important factor in the model. Sure, they have caveats, but they’re very much in the fine print, and they make no difference anyway when no alternative is provided.
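(For what it’s worth, one standard way to put a number like “explains 2.5% of model variation” on a single predictor is permutation-style importance: refit after shuffling that one column and see how much R² drops. A minimal sketch on synthetic data, not my actual model or importance measure:)

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
X = rng.standard_normal((n, 3))
# outcome dominated by the first column, with two weaker contributors
y = 1.5 * X[:, 0] + 0.3 * X[:, 1] + 0.1 * X[:, 2] + rng.standard_normal(n)

def r2(X, y):
    # OLS fit with intercept; in-sample R^2
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1 - resid.var() / y.var()

base = r2(X, y)
drops = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])  # destroy column j's relationship with y
    drops.append(base - r2(Xp, y))
    print(f"variable {j}: R^2 drop = {drops[-1]:.3f}")
```

The drop in R² for each shuffled column gives a rough share of the model’s explained variation attributable to that variable, which is the sense in which a dominant predictor can be distinguished from an eleventh-place one.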
Next: I agree there is an important overlap, both conceptual and literal, in the variables we use. In particular, for the variables on Feelings of Freedom, Healthy Life Expectancy, and Social Support, I use the WHR data itself (in part because FoF and Support are only *available* from them). And you are right that several of the variables could be considered as pieces of other variables. I try to highlight this in the table towards the end of the paper because I think it’s important — and also to show how many categories, even high-level ones, the WHR is missing.
But I am adamant that the real value of models like this is in their ability to *improve* satisfaction, not just describe it — and actionability requires more precision than “Health.” I deeply believe it’s only in the nuances that we can actually see the outlines of a solution. Disaggregation can also dramatically change the interpretation of the very large but very vague variables. For example, “Social Support” on its own sounds like the moral is “just be nice to people.” However, when you realize that the model also has huge contributions from Gay and Lesbian Social Acceptance, and Political Power for Women, the moral becomes “be nice to people … including minorities, not just your own in-group … and actually don’t just be nice to them, but meaningfully share real power.” That is a dramatically different narrative, with different policies and different interventions.
Re: causality specifically, I address that in much more detail in the paper, but since you bring up noisy data I want to emphasize that I put a *lot* of work into making sure the chosen variables are not noise. I ran dozens of iterations (well, thousands in total!) that randomly omitted rows and randomly omitted columns to test the robustness of the results, and the variables I report are the ones that are selected every. single. time. In fact, I have reason to believe that the breadth of the search and the strong filter for robustness make me much *less* susceptible to spurious variables than the WHR. In these thousands of tests, two of the six WHR variables come out around thirtieth in importance to prediction, and so don’t belong anywhere near the top six as they claim. That’s a third of their reported model!
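For concreteness, the robustness filter works roughly like this (a toy sketch on synthetic data with a made-up selection step, not my actual pipeline): repeatedly drop random rows and columns, re-run selection, and keep only the variables chosen every single time they were available.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_vars, n_signal = 400, 60, 3

X = rng.standard_normal((n_obs, n_vars))
# outcome driven by the first 3 columns plus noise
y = X[:, :n_signal] @ np.array([1.0, 0.8, 0.6]) + rng.standard_normal(n_obs)

n_iter, top_k = 200, 5
selected = np.zeros(n_vars)
available = np.zeros(n_vars)

for _ in range(n_iter):
    rows = rng.choice(n_obs, size=int(0.7 * n_obs), replace=False)  # omit random rows
    cols = rng.choice(n_vars, size=int(0.7 * n_vars), replace=False)  # omit random columns
    Xs, ys = X[np.ix_(rows, cols)], y[rows]
    # stand-in selection step: top-k columns by |correlation| with y
    ysc = (ys - ys.mean()) / ys.std()
    Xsc = (Xs - Xs.mean(axis=0)) / Xs.std(axis=0)
    r = np.abs((Xsc * ysc[:, None]).mean(axis=0))
    picked = cols[np.argsort(r)[-top_k:]]
    available[cols] += 1
    selected[picked] += 1

# keep only variables chosen every time they appeared in a subsample
robust = np.where((available > 0) & (selected == available))[0]
print("always-selected variables:", robust)
```

Spurious variables get picked in some subsamples but not others, so requiring selection in *every* iteration filters them out while the genuine signals survive.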
Last, I’m very impressed that you got into the campaign-finance data! You’ve inspired me to take a closer look. I would still defend its inclusion like this: it showed up in every iteration of the robustness test; it’s strongly significant in the model; statistics is about aggregates, not about North Korea; and I use 1,964 observations over a period of 18 years. (Plus, North Korea is definitely not included in the Gallup World Poll, so it’s moot for these conclusions anyway.) The variable is also completely consistent with the substance of the other V-Dem variables: people need political power (women’s political power), they need that political power to be meaningful rather than decorative (no shadow government), and they need a way to *achieve* that political power — hence public financing for elections. For all of these reasons I think it belongs in the model. It’s true that researchers like Nicholas Carnes emphasize that a naive implementation of funding is unlikely to help on its own, but I would argue that none of the discovered variables will be successful if implemented naively.
Still, as above, I think the real contribution comes from the fact that financing provides not just an outcome but a path — even if it’s also a reminder that each specific action can only have a small impact. You may say you want Women’s Political Power; okay, great — but how? Significant public financing for campaigns is a concrete resource, with a concrete action and a concrete objective. In other words, there’s actually an extremely clear causal mechanism for public financing, much clearer than for many of the other variables. And that’s something that “Healthy Life Expectancy” just doesn’t provide.
And thank you again!