I’m surprised you object so strenuously to taking the log of GNI/Capita or GDP/capita. This to me seems like a very natural thing to do, given the diminishing marginal utility of money, and the nature of the distribution of national incomes. If you don’t take logs (or similar) your model will presumably be quite insensitive to the differences between really poor and really really poor. Indeed, I note you yourself use log GDP as the independent variable in a chart on the very next page (p17).
The chart you’re referring to (for those who aren’t looking at the paper) is not a chart of life satisfaction; it is a chart of carbon emissions. So, fundamentally, I don’t think the comparison is relevant. In addition, the outcome (emissions) is also log-transformed, which makes the relationship much more intuitive, and since both axes are clearly labeled and contain all the relevant information, I don’t see any possibility of confusion.
By contrast, the use of log GDP in models of satisfaction seems to create only potential for confusion, especially since the WHR mentions this transformation only in its technical tables, omits it from its most prominent descriptions, and transforms only this one variable that way. Furthermore, as I describe in substantial detail in the paper, the justification they give for this transformation (improved model fit) doesn’t hold up to statistical scrutiny once the model is properly specified. Nor is this a marginal technical complaint: the transformation exaggerates the effect by 1100%.
I agree that there may be non-linearities in the effect, but this is equally possible with any of the other variables. This is worth exploring, but I think it is much more important to make sure the dramatic potential improvements from simple changes are recognized before getting into more subtle, and far more slippery, models, especially when the current model isn’t showing serious issues.
But most critically, the whole goal is to not just assume that we know how life satisfaction works, but rather to let the data tell us. And when the data simply doesn’t support a log transform, I’m not going to include it.
Here are the OWID charts for life satisfaction vs. GDP/capita. First, linear (per the dovecote model):
Now with a log transform of GDP/capita (per the WHR):
I think it is visually clear that the empirical relationship is better modelled as log-linear than linear. Compared to this, I don’t think the regression diagnostics suggesting non-inferiority of linear GDP (in the context of a model selected from thousands of variables, at least some of which could proxy log-linearly for GDP; cf. Dan_Key’s comment) count for much.
Besides the impact on GDP itself (2.5% versus 40%), I’d expect the selection of the other variables to be sensitive to this analysis choice too. Since it is the wrong choice, I’d expect (quasi-)omitted variable bias to distort both which variables are included in the dovecote model and their relative contributions.
Hi Gregory --
There’s so much to respond to that I’m going to split this up into three parts. Hopefully this will clarify things.
(Preface) On the innate problem of the logarithmic transform
Something that keeps getting lost, rather confusingly, is that the log transform has inherent and enormous problems of interpretation when used in a linear model alongside untransformed variables. A regular linear coefficient describes an additive effect: each extra unit of the variable adds a fixed amount to the outcome. A coefficient on a log-transformed variable describes the effect of a multiplicative (proportional) change: each doubling of GDP adds a fixed amount, so each extra dollar matters less the richer a country already is. This means that reporting the estimate for the transformed variable right next to the estimates for untransformed variables, without emphasizing the difference, is much worse than comparing apples and oranges. It’s more like selling apples and *calling* them oranges. But this is how the WHR reports its results. In the 2022 WHR, the plots of “contributions” are not labeled as LOG gdp; they’re just labeled gdp, and they are placed alongside linear parameters as though they were directly comparable. They are not. I consider this alone a “critical failure,” and it is one of the central reasons I titled my paper that way.
Another way to put this: log gdp showing up as 40% of model variation is not because the model actually thinks gdp is hugely important; it is a *numerical illusion*, based on innately incomparable quantities. And yet “GDP” (NOT log gdp) is put at the very head of the 2023 report as THE variable that matters. The model simply does not support this.
Of course, you can do a more responsible job of presentation, but the log unavoidably makes that job harder. In ranking priorities, the goal is to rank things of the same nature, and the log creates a single element of an inherently different nature.
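To make the additive/multiplicative distinction concrete, here is a small Python sketch. The coefficients `b_linear` and `b_log` are hypothetical values chosen for illustration, not estimates from the WHR or dovecote models. A linear term adds the same amount for every extra dollar, wherever a country starts; a log term adds the same amount for every doubling, so the same extra dollar counts for far more in a poor country than in a rich one.

```python
import math

# Hypothetical coefficients, chosen only for illustration.
b_linear = 0.00002  # satisfaction points per extra dollar of GDP/capita
b_log = 0.7         # satisfaction points per unit of ln(GDP/capita)


def linear_effect(gdp_from, gdp_to, b=b_linear):
    """Change in predicted satisfaction from a linear GDP term."""
    return b * (gdp_to - gdp_from)


def log_effect(gdp_from, gdp_to, b=b_log):
    """Change in predicted satisfaction from a log GDP term."""
    return b * (math.log(gdp_to) - math.log(gdp_from))


# The same +$1,000 step, starting from a poor and from a rich country.
for start in (1_000, 50_000):
    print(start, linear_effect(start, start + 1_000), log_effect(start, start + 1_000))

# Under the log term, doubling income adds b_log * ln(2) whatever the starting level.
print(log_effect(1_000, 2_000), log_effect(50_000, 100_000))
```

This is why the two kinds of coefficient, and any "contribution" computed from them, are not on a common scale.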
But all of this is preamble to the question you raise. My point is, given all the trouble it can cause, there’s got to be a VERY good reason to use the log. So — is there?
On the inappropriateness of the log transform in this context
Unfortunately, your graphs give a fundamentally incomplete, and ultimately misleading, view of the system, because satisfaction and GDP are simply not the only two variables involved. I might even say that accounting for the other variables is the point of the whole exercise. And in this particular case, they make all the difference.
The approach you suggest is misleading because (1) it doesn’t take all the relevant variables into account, and (2) it assumes a strong functional form for GDP based on a critically limited view of the data. Both problems are easy to solve, by fitting a Generalized Additive Model (GAM) with the full set of conditioning variables and a non-parametric spline on the GNI term. This gives the model the freedom to be log-shaped if the data support that, but also any other shape, if that fits better. So I fit this model, took the shape formed by the non-parametric component, and fit two curves to it: a straight line and a logarithmic curve. They look like this:
(The degree of the spline, i.e. how wiggly it is allowed to be, can of course be controlled manually, but this is the degree chosen algorithmically by the model. Forcing it to curve less would bias things even further toward the linear model, because it would constrain the fit to be simpler.)
Because I don’t want to make definitive mathematical judgments just by eyeballing things, I assessed the two curves statistically, and the linear one simply fits better: R² = 0.92 for the linear fit vs. 0.85 for the logarithmic one. They’re close, sure, which I think is entirely consistent with a visual assessment, but the log certainly doesn’t dominate; in fact, it is numerically worse, just as in the first model. So the log is hard to interpret (that part is unavoidable), and empirically, *it’s just numerically worse.* To me, that’s open and shut. I see no reason whatsoever to use it, unless you’re actually *trying* to inflate the effect of GDP by 1100%.
Now, if you still look at this and, for whatever reason, absolutely refuse to accept any model without a non-linear effect, then fine: we can still build one, and we can do it without making strong, unnecessary, and practically incomparable assumptions about the form of the relationship.
Since you’re inherently claiming that the dynamics of GDP differ fundamentally between low-GDP and high-GDP countries, let’s just look at those countries separately! Looking at the raw plot of satisfaction ~ gdppc (untransformed), we can see a clear elbow at around $5,000. If your eyeball says $10k, that’s fine too; I tried both. Either way, that’s clearly the approximate point where the effects of gdp *seem* to change, based on exactly the kind of visual analysis you’re using. To sketch the basic idea, you might imagine breaking the mass of data down into the following two approximate linear trends:
So then we fit models to each of these subsets of data, where we now have some visible reason to believe that the sat ~ gdp/ni relationship actually *is* linear, and we get a contribution of 1.2% of model variation on the poor side and 5.8% on the rich side. These average out to 3.5%, where my pooled model estimated 2.5%. Yes, the slope is steeper on the poor side, about two and a half times as steep, but in neither case does GDP come anywhere near looking like the dominant force, or the only way to salvation. Which, again, shouldn’t even be a surprise, because the *apparent* 40% contribution in the WHR model (and I really cannot stress this enough) *is a numerical illusion.*
Want three tranches? By all means; the result is the same. The log is profoundly misleading, utterly unnecessary, and quite simply not the right choice for this data.
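The gap between the two-variable picture and the conditional one is easy to reproduce in a toy simulation. Everything in this Python sketch is invented for illustration (the data-generating process, the covariate `z`, and every number); nothing comes from the WHR data. Satisfaction is built to be linear in GDP once `z` is held fixed, but because `z` itself tracks log GDP, the bivariate relationship looks log-shaped.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an OLS fit of y on X (an intercept column is added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()


rng = np.random.default_rng(0)
n = 2_000
# Hypothetical data-generating process, purely for illustration.
gdp = np.exp(rng.uniform(np.log(500), np.log(100_000), n))  # log-uniform GDP/capita
z = np.log(gdp) + rng.normal(0, 0.5, n)                      # a covariate tracking log GDP
y = 2e-5 * gdp + z + rng.normal(0, 0.2, n)                   # linear in GDP given z

print("bivariate   linear:", r_squared(gdp, y), " log:", r_squared(np.log(gdp), y))
print("conditional linear:", r_squared(np.column_stack([gdp, z]), y),
      " log:", r_squared(np.column_stack([np.log(gdp), z]), y))
```

This does not show that the real data behave this way, only that the bivariate shape cannot settle the question by itself; that is what the conditional analysis is for.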
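A bare-bones stand-in for the GAM step, assuming one smooth term plus linear controls, might look like the Python sketch below. It is not the author's specification, which isn't given here; in R this would more typically be `mgcv::gam` with an `s()` term. The sketch swaps in a penalized cubic spline (truncated-power basis with a ridge penalty) and picks the penalty by generalized cross-validation (GCV), which plays the role of choosing the degree of smoothness algorithmically. The names `fit_additive_model` and `spline_basis`, and the synthetic data, are hypothetical.

```python
import numpy as np


def spline_basis(u, knots):
    """Cubic truncated-power basis: u, u^2, u^3 and (u - k)_+^3 for each knot."""
    cols = [u, u**2, u**3] + [np.clip(u - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)


def fit_additive_model(x, covariates, y, n_knots=10, lambdas=np.logspace(-6, 4, 41)):
    """Fit y = a + covariates @ b + f(x) + noise, with f a penalized cubic spline.

    The penalty weight is picked from `lambdas` by GCV.
    Returns the centered estimate of f at each x, and the chosen lambda.
    """
    u = (x - x.min()) / (x.max() - x.min())  # rescale to [0, 1] for stability
    knots = np.quantile(u, np.linspace(0, 1, n_knots + 2)[1:-1])
    B = spline_basis(u, knots)
    n = len(y)
    X = np.column_stack([np.ones(n), covariates, B])
    # Ridge penalty on u^2, u^3 and the knot terms; the linear term u is left
    # unpenalized, so a very large lambda shrinks f toward a straight line.
    penalty = np.zeros(X.shape[1])
    penalty[-(n_knots + 2):] = 1.0
    XtX, Xty = X.T @ X, X.T @ y

    best = None
    for lam in lambdas:
        A = XtX + lam * np.diag(penalty)
        beta = np.linalg.solve(A, Xty)
        rss = np.sum((y - X @ beta) ** 2)
        edf = np.trace(np.linalg.solve(A, XtX))  # effective degrees of freedom
        gcv = n * rss / (n - edf) ** 2
        if best is None or gcv < best[0]:
            best = (gcv, lam, beta)

    _, lam, beta = best
    f = B @ beta[-B.shape[1]:]  # the smooth component alone
    return f - f.mean(), lam


# Usage on synthetic data (all values hypothetical).
rng = np.random.default_rng(1)
n = 300
gni = np.exp(rng.uniform(np.log(1_000), np.log(80_000), n))
covs = rng.normal(size=(n, 2))
sat = 0.5 * covs[:, 0] - 0.3 * covs[:, 1] + 3e-5 * gni + rng.normal(0, 0.3, n)
f_hat, lam = fit_additive_model(gni, covs, sat)
```

Because only the linear basis term escapes the penalty, raising `lam` by hand forces less curvature and pushes the estimate toward a straight line, which mirrors the point about manual smoothness control above.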
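The comparison step (fit a straight line and a logarithmic curve to the estimated smooth, then compare R²) can be sketched in Python as follows. Here `x` is assumed to be the GNI values and `f` the estimated smooth component at those values; the function name is hypothetical, and the R² values quoted above come from the author's analysis, not from this code.

```python
import numpy as np


def compare_linear_vs_log(x, f):
    """Fit f ~ a + b*x and f ~ a + b*ln(x) by least squares; return both R^2 values."""
    def r2(design):
        X = np.column_stack([np.ones(len(x)), design])
        coef, *_ = np.linalg.lstsq(X, f, rcond=None)
        resid = f - X @ coef
        return 1.0 - np.sum(resid**2) / np.sum((f - f.mean()) ** 2)

    return {"linear": r2(x), "log": r2(np.log(x))}


# Usage with a hypothetical, exactly linear smooth.
x = np.linspace(1_000, 80_000, 200)
print(compare_linear_vs_log(x, 2e-5 * x))
```

Note that this R² measures how well each curve describes the *shape* of the smooth, not how well the full model fits the raw data.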
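The split-sample idea can be sketched as a separate OLS fit within each GDP band. This Python sketch is hypothetical (the function name, the synthetic data, and the piecewise slopes are all made up), and it only recovers the within-band GDP slope; the "contribution to model variation" figures above come from the author's own calculation, which isn't spelled out here. Passing one threshold gives two bands; passing two gives three tranches.

```python
import numpy as np


def split_slopes(gdp, covariates, y, thresholds):
    """Fit a separate OLS model (intercept + GDP + covariates) in each GDP band
    defined by `thresholds`, and return the GDP slope estimated in each band."""
    edges = np.concatenate([[-np.inf], np.sort(thresholds), [np.inf]])
    slopes = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (gdp >= lo) & (gdp < hi)
        X = np.column_stack([np.ones(mask.sum()), gdp[mask], covariates[mask]])
        coef, *_ = np.linalg.lstsq(X, y[mask], rcond=None)
        slopes.append(coef[1])  # coefficient on GDP within this band
    return slopes


# Usage on synthetic data: hypothetical slope 2.5e-4 below $5k and 1e-4 above.
rng = np.random.default_rng(2)
n = 2_000
gdp = rng.uniform(500, 60_000, n)
covs = rng.normal(size=(n, 2))
sat = (1e-4 * gdp + 1.5e-4 * np.minimum(gdp, 5_000)
       + covs @ np.array([0.4, -0.2]) + rng.normal(0, 0.3, n))
two_bands = split_slopes(gdp, covs, sat, [5_000])
three_bands = split_slopes(gdp, covs, sat, [5_000, 20_000])
print(two_bands, three_bands)
```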
On your assertion that my search space is “wrong”
To clear up what appears to be a surprisingly confident misconception: I do make log gdp available to the search algorithm. It does not make the final cut because, in addition to being of an intrinsically different nature from the rest of the variables, it simply does not produce a statistically better model fit, whether you choose to believe this or not. So the other variables chosen are not influenced by my “wrong” search space, because that is not actually the search space I use. With apologies, I’m really not sure why you think otherwise.
To be precise, the search with log gdp as a candidate still returns cities, two education variables, public financing, unemployment, shadow government, power for women (labor-market rather than political), clean water, and gay and lesbian acceptance, along with lifespan, social support, freedom, and GNI, which is still selected 100% of the time, *even though log gdp is also in the model.* And even *before* it’s removed, log gdp still comes out as less important than social support, once the model is properly specified.
I’m also not entirely sure why this is even relevant. I find a better model, period. It fits better, it predicts better, it shows no serious diagnostic issues, and it doesn’t include log gdp. If you can find a better model than that, have at it; but if you can’t, I’m not sure how bias in the search process would make a difference. It is a simple constructive proof: the product of the search just performs better than what is currently being presented as the international standard.
Thank you for the effort you put into producing those graphs and, sincerely, for pushing me toward a more thorough defense of my decisions. I hope this helps you see why your approach is not a definitive analysis.
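Whether one specification "predicts better" is the kind of question k-fold cross-validation answers. A minimal Python sketch, assuming plain OLS and synthetic data (the author's actual validation procedure isn't described in this thread, and `cv_mse` is a hypothetical helper): reusing the same `seed` means both specifications are scored on identical folds, so the comparison is paired.

```python
import numpy as np


def cv_mse(X, y, k=5, seed=0):
    """Mean out-of-fold squared error of an OLS fit (with intercept) under k-fold CV."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    X1 = np.column_stack([np.ones(n), X])
    errors = []
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        coef, *_ = np.linalg.lstsq(X1[train], y[train], rcond=None)
        errors.append(np.mean((y[test_idx] - X1[test_idx] @ coef) ** 2))
    return float(np.mean(errors))


# Usage on synthetic data where, by construction, satisfaction is linear in GDP.
rng = np.random.default_rng(3)
n = 500
gdp = np.exp(rng.uniform(np.log(500), np.log(100_000), n))
covs = rng.normal(size=(n, 3))
sat = 2e-5 * gdp + covs @ np.array([0.5, 0.3, -0.2]) + rng.normal(0, 0.3, n)

mse_linear = cv_mse(np.column_stack([gdp, covs]), sat)
mse_log = cv_mse(np.column_stack([np.log(gdp), covs]), sat)
print(mse_linear, mse_log)
```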