Here's the OWID charts for life satisfaction vs. GDP/capita. First linear (per the dovecote model):
Now with a log transform to GDP/capita (per the WHR):
I think it is visually clear the empirical relationship is better modelled as log-linear rather than linear. Compared to this, I don't think the regression diagnostics suggesting non-inferiority of linear GDP (in the context of a model selected from thousands of variables, at least some of which could log-linearly proxy for GDP, cf. Dan_Key's comment) count for much.
Besides the impact of GDP (2.5% versus 40%), I'd expect which other variables end up being selected also to be sensitive to this analysis choice. Unfortunately, as it is the wrong one, I'd expect (quasi-)omitted variable bias to distort both which variables are included, and their relative contributions, in the dovecote model.
Hi Gregory --
There's so much to respond to that I'm going to split this up into three parts. Hopefully this will clarify things.
(Preface) On the innate problem of the logarithmic transform
Something that's getting persistently and rather confusingly lost is the fact that the log transform has inherent and enormous problems in interpretation when it is used in a linear model alongside non-transformed variables. A regular linear parameter describes an additive effect; a log-transformed parameter describes a multiplicative effect. This means that reporting the estimate for the transformed variable right next to the estimate of an untransformed variable, without emphasizing the difference, is much worse than comparing apples and oranges. It's more like selling apples, and *calling* them oranges. But this is how the WHR reports its results. In the 2022 WHR, the plots of "contributions" are not marked as LOG gdp; they're just marked as gdp, and they are placed alongside linear parameters as though they are directly comparable. They are not. I consider this, alone, a "critical failure," and it is one of the central reasons I title my paper that way.
Another way to put this is that log gdp showing up as 40% of model variation is not because that model actually thinks gdp is hugely important; it is a *numerical illusion*, based on innately incomparable quantities. And yet, "GDP" (NOT log gdp) is put at the very head of the 2023 report, as THE variable that matters. The model simply does not support this.
Of course you can do a more responsible job of presentation, but it's unavoidable that you've made your job harder for yourself. In ranking priorities, the goal is to rank things of equal nature, and the log creates a single element that is of an inherently different nature.
But all of this is preamble to the question you raise. My point is, given all the trouble it can cause, there's got to be a VERY good reason to use the log. So: is there?
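To make the additive vs. multiplicative distinction concrete, here is a minimal sketch. The two coefficients are hypothetical values picked only for illustration; they are not estimates from the WHR or from my model.

```python
import math

def linear_effect(beta, gdp_from, gdp_to):
    """Predicted change in satisfaction when GDP enters the model untransformed."""
    return beta * (gdp_to - gdp_from)

def log_effect(beta, gdp_from, gdp_to):
    """Predicted change when ln(GDP) enters the model: only the ratio matters."""
    return beta * (math.log(gdp_to) - math.log(gdp_from))

# Hypothetical coefficients (assumptions, not fitted values)
BETA_LINEAR = 0.00002  # satisfaction points per extra dollar
BETA_LOG = 0.3         # satisfaction points per unit of ln(GDP)

# Adding the same $1,000 at a poor and a rich starting point
lin_poor = linear_effect(BETA_LINEAR, 2_000, 3_000)
lin_rich = linear_effect(BETA_LINEAR, 50_000, 51_000)
log_poor = log_effect(BETA_LOG, 2_000, 3_000)
log_rich = log_effect(BETA_LOG, 50_000, 51_000)

# Doubling GDP at a poor and a rich starting point
log_double_poor = log_effect(BETA_LOG, 2_000, 4_000)
log_double_rich = log_effect(BETA_LOG, 50_000, 100_000)

print(f"linear, +$1,000: poor={lin_poor:.4f} rich={lin_rich:.4f}")
print(f"log,    +$1,000: poor={log_poor:.4f} rich={log_rich:.4f}")
print(f"log,    doubling: poor={log_double_poor:.4f} rich={log_double_rich:.4f}")
```

The linear coefficient gives the same change for the same dollar amount wherever you start, while the log coefficient gives the same change for the same *ratio*. The two coefficients are in different units, so they cannot be read side by side as though they measured the same thing.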
On the inappropriateness of the log transform in this context
Unfortunately, your graphs are a fundamentally incomplete, and ultimately misleading, view of the system, because satisfaction and GDP are simply not the only two variables involved. I might even say that accounting for the other variables is the point of the whole exercise. But the important part is that in this particular case, they make all the difference.
The approach you suggest is misleading because 1. it doesn't take all the relevant variables into account, and 2. it assumes a strong form for GDP based on a critically limited view of the data. But it's easy to solve both of these problems. This can be done by fitting a Generalized Additive Model (GAM), with the full set of conditioning variables, and a non-parametric spline on the GNI term. This will give it the freedom to be log-shaped if the data supports that, but also any other shape, if that is a better fit. So I fit this model, I take the shape formed by the non-parametric component, and I fit two lines to it: a straight one, and a logarithmic one. They look like this:
(The degree of the spline, or the extent of the waviness, can of course be controlled manually, but this is the degree chosen algorithmically by the model. To force it to curve less will of course bias things even more in the direction of the linear model, because it would be forcing the fit to be simpler.)
Because I don't want to make definitive mathematical judgments just by looking at things, I assess the models statistically, and the linear one just fits better: R^2 = 0.92 for the linear fit, vs. 0.85 for the log fit. They're close, sure, which I think is entirely consistent with a visual assessment, but log certainly doesn't dominate, and in fact is numerically worse, again, like in the first model. So the log is hard to interpret (this part is unavoidable) and empirically, *it's just numerically worse.* To me, that's open and shut. I see no reason, whatsoever, to use it, unless you're actually *trying* to inflate the effect of GDP by 1100%.
Now, if you still look at this and, for whatever reason, absolutely refuse to accept any model without a non-linear effect, then fine, we can still make one of those, and we can also do it without making the strong, unnecessary, and practically-incomparable assumptions about the form of the relationship.
Since you're inherently making the claim that there are fundamental differences in the dynamics of GDP in low-GDP countries vs high-GDP countries, then let's just look at those countries separately! Looking at the raw plot of satisfaction ~ gdppc (untransformed), we can see a clear elbow at around $5,000. If your eyeball says $10k, that's fine too; I tried both. But that's clearly the approximate point where the effects of gdp *seem* to change, based on exactly the kind of visual analysis you're using. To sketch the basic idea, you might imagine breaking the mass of data down into the following two approximate linear trends:
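For anyone who wants to repeat the shape comparison, the second step (fitting a straight line and a log curve to an already-estimated smooth, then comparing R^2) can be sketched roughly as below. This is a sketch under assumptions, not my actual code: the function names are hypothetical, and the two stand-in curves are synthetic placeholders for the spline component you would extract from your own GAM.

```python
import math

def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def compare_shapes(gni, partial_effect):
    """R^2 of a straight-line fit and of a log fit to the same smooth curve."""
    log_gni = [math.log(g) for g in gni]
    linear_fit = fit_simple_ols(gni, partial_effect)
    log_fit = fit_simple_ols(log_gni, partial_effect)
    return {
        "linear": r_squared(gni, partial_effect, *linear_fit),
        "log": r_squared(log_gni, partial_effect, *log_fit),
    }

# Synthetic stand-ins for a spline's partial effect (NOT real estimates)
gni_grid = [1_000 * k for k in range(1, 61)]
straight_curve = [0.00002 * g for g in gni_grid]
log_shaped_curve = [0.5 * math.log(g) for g in gni_grid]

straight_result = compare_shapes(gni_grid, straight_curve)
log_result = compare_shapes(gni_grid, log_shaped_curve)
print(straight_result)
print(log_result)
```

The check is symmetric: a smooth that really is log-shaped will favor the log fit, and one that is close to straight will favor the linear fit, so the comparison does not presuppose either answer.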
So then we fit models to each of these subsets of data, in which we now have some visible reason to believe that the sat ~ gdp/gni relationship actually *is* linear, and you get a contribution of 1.2% of model variation for the poor side, and 5.8% for the rich side. These average out to 3.5%, where my pooled model estimated 2.5%. Yes, the slope is steeper on the poor side (about two and a half times as steep) but in neither case does GDP suddenly come anywhere near to looking like the dominant force, or the only way to salvation. Which, again, just shouldn't even be a surprise, because the *apparent* 40% contribution in the WHR model (and I really cannot stress this enough) *is a numerical illusion.*
Want three tranches? By all means. It's the same. The log is profoundly misleading, utterly unnecessary, and quite simply not the right choice for this data.
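The split-sample idea can be sketched like this. It is only an illustration of the mechanics: the data are synthetic, built with an elbow at $5,000 and a poor-side slope 2.5 times the rich-side slope, and the sketch fits a single-predictor line in each segment rather than the full multi-variable model behind the contribution figures above. All names are hypothetical.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def split_at(gdp, satisfaction, threshold):
    """Partition the (gdp, satisfaction) pairs into below / at-or-above threshold."""
    pairs = list(zip(gdp, satisfaction))
    poor = [(g, s) for g, s in pairs if g < threshold]
    rich = [(g, s) for g, s in pairs if g >= threshold]
    return poor, rich

ELBOW = 5_000  # the eyeballed break point; $10k is the other value mentioned

def synthetic_satisfaction(g):
    """Synthetic piecewise-linear outcome (an assumption, not real data)."""
    if g < ELBOW:
        return 4.0 + 0.00005 * g
    return 4.25 + 0.00002 * (g - ELBOW)

gdp = list(range(500, 60_001, 500))
satisfaction = [synthetic_satisfaction(g) for g in gdp]

poor, rich = split_at(gdp, satisfaction, ELBOW)
_, slope_poor = fit_simple_ols([g for g, _ in poor], [s for _, s in poor])
_, slope_rich = fit_simple_ols([g for g, _ in rich], [s for _, s in rich])
print(f"poor-side slope: {slope_poor:.6f}, rich-side slope: {slope_rich:.6f}")

# The simple average of the two reported contributions quoted above
average_contribution = (1.2 + 5.8) / 2
print(f"average contribution: {average_contribution:.1f}%")
```

Changing `ELBOW` to 10,000, or splitting into three segments, only changes how the pairs are partitioned; the per-segment fit is the same.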
On your assertion that my search space is "wrong"
To clear up what appears to be a surprisingly confident misconception, I do have log gdp available for the search algorithm. It does not make the final cut, because in addition to being of an intrinsically different nature than the rest of the variables, it simply does not produce a better model fit, statistically, whether you choose to believe this or not. But the other variables chosen are not influenced by my "wrong" search space, because that is not actually the search space I use. With my apologies, I'm really not sure why you think otherwise.
To be precise, the search with log gdp as a candidate still returns cities, and two variables on education, and public financing, and unemployment, and shadow government, and power for women (though labor market instead of political), and clean water, and gay and lesbian acceptance, along with lifespan, social support, freedom, and GNI, which is still selected 100% of the time, *even though log gdp is also in the model.* Though even *before* it's removed, log gdp still comes out as less important than social support, once you actually properly specify the model.
Though I'm also not entirely sure why this is even relevant. I find a better model, period. It fits better, it predicts better, there are no serious diagnostic issues, and it doesn't include log gdp. If you can find a better model than that, have at it; but if you can't, I'm not sure how bias in the search process would make a difference. It is a simple constructive proof, in which the product of the search just performs better than what's being currently presented as the international standard.
Thank you for the effort you went to in producing those graphs and, sincerely, in pushing me to a more thorough defense of my decisions. I hope this helps you see why your approach is not a definitive analysis.