Does your model without log(GNI per capita) basically just include a proxy for log(GNI per capita), by including other predictor variables that, in combination, are highly predictive of log(GNI per capita)?
With a pool of 1058 potential predictor variables, many of which have some relationship to economic development or material standards of living, it wouldn’t be surprising if you could build a model to predict log(GNI per capita) with a very good fit. If that is possible with this pool of variables, and if log(GNI per capita) is linearly predictive of life satisfaction, then if you build a model predicting life satisfaction which can’t include log(GNI per capita), it can instead account for that variance by including the variables that predict log(GNI per capita).
And if you transform log(GNI per capita) into a form whose relationship with life satisfaction is sufficiently non-linear, and build a model which can only account for the linear portion of the relationship between that transformed variable and life satisfaction, then within that linear model those proxy variables might do a much better job than transformed log(GNI per capita) of accounting for the variance in life satisfaction.
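To make the proxy worry concrete, here is a minimal sketch on purely synthetic data, not the actual dataset or model from the post. Every number in it (sample size, slope, noise levels, the three noisy "proxy" variables) is an assumption chosen for illustration. It compares how much variance in a made-up satisfaction score is explained by log income, by an index of noisy proxies for log income, and by untransformed income.

```python
import math
import random

def r_squared(x, y):
    """R^2 of a one-predictor least-squares fit (the squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000                # hypothetical sample size

# Synthetic "log income" and a satisfaction score that is linear in it (assumed).
log_income = [rng.gauss(9.0, 1.2) for _ in range(n)]
satisfaction = [0.7 * v + rng.gauss(0.0, 0.5) for v in log_income]

# Index built from three noisy stand-ins for log income (hypothetical proxies).
proxy_index = [sum(v + rng.gauss(0.0, 0.5) for _ in range(3)) / 3 for v in log_income]

# A transformed version whose relationship with satisfaction is strongly non-linear.
raw_income = [math.exp(v) for v in log_income]

r2_log = r_squared(log_income, satisfaction)
r2_proxy = r_squared(proxy_index, satisfaction)
r2_raw = r_squared(raw_income, satisfaction)

print(f"log income:  R^2 = {r2_log:.2f}")
print(f"proxy index: R^2 = {r2_proxy:.2f}")
print(f"raw income:  R^2 = {r2_raw:.2f}")
```

In this toy setup the proxy index explains more variance than the non-linearly transformed variable, even though the transformed variable carries exactly the same information as the true predictor. Whether anything like this happens in the real 14-variable model is a separate empirical question.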
I think the first thing to emphasize is that, even when you do include log(GDP) or log(GNI), the measured effect still isn’t all that big. It says that you’ll get an increase of 1.5 points of satisfaction … if you almost triple GDP! (I.e. multiply it by about 2.7, because that’s just mechanically what the coefficient means when you transform a linear predictor by the natural log.) Since that’s either ludicrous or impossible for many countries, there are plenty of cases where it doesn’t even make sense to consider. My largest problem has nothing to do with the non-linearities of the log; if it fit better, great! But 1) it just, simply, numerically, doesn’t, and 2) the fact that you have to interpret the log in a fundamentally different way than all the non-transformed variables makes it extraordinarily misleading. You get a bigger bump on the graph, but it’s a bigger bump that means something fundamentally different than all of the other bumps (a multiplicative effect, not an additive one). Then when you include it in charts as if it doesn’t mean something different, you’re floating towards very nasty territory.
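For anyone who wants the arithmetic spelled out, here is a small sketch of how a coefficient on a natural-log predictor translates into predicted changes. The 1.5 is the coefficient quoted above; the function name and the example growth factors are just illustrative assumptions.

```python
import math

def satisfaction_change(coef, gdp_factor):
    """Predicted change in the outcome when GDP is multiplied by gdp_factor,
    for a linear model whose predictor is ln(GDP) with coefficient coef."""
    return coef * math.log(gdp_factor)

coef = 1.5  # coefficient on ln(GDP) quoted in the thread

# Multiplying GDP by e (about 2.718) moves the prediction by exactly the coefficient.
print(satisfaction_change(coef, math.e))

# Smaller multiplicative changes move it by proportionally less.
print(satisfaction_change(coef, 2.0))   # doubling GDP
print(satisfaction_change(coef, 1.10))  # 10% growth
```

The point is that the same coefficient describes a multiplicative change in GDP, whereas the coefficients on untransformed variables describe additive changes.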
It is certainly the case that many of the variables are highly collinear, but there are clearly no obvious close proxies in the list. If I removed log(gdp) but introduced log(trading volume) or something, that would be suspicious, but you can see all 14 of the variables that are actually in the model. I would have to be approximating log(gdp) with... water and preschool? The 1,058 variables are searched over, yes, but then 1,044 of them are rejected and simply don’t enter.
I’m sorry though, I just don’t understand your last paragraph. If the true effect needs a log, then the log should account for that effect. And if the effect is properly transformed, I don’t understand how a different variable would do a better job of accounting for the variance than the true variable. Happy to discuss if you can clarify though.
Getting 1.5 points by 2.7x’ing GDP actually sounds like a lot to me? It predicts that the United States should be 1.9 points ahead of China and China should be 2.0 points ahead of Kenya. It’s very hard to get a 1.9 point improvement in satisfaction by doing anything.
The point is not that 1.5 is a large number, in terms of single variables—it is—the point is that 2.7x is a ridiculous number.
But 1.5 also isn’t such a huge effect within the full scale of what’s measured. The maximum value in the data is just over 8. Even something “huge” like 1.5, out of a total of 8, is less than twenty percent. If, as the more accurate model suggests, GDP is only making up a small piece of the total, then that suggests it’s far more likely that if a country were to take the same effort that would be required to triple their gdp, and put those resources instead into the other variables, they’d get a far larger return.
Can you specify what you mean by “2.7x is a ridiculous number”?
I ask because it does happen that economies grow like that in a fairly short amount of time. For example, since the year 2000:
China’s GDPpc 2.7x’d about 2.6 times
Vietnam’s did it ~2.4 times
Ethiopia’s ~2.1 times
India’s ~1.7 times
Rwanda’s ~1.3 times
The US’s GDPpc is on track to 2.7x from 2000 in about 2029, assuming a 4% annual increase
So I assume you don’t mean something like “2.7x never happens”. Do you mean something more like “it’s hard to find policies that produce 2.7x growth in a reasonable amount of time” or “typically it takes economies decades to 2.7x”?
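In case the “2.7x’d about N times” phrasing is unclear: N is just the natural log of the overall growth multiple. A quick sketch of the conversion, using the rounded counts from the list above as inputs (the function names are mine, and the outputs inherit whatever error is in those rounded counts):

```python
import math

def efolds(growth_multiple):
    """How many times a quantity has been multiplied by e (about 2.7x)."""
    return math.log(growth_multiple)

def growth_multiple(n_efolds):
    """Overall growth multiple implied by a given number of 2.7x steps."""
    return math.exp(n_efolds)

# Approximate counts of 2.7x steps since 2000, as listed above.
counts = {"China": 2.6, "Vietnam": 2.4, "Ethiopia": 2.1, "India": 1.7, "Rwanda": 1.3}

for country, k in counts.items():
    print(f"{country}: about {growth_multiple(k):.1f}x overall")
```

So, for example, 2.6 steps of 2.7x corresponds to an overall multiple of roughly 13x.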
I think I captured my intended meaning fairly well with my ending comment:
“If, as the more accurate model suggests, GDP is only making up a small piece of the total, then that suggests it’s far more likely that if a country were to take the same effort that would be required to triple their gdp, and put those resources instead into the other variables, they’d get a far larger return.”
If you’re asking because you didn’t find that convincing though, I’m happy to elaborate.
2.7x is almost exactly the amount world GDP per capita has changed in the last 30 years. Obviously some individual countries (e.g. China) have had bigger increases in that window. 30 years isn’t that long in the grand scheme of things; it’s far shorter than most lifetimes. (EDIT: nvm this is false, the chart said “current dollars” which I thought meant inflation-adjusted, but it’s actually not inflation adjusted)