The phenomenon you describe as “rescaling” is generally known as a (violation of) measurement invariance in psychometrics. It is typically tested by checking whether the measurement model (i.e., the relationship between the unobservable psychological construct and the measured indicators of that construct) differs across groups (a comprehensive evaluation of different approaches is in Millsap, 2011).
I would interpret the tests of measurement invariance you use…
If people are getting happier over time, but reporting it on a stretched or stricter scale, then the link between how happy someone says they are and what they do when they’re unhappy should weaken over time.
In other words: if life satisfaction is increasing, but the reporting scale is stretching, then big life decisions (like leaving a job or ending a relationship) should become less predictable from reported happiness.
…to actually be measures of “prediction invariance”, which holds when a measure has the same regression coefficient with respect to an external criterion across different groups or over time.
But as Borsboom (2006) points out, prediction invariance and measurement invariance might actually be in tension with each other under a wide range of situations. Here’s a relevant quotation:
In 1997 Millsap published an important paper in Psychological Methods on the relation between prediction invariance and measurement invariance. The paper showed that, under realistic conditions, prediction invariance does not support measurement invariance. In fact, prediction invariance is generally indicative of violations of measurement invariance: if two groups differ in their latent means, and a test has prediction invariance across the levels of the grouping variable, it must have measurement bias with regard to group membership. Conversely, when a test is measurement invariant, it will generally show differences in predictive regression parameters.
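To make the quoted claim concrete, here is a minimal sketch (mine, not taken from Millsap or Borsboom) under a deliberately simple setup: a single latent variable drives both the test score and the criterion, the measurement parameters are identical across groups (so measurement invariance holds by construction), and the groups differ only in their latent mean. All parameter names and numeric values are hypothetical. It computes the population regression of the criterion on the observed score within each group.

```python
from dataclasses import dataclass


@dataclass
class Group:
    latent_mean: float  # mean of the latent variable in this group
    latent_var: float   # variance of the latent variable in this group


def predictive_regression(group, tau_x=0.0, lam_x=1.0, theta_x=1.0,
                          tau_y=0.0, lam_y=1.0):
    """Population regression of criterion Y on observed score X in one group.

    Assumed (hypothetical) model, with the same parameters in every group:
        X = tau_x + lam_x * eta + e,  Var(e) = theta_x
        Y = tau_y + lam_y * eta + d,  d independent of e and eta
    Returns (slope, intercept).
    """
    var_x = lam_x ** 2 * group.latent_var + theta_x
    cov_xy = lam_x * lam_y * group.latent_var
    slope = cov_xy / var_x
    mean_x = tau_x + lam_x * group.latent_mean
    mean_y = tau_y + lam_y * group.latent_mean
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Two groups that differ only in latent mean; measurement is identical.
group_a = Group(latent_mean=0.0, latent_var=1.0)
group_b = Group(latent_mean=1.0, latent_var=1.0)

slope_a, intercept_a = predictive_regression(group_a)
slope_b, intercept_b = predictive_regression(group_b)
print(slope_a, intercept_a)
print(slope_b, intercept_b)
```

With these values the slopes match (0.5 in both groups) but the intercepts do not (0.0 versus 0.5), so the predictive regression parameters differ even though measurement is invariant. In this setup the gap disappears only if the score has no measurement error (`theta_x = 0`) or the criterion is unrelated to the latent variable.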
This is stretching my knowledge of the topic beyond its bounds, but this issue seems related to the general inconsistency between measurement invariance and selection invariance, which has been explored independently in psychometrics and machine learning (e.g., the chapters on facial recognition and recidivism in The Alignment Problem).
Thanks a lot for this. I hadn’t actually come across these terms; that’s super useful. I’ll have to read both these articles when I get a chance, will report back.
To synthesize a few of the comments on this post: this comment sounds like a general instance of the issue that @geoffrey points out in another comment. What @Charlie Harrison is describing as a violation of “prediction invariance” may just be a violation of “measurement invariance”, in particular because happiness (the real thing, not the measure) may have a different relationship with GMEOH events over time.
I basically agree with this critique of the results in the post, but want to add that I nonetheless think this is a very cool piece of research and I am excited to see more exploration along these lines!
One idea that I had (maybe someone has done something like this?) is to ask people to watch a film or read a novel and rate the life satisfaction of the characters in the story. For instance, they might be asked to answer a question like “How much does Jane Eyre feel satisfied by her life, on a scale of 1-10?” (Note that we aren’t asking how much the respondent empathizes with Jane or would enjoy being her, simply how much satisfaction they believe Jane gets from Jane’s life.) This might allow us to get a shared baseline for comparison. If people’s assessments of Jane’s life go up or down over time (or differ between people), it seems unlikely that this is a result of a violation of “prediction invariance”, since Jane Eyre is an unchanging novel with fixed facts about how Jane feels. Instead, it seems like this would indicate a change in measurement, i.e., in how people assign numerical scores to particular welfare states.
Haha, yes, people have done this! This is called ‘vignette-adjustment’. You basically get people to read short stories and rate how happy they think the character is. There are a few potential issues with this method: (1) vignettes aren’t included in long-term panel data; (2) people might interpret the character’s latent happiness differently based on their own happiness.
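For anyone who wants to see the mechanics, here is a rough sketch of one simple, nonparametric way to use vignettes: recode each self-rating by where it falls relative to that same respondent’s vignette ratings. The function name and the example numbers are hypothetical, and it assumes each respondent rates the vignettes in a strictly increasing order (ties and order violations need extra handling that is not shown here).

```python
def vignette_relative_rank(self_rating, vignette_ratings):
    """Recode a self-rating relative to the respondent's own vignette ratings.

    vignette_ratings: this respondent's ratings of J vignettes, ordered from
    the least to the most satisfied character; must be strictly increasing.
    Returns an integer from 1 to 2J + 1:
        1      -> below the lowest vignette
        2      -> tied with the lowest vignette
        3      -> between the first and second vignette
        ...
        2J + 1 -> above the highest vignette
    """
    pairs = zip(vignette_ratings, vignette_ratings[1:])
    if any(later <= earlier for earlier, later in pairs):
        raise ValueError("vignette ratings must be strictly increasing")

    rank = 1
    for z in vignette_ratings:
        if self_rating < z:
            return rank
        if self_rating == z:
            return rank + 1
        rank += 2
    return rank


# Two hypothetical respondents who both report a 7 for themselves,
# but who use the 1-10 scale differently when rating the same two vignettes.
lenient_user = vignette_relative_rank(7, [3, 6])
strict_user = vignette_relative_rank(7, [5, 9])
print(lenient_user, strict_user)
```

The same raw score of 7 lands above both vignettes for the first respondent (rank 5) but only between them for the second (rank 3), which is the kind of scale-use difference the vignettes are meant to expose.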
Anchoring vignettes may also sometimes lack stability within persons. That said, it’s par for the course that any one source of evidence for invariance is going to have its strengths and weaknesses. We’ll always be looking for convergence across methods rather than a single cure-all.
Oh, great, thanks so much! I’ll check this out.