The Comparability of Subjective Scales

This post summarises my new working paper, A Happy Possibility About Happiness (And Other Subjective) Scales: An Investigation and Tentative Defence of the Cardinality Thesis. It is a cross-post from the HLI website.

TL;DR Can we compare individuals’ self-reported data, or is (say) one person’s 5/10 the same as another person’s 8/10? While this is ‘the elephant in the room’ for happiness research, I was surprised to learn that very little has been written about it; so little, in fact, that I couldn’t even find a clear statement of what the problem is supposed to be. I explain what the problem is, propose a new theoretical justification (based on ‘Schelling points’) for why we should expect self-reported data to be comparable, suggest the (thin-ish) evidence base supports this theory, then set out directions for future work.

Below is a 1,000-word summary of the 12,000-word paper.


It is very common for people to give numerical ratings of their subjective experiences. For instance, people often score their happiness, life satisfaction, job satisfaction, health, pain, the movies they watch, and so on, on a 0-10 scale.

A long-standing worry about these self-reports is whether the numbers represent the same thing to different people at different times. For example, if two people say they are 5/10 happy, can we assume they are as happy as each other? More technically, the basic (but not only) question is whether subjective scales are cardinally comparable: does a one-point change, on a given scale, represent the same size change for different people and at different times?

If the scales are merely ordinal—that is, the numbers represent a ranking but contain no information on the relative magnitudes of differences—we are in trouble: it would not be possible to use subjective scales to say what would increase overall happiness.
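To see why merely ordinal scales would be a problem, here is a minimal sketch. The two groups and the `stretched_top` relabelling are made up for illustration and are not taken from the paper. Both labellings preserve every individual ranking, yet they disagree about which group has the higher average.

```python
from statistics import mean

# Hypothetical 0-10 reports for two made-up groups (illustration only).
group_a = [6, 6, 6, 6]
group_b = [2, 2, 9, 9]

def as_reported(score):
    """Treat the reported number as the true magnitude."""
    return score

def stretched_top(score):
    """A strictly increasing relabelling that stretches the top of the scale.

    If the scale is only ordinal, this labelling is as legitimate as the
    reported numbers, because it keeps every ranking intact.
    """
    return score ** 2

def higher_mean(transform):
    """Return which group has the higher average under a given labelling."""
    mean_a = mean([transform(s) for s in group_a])
    mean_b = mean([transform(s) for s in group_b])
    return "A" if mean_a > mean_b else "B"

print(higher_mean(as_reported))    # A: means are 6 vs 5.5
print(higher_mean(stretched_top))  # B: means are 36 vs 42.5
```

If only the ordering of the numbers carried information, nothing would tell us which of these two answers is the right one.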

Opinions about the Cardinality Thesis—the idea that subjective scales are cardinal—seem to divide on disciplinary lines: economists are sceptical of it, psychologists less so (Ferrer‐i‐Carbonell and Frijters, 2004). Despite this division, and the fact this is a foundational methodological issue, it seems to have attracted little attention in the literature. Researchers will tend to assume the scales are ordinal or cardinal, and so use different statistical tests, without defending their decision (Kristoffersen, 2011). This lack of scrutiny is likely explained by a combination of two things. First, the topic is very applied for philosophy and very theoretical for social science, so falls into an interdisciplinary ‘no man’s land’. Second, researchers assume any differences in scale use will ‘wash out’ as noise in large samples anyway and so this problem can be ignored.

The result is that, at present, there is insufficient conceptual clarity around the Cardinality Thesis to know whether or not there is a problem or what could be done about it. Quoting Stone and Krueger (2018, p189):

In order to have more concrete ideas about the extent to which this may be a problem, we should have a better idea of why such differences [in scale interpretation] might exist in the first place, and have some theoretical justification for a concern with systematic differences in how subjective well-being questions are interpreted and answered.

Against this background, this paper makes four main contributions.

First, it proposes a novel theoretical explanation for how people interpret scales that draws on philosophy of language and game theory. In brief: conversation is a cooperative endeavour governed by various maxims (Grice, 1989). As subjective scales are vague and individuals want to be understood, scale interpretation should be understood as a search for a ‘focal point’ (or ‘Schelling point’), a default solution chosen in the absence of communication (Schelling, 1960). A specific focal point is proposed: when given an undefined scale with a finite number of options, individuals (unconsciously) use the end-points to refer to the realistic limits of that quantity, e.g. 10/10 is the maximum happiness anyone feels. Further, they interpret the scale as linear, so each point represents the same change in quantity. If this hypothesis is correct, self-reports will be cardinally comparable (given one further assumption, phenomenal cardinality, defined below).
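The proposed focal point can be sketched as a simple mapping. This is only an illustration under strong assumptions: the function `report`, the 0-100 "underlying units", and the particular end-points are all hypothetical and are not defined in the paper.

```python
def report(level, low, high, points=10):
    """Linear report on a 0..points scale.

    `level` is the underlying quantity in hypothetical units; `low` and
    `high` are the end-points the respondent has in mind for the scale.
    """
    return points * (level - low) / (high - low)

# Two respondents who settle on the same end-points (the realistic limits)
# give the same report for the same underlying level.
print(report(70, low=0, high=100))  # 7.0
print(report(70, low=0, high=100))  # 7.0

# Linearity: equal underlying changes give equal reported changes.
print(report(80, 0, 100) - report(70, 0, 100))  # 1.0
print(report(30, 0, 100) - report(20, 0, 100))  # 1.0

# A respondent with different end-points in mind would report the same
# level differently, which is what the focal point is supposed to rule out.
print(report(70, low=20, high=80))
```

On this picture, comparability comes from everyone converging on the same end-points and the same linear spacing without having to discuss it.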

Second, the paper then states four conditions that are individually necessary and jointly sufficient for cardinal comparability to hold on average. Stated roughly, these are:

  • C1: phenomenal cardinality (the underlying subjective state, e.g. happiness, is felt in units)

  • C2: linearity (each reported unit change represents the same change in magnitude)

  • C3: intertemporality (each individual uses the scale the same way over time and the scale end-points represent the real limits)

  • C4: interpersonality (different individuals use the scale the same way and the scale end-points represent the real limits).

What should we conclude about the cardinal comparability of subjective data if the conditions fail? It depends on which condition(s) fail. The first condition is binary and fundamental: each subjective phenomenon is either felt in units or it is not. Of course, a non-cardinal phenomenon cannot be measured on a cardinal scale. However, all the other conditions can fail by degree, and what’s important is how much they deviate. By analogy, C1 is about whether we can have a measuring stick at all; C2 concerns whether the measuring sticks are bent; C3 is whether the length of each stick changes over time; and C4 is whether different people have the same length sticks. It makes a difference if our measuring sticks are slightly bent or very crooked. The cardinality thesis could fail to be exactly true, but nevertheless be approximately true, such that it is unproblematic to treat it as true.
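The difference between a slightly bent and a very crooked stick can be made concrete with a small sketch. The power-function form and the `bend` parameter are assumptions chosen for illustration; the paper does not propose this functional form.

```python
def underlying(report, bend):
    """Hypothetical mapping from a 0-10 report to the underlying quantity.

    bend == 1 is a straight stick (linearity holds exactly); values further
    from 1 bend the stick more.
    """
    return 10 * (report / 10) ** bend

def step_ratio(bend):
    """Size of the 8-to-9 step relative to the 2-to-3 step, in underlying units."""
    low_step = underlying(3, bend) - underlying(2, bend)
    high_step = underlying(9, bend) - underlying(8, bend)
    return high_step / low_step

for bend in (1.0, 1.1, 3.0):
    print(bend, round(step_ratio(bend), 2))
```

With a straight stick the ratio is 1. With `bend = 1.1` a one-point step near the top is a little over 10% larger than one near the bottom, so treating the scale as linear is a mild approximation. With `bend = 3.0` the top step is more than ten times larger, and treating reports as linear would be seriously misleading.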

Third, the paper notes we can use evidence and reasoning to assess whether, and to what extent, the conditions hold, even though they concern subjective states. While it is always true that more than one hypothesis will fit the facts, that does not mean all hypotheses are equally likely. Here, as elsewhere, we rely on inference to the best explanation. It then goes on to examine each condition in turn, primarily drawing on the subjective well-being literature. In each case, there is evidence indicating the condition does hold and no strong evidence suggesting it does not. As such, the tentative conclusion of the paper is that subjective scales are best understood as cardinally comparable, unless and until other evidence suggests otherwise.

Fourth, it sets out some testable predictions of the theory and explains how such tests could be used to ‘correct’ the data if people do not intuitively interpret subjective scales in the way hypothesised.

The conclusion of this paper is therefore optimistic. Not only does there not seem to be a problem where we feared there might be one, but we may well be able to fix the problem if we later discover it does exist.

This research was produced by the Happier Lives Institute.
