Cool! Glad to see this, I’ve been harping on about the NPS for some time (1, 2, 3, 4).
We usually do this because we don’t want to take people’s time up by asking three questions. I haven’t done a very rigorous analysis of the trade-offs here though, and it could be that we are making a mistake and should use ACSI instead.
As you may have considered, you could ask just one of the ACSI items, rather than asking the one NPS item. This would have lower reliability than asking all three ACSI items, but I suspect that one ACSI item would have higher validity than the one NPS item. (This is particularly the case when trying to elicit general satisfaction with the EA community, but maybe less so if you literally want to know whether people are likely to recommend an event to their friends).
The added value of using three items to generate a composite measure is potentially pretty straightforward to estimate, especially if you have prior data with the items. Happy to talk more about this.
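As a rough sketch of what that estimate could look like: if the items are treated as approximately parallel (each measuring the same construct with similar reliability, which is an assumption rather than something established here), the Spearman-Brown formula gives the reliability of a composite of k items from the reliability of a single item. The symbols below are introduced for illustration, and the value 0.6 is made up rather than taken from any survey data.

```latex
% \rho_1 : reliability of a single item (hypothetical value below)
% \rho_k : reliability of the mean of k parallel items
\[
  \rho_k = \frac{k\,\rho_1}{1 + (k - 1)\,\rho_1}
\]
% Illustration with an assumed \rho_1 = 0.6 and k = 3:
\[
  \rho_3 = \frac{3 \times 0.6}{1 + 2 \times 0.6} = \frac{1.8}{2.2} \approx 0.82
\]
```

With prior data, the single-item reliability could be approximated from the correlation between two of the items, again under the parallel-items assumption.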
Thanks David! If you have references or could say more about the virtues of asking one ACSI question versus the NPS question, I would love to read/hear them.
Hi Ben.
There are two broad reasons why I would prefer the ACSI items (considered individually) over the NPS (style) item:
The ACSI items are (mostly) more face valid
The ACSI items generally performed better than the NPS when we ran both of these in EAS 2020
Face validity
This depends on what you are trying to measure, so I’ll start with the context in the EAS, where (as I understand it) we are trying to measure general satisfaction with or evaluation of the EA community.
Here, I think the ACSI items we used (“How well does the EA community compare to your ideal? [(1) Not very close to the ideal - (10) Very close to the ideal]” and “What is your overall satisfaction with the EA community? [(1) Very dissatisfied - (10) Very satisfied]”) more closely and cleanly reflect the construct of interest.
In contrast, I think the NPS style item (“If you had a friend who you thought would agree with the core principles of EA, how excited would you be to introduce them to the EA community?”) does not very clearly or cleanly reflect general satisfaction. Rather, we should expect it to be confounded with:
Attitudes about introducing people to the EA community (different people have different views about how positive growing the EA community more broadly is)
Perceived/projected personal “excitement” (related to one’s (perceived) emotionality, excitability etc.)
Sociability/extraversion/interest in introducing friends to things in general, as well as one’s own level of social engagement with EA (if one is socially embedded in EA, introducing friends might make more sense than if you are very pro EA, but your interaction with it is entirely non-social)
I think some of these issues are due to the general inferiority of the NPS as a measure of what it’s supposed to be measuring:
And some of them are due to the peculiarities of the context where we’re using NPS (generally used to measure satisfaction with a consumer product) to measure attitudes towards a social movement one is a part of (hence the need to add the caveat about “a friend who you thought would agree with the core principles of EA”).
Some of the other contexts where you’re using NPS might differ. Likelihood to recommend may make more sense when you’re trying to measure evaluations of an event someone attended. But note that the ‘NPS’ question may simply be measuring qualitatively different things when used in these different contexts, despite the same instrument being presented. That is, asking about recommending the EA community as a whole elicits judgments about whether it’s good to recommend EA to people (does spreading EA seem impactful or harmful, etc.?), whereas asking about recommending an event someone attended mostly just reflects positive evaluation of that event. Still, I slightly prefer a simple ACSI satisfaction measure over NPS style items, since I think it will be clearer, as well as more consistent across contexts.
Performance of measures
Since we included both the NPS item and two ACSI items in EAS 2020 we can say a little about how they performed, although with only 1-2 items and not much to compare them to, there’s not a huge amount we can do to evaluate them.
Still, the general impression I got from the performance of the items last year confirms my view that the two ACSI measures cohere as a clean measure of satisfaction, while NPS and the other items are more of a mess. As noted, we see that the two ACSI measures are closely correlated with each other (presumably measuring satisfaction), while the NPS measure is moderately correlated with the ‘bespoke’ measures (e.g. “I feel that I am part of the EA community”), which seem to be (noisily) measuring engagement more than satisfaction or positive evaluation. I think it’s ultimately unclear what any of those three items are measuring, since they’re all just imperfectly correlated with each other, with engagement, and with satisfaction, so I think they are measuring a mix of things, some of which are unknown. Theoretically, one could simply run a larger suite of items, designed to measure satisfaction, engagement, and other things which we think might be related (such as what the bespoke measures are intended to measure), and tease out what the measures are tracking. But there’s not a huge amount we can do with just 5-6 items and the 2-3 apparent factors they are measuring.
Benefits of multiple measures
As an aside, we put together some illustrations of the possible concrete benefits of using a composite measure of multiple items, rather than a single measure.
The plot below shows the error (differences between the measured value and the true value: higher values, in absolute terms, are worse) with a single item vs an average made from two or three items. Naturally, this depends on assumptions about how noisy each item is and how strongly the items are correlated with one another, but it is generally the case that using multiple items helps to reduce error and ensure that estimates come closer to the true value.
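For anyone who wants to play with this kind of illustration, here is a minimal simulation sketch. It is not the code behind the plot: the function name `simulate_abs_error`, the standard-normal true scores, and the independent noise with `noise_sd=1.0` are all assumptions made for illustration.

```python
import random
import statistics


def simulate_abs_error(n_items, n_people=5000, noise_sd=1.0, seed=0):
    """Mean absolute error of the average of `n_items` noisy items.

    Assumptions (illustrative only): each person's true score is drawn
    from N(0, 1), and each item equals the true score plus independent
    N(0, noise_sd) noise.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n_people):
        true_score = rng.gauss(0.0, 1.0)
        items = [true_score + rng.gauss(0.0, noise_sd) for _ in range(n_items)]
        composite = statistics.fmean(items)  # simple average of the items
        errors.append(abs(composite - true_score))
    return statistics.fmean(errors)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, "item(s):", round(simulate_abs_error(k), 3))
```

Under these assumptions the error of the average shrinks roughly in proportion to 1/sqrt(k). If the item errors were positively correlated (for example, because of shared wording effects), the gain from adding items would be smaller.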
This next image shows the power to detect a correlation of around r = 0.3 using 1, 2 or 3 items. The composite of more items should have lower measurement error. When only a single item is used, the higher measurement error means that a true relationship between the measured variable and another variable of interest can be harder to detect. With the average of 2 or 3 items, the measure is less noisy, and so the same underlying effect can be detected more easily (i.e., with fewer participants). (The three different images just show different significance thresholds.)
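The power comparison can be sketched in the same way. Again, this is an illustrative simulation and not the analysis behind the images. It assumes that r = 0.3 is the correlation between the outcome and the true score, that item noise is independent with `noise_sd=1.0`, and that significance is judged with the Fisher z approximation. The names `pearson_r` and `estimate_power` are hypothetical helpers.

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def estimate_power(n_items, n_people=100, true_r=0.3, noise_sd=1.0,
                   alpha=0.05, n_sims=2000, seed=0):
    """Share of simulated studies in which the composite-outcome
    correlation is significant (two-sided, Fisher z approximation)."""
    rng = random.Random(seed)
    z_crit = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_sims):
        composites, outcomes = [], []
        for _ in range(n_people):
            true_score = rng.gauss(0.0, 1.0)
            # Outcome has unit variance and correlates true_r with true_score.
            outcome = (true_r * true_score
                       + math.sqrt(1 - true_r ** 2) * rng.gauss(0.0, 1.0))
            items = [true_score + rng.gauss(0.0, noise_sd)
                     for _ in range(n_items)]
            composites.append(statistics.fmean(items))
            outcomes.append(outcome)
        r = pearson_r(composites, outcomes)
        z = math.atanh(r) * math.sqrt(n_people - 3)  # Fisher z statistic
        if abs(z) > z_crit:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, "item(s): power ~", estimate_power(k))
```

Changing `alpha` corresponds to the different significance standards, and raising `n_people` shows how many more participants a single item needs to reach the power that a 2 or 3 item average reaches with fewer.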
I just wanted to say that I always appreciate your in-depth responses David! They are always really easy to follow and informative :)
I’d also be interested in this!