Thanks for writing this! This is a helpful overview of some of the challenges in coming up with a single quantitative view.
Overall, I think this suggests two things about how to display and interpret the relevant data.
First, when using purely quantitative estimates of distributions in currency terms to illustrate an overall trend, use a variety of different estimates, each drawn consistently from a single source. It seems like when the whole thing you’re trying to estimate is income inequality, stitching together different sources for the portions below and above the 80th percentile is very likely to introduce problems. For instance, if your above-80% source is better at detecting income, or otherwise biased upwards relative to your below-80% source, then this will substantially overestimate income inequality.
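A minimal sketch of the stitching concern, using purely synthetic numbers. The curve shape, the 30% upward bias, and all function names are assumptions for illustration; none of this is PovcalNet or Milanovic data. It suggests that two sources differing only by a constant factor give the same 90th-to-10th percentile ratio when each is used consistently, while splicing them at the 80th percentile inflates that ratio by the size of the bias.

```python
def quantile_incomes(base, growth, n=100):
    """Hypothetical income at each percentile: a simple geometric curve."""
    return [base * growth ** p for p in range(n)]

def splice(low_source, high_source, cut=80):
    """Use low_source below the cut percentile and high_source from the cut up."""
    return low_source[:cut] + high_source[cut:]

def p90_p10_ratio(incomes):
    """Ratio of the 90th-percentile income to the 10th-percentile income."""
    return incomes[90] / incomes[10]

# Purely synthetic numbers, not real survey data.
true_curve = quantile_incomes(base=300.0, growth=1.05)
source_a = list(true_curve)                # assumed to measure everyone without bias
source_b = [1.3 * x for x in true_curve]   # assumed to read 30% high everywhere

stitched = splice(source_a, source_b, cut=80)

print(p90_p10_ratio(source_a))   # consistent use of source A
print(p90_p10_ratio(source_b))   # consistent use of source B: same ratio as A
print(p90_p10_ratio(stitched))   # stitched curve: ratio inflated by the 1.3 bias
```

In this toy setup the inequality measure is only distorted when the sources are mixed, which is the case for plotting each source as its own line.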
I would have liked to instead see the whole curve drawn from PovCalNet numbers, with the trendline from Milanovic overlaid on it. Or, ideally, as many different estimated lines as you can measure on the same axis. It’s fine to merely footnote or link to explanations for exactly why the lines differ, how they were generated, and your thoughts on which ones are better estimates for which quantiles, but when you use a single graph, people are likely to assume that it’s an authoritative illustration of a single data source, whereas showing multiple estimates makes it clearer that there is uncertainty about the details, but not about the fact that the distribution is very unequal.
If you’re worried that this would lead to a too-noisy graph, I highly recommend Edward Tufte’s books for advice on how to visually display a large amount of quantitative information elegantly.
Second, we shouldn’t use these numbers directly to make judgments about specific programs to help poor people. Instead, when trying to evaluate any particular decision, we should make sure we understand how a difference in dollar figures relates to a difference in material conditions, since this will not be perfectly consistent.
For instance, if you use “purchasing power parity” figures, you may get a better estimate of how big differences in material circumstances are, but at the cost of obscuring things like what percentage of someone’s income a cash transfer of a certain size will constitute. For this reason, the work that charities like GiveDirectly and JPAL are doing to report directly on what happens as a result of various interventions is extremely important.
Thanks Ben, this sounds reasonable. I’m working to create a new figure that will have more recent data, be inflation-adjusted up to 2017, and offer more details about precisely how it was constructed. I’ll keep these ideas in mind.
Unfortunately, as I’m waiting on other busy people to get back to me with the data/information I need, I can’t say when I’ll be able to put it up.
As an aside, one advantage to Tufte-style information-dense charts is that they can be interesting enough to engage readers with more of the details of the content, and not just say “yeah, OK” to your main point. For instance, the first graph in your post rewards additional attention. When readers engage with the details, they may learn more valuable things from the data than the ones you’d had in mind.
On reflection, it’s not clear to me that anyone has the appropriate level of urgency around this. Two distinct datasets were stitched together at the 80th percentile. The dataset used for the above-80 figures was chosen specifically because it had higher numbers. This chart was then used specifically to illustrate how unequally the quantity was distributed.
This is not a problem on the level of “someone could potentially be misled”. This is a problem on the level of “this chart was cooked up specifically to favor the intended conclusion.” When you’re picking and choosing sources for part of a trend, it stops mattering that the chart was originally based on real data.
It’s entirely possible for someone to make this sort of error thoughtlessly rather than maliciously, but now that the error has been discovered, the honest thing to do is promptly and prominently retract the chart, with an explanation.
It’s also possible I’m somehow misunderstanding. For instance, I’m confused about why there isn’t at least a small discontinuity around the 80th percentile—substantially differing methodologies shouldn’t get the exact same numbers.
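One rough way to look for such a discontinuity, sketched here on synthetic data: compare the growth across the splice point with the typical growth between neighbouring percentiles. The helper names, the five-step window, and the 30% offset are all hypothetical choices made for illustration.

```python
import statistics

def step_ratios(incomes):
    """Growth factor between each pair of adjacent percentiles."""
    return [incomes[i + 1] / incomes[i] for i in range(len(incomes) - 1)]

def jump_at(incomes, cut, window=5):
    """Growth across the splice divided by the median growth of nearby steps.

    A value near 1.0 means the curve is locally smooth at `cut`.
    """
    steps = step_ratios(incomes)
    at_cut = steps[cut - 1]  # step from percentile cut-1 to percentile cut
    nearby = steps[cut - 1 - window:cut - 1] + steps[cut:cut + window]
    return at_cut / statistics.median(nearby)

# Synthetic curves only; no real survey data is used here.
smooth = [300.0 * 1.05 ** p for p in range(100)]
spliced = smooth[:80] + [1.3 * x for x in smooth[80:]]

print(jump_at(smooth, 80))   # close to 1.0: no visible break
print(jump_at(spliced, 80))  # close to 1.3: the offset shows up as a jump
```

A value near 1.0 on a real stitched curve would not by itself show that the sources agree; it would only show that whatever join was used left no visible step.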
“This is a problem on the level of ‘this chart was cooked up specifically to favor the intended conclusion....’”
Actually, our incentives were the precise reverse when this data was being put together. These figures first appeared in the ‘How Rich Are You Calculator’. In that context we took people who knew their income, and told them what percentage of households they were richer than. It would have been in our interests to include the lowest income numbers possible for the richest folks in the world, in order to inflate what global income percentile people stood at.
That could have been achieved by going with PovcalNet’s numbers the whole way. Had we been lazy we could have done this more easily than what we did do, as these numbers were already public. We could have then claimed that an individual earning $36,500 is richer than 99.85% of the world! But this is quite wrong. PovcalNet is designed to be reliable for lower incomes, as part of the World Bank’s attempt to measure poverty and economic development around the world. It progressively understates the incomes of people at the top of the income distribution as they aren’t well sampled; hence the need for Milanović’s alternative numbers for that group.
GWWC used Milanović’s numbers for as much of the distribution as he gave us data for (i.e. it did not exercise discretion about where to switch).
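The incentive described above can be illustrated with a toy percentile lookup. This is only a sketch under stated assumptions: the function name, the synthetic curves, and the 30% uplift above the 80th percentile are hypothetical, and it is not the actual calculator's method or data. The point is that a source with lower top incomes places the same person at a higher percentile.

```python
from bisect import bisect_left

def percent_richer_than(income, thresholds):
    """Percentage of percentile thresholds that lie strictly below `income`.

    `thresholds` must be sorted ascending, one income per percentile.
    """
    return 100.0 * bisect_left(thresholds, income) / len(thresholds)

# Synthetic curves only: `survey_only` stands in for a source that understates
# top incomes; `stitched` swaps in higher figures from the 80th percentile up.
survey_only = [300.0 * 1.05 ** p for p in range(100)]
stitched = survey_only[:80] + [1.3 * x for x in survey_only[80:]]

income = 30_000
print(percent_richer_than(income, survey_only))
print(percent_richer_than(income, stitched))  # lower: the more conservative answer
```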
Unfortunately, I was not working at GWWC when the two datasets were combined, so I wouldn’t want to comment on how that was done. Any new chart should document how things like that are performed (and mine will).
The most material problem as I see it is that PovcalNet and other measures of poverty usually measure consumption (to ensure inclusion of e.g. growing your own food or foraging for free things), while figures for people in developed countries measure income (as that’s what people know and it can be found on tax records, while most households don’t know their net consumption in any given year). The effect on the shape should be modest:
Most people on the graph, which caps out at $100k, are not among the super-rich, which means they will consume most of their lifetime income before they die. The US personal household savings rate is a measly 6%, suggesting pretty small adjustments.
A large fraction of people at the bottom of the distribution are not in a position to accumulate significant financial assets—most ‘savings’ will come in the form of consumer durables (e.g. bricks or a roof on a house) that will be picked up as consumption. Furthermore using consumption inflates the income of the poor relative to the rich because it includes things received for free that wouldn’t be included in income measures for people in the developing world.
Nonetheless, I think this does bias the graph towards showing higher inequality. I’m not yet sure how I’ll fix this, as I don’t know of reliable figures across the whole distribution that use only one of these measures, or figures of net savings as a percent of income across the income distribution, which could be used to fix the discrepancy. I’m open to ideas or new data sources if anyone has one. In the absence of that we’ll just have to continue explaining this weakness of the method.
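To get a feel for the size of the income-versus-consumption adjustment discussed above, here is a rough sketch. It applies the quoted 6% savings rate uniformly at every income level, which is an assumption made only for illustration (savings rates by income level are exactly the data noted as missing), and the function name is hypothetical.

```python
def income_to_consumption(income, savings_rate):
    """Approximate consumption as the share of income that is not saved."""
    if not 0.0 <= savings_rate < 1.0:
        raise ValueError("savings_rate must be in [0, 1)")
    return income * (1.0 - savings_rate)

# 6% is the US household savings rate quoted above; applying it uniformly
# at every income level is an assumption made only for illustration.
for income in (20_000, 50_000, 100_000):
    print(income, round(income_to_consumption(income, 0.06)))
```

Under that assumption the top of the graph would shift from $100k to about $94k, a small change next to the gap between the top and bottom of the distribution.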
I’m looking forward to improving this as far as I can, but I suspect that it won’t change the big picture very much.
I’m glad I was mistaken about at least part of this—if the stitching-together was originally meant to avoid overstating what percentile someone was in, and originally intended for point estimates rather than to illustrate a trend, then that seems pretty reasonable.
In that context (which I didn’t have, and I hope it’s clear to you how without that context I’d have drawn the opposite conclusion), using the existing stitched-together data to make a chart seems like a neutral error, the sort of thing someone does because that’s the dataset they happen to have lying around. (Unless, of course, someone would have been more likely to notice and flag a chart with a suppressed trend than a chart with an exaggerated one. That sort of bias is very hard to overcome.)
This is why things like keeping track of sources are so important, though. Without that, a decision intended to make a tool more conservative ended up being used in a graph where it could be expected to exaggerate a trend, and no one seems to have noticed until you went digging (for which, again, thank you). I’m glad you intend to do better with your version.
The broader concern I share is the risk of data moving from experts to semi-experts to non-experts, with a loss of understanding at each stage. This is basically a ubiquitous problem, and EA is no exception. From looking into this back in 2013 I understand well where these numbers come from, the parts of the analysis that make me most nervous, and what they can and can’t show. But I think it’s fair to say that there has existed a risk of derivative works being produced by people dabbling in the topic on a tough schedule, and i) losing the full citation, or ii) accidentally presenting the numbers in a misleading way.
A classic case of this playing out at the moment is the confusion around GiveWell’s estimated ‘cost per life saved’ for AMF, vs the new ‘cost per life saved equivalent’. GiveWell has tried, but research communication is hard. I feel sorry for people who engage in EA advocacy part time as it’s very easy for them to get a detail wrong, or have their facts out of date (snap quiz, how probable is each of these in light of the latest research: deworming impacts i) weight, ii) school attendance, iii) incomes later in life?). This stuff should be corrected, but with love, as folks are usually doing their best, and not everyone can be expected to fully understand or keep up with research in effective altruism.
One valuable thing about this debate has been that it reminds us that people working on communicating ideas need to speak with the experts who are aware of the details and stress about getting things as accurate as they can be in practice. Ideally one individual should become the point-person who truly understands any complex data source (and gets replaced when staff move on).
The nature of the correction, I think, is that I underestimated how much individual caution there was in coming up with the original numbers. I was suggesting some amount of individual motivated cognition in generating the stitched-together dataset in the first place, and that’s what I think I was wrong about.
I still think that:
(1) The stitching-together represents a big problem and not a minor one. This is because it’s basically impossible to “sanity check” charts like this without introducing some selection bias. Each step away from the original source compounds this problem. Hugging the source data as tightly as you can and keeping track of the methodology is really the only way to fight this. Otherwise, even if there is no individual intent to mislead, we end up passing information through a long series of biased filters, and thus mainly flattering our preconceptions.
I can see the appeal of introducing individual human gatekeepers into the picture, but that comes with a pretty bad bottlenecking problem, and substitutes the bias of a single individual for the bias of the system. Having experts is great, but the point of sharing a chart is to give other people access to the underlying information in a way that’s intuitive to interpret. Robin Hanson’s post on academic vs amateur methods puts the case for this pretty clearly:
A key tradeoff in our methods is between ease and directness on the one hand, and robustness and rigor on the other. [...] When you need to make an immediate decision fast, direct easy methods look great. But when many varied people want to share an analysis process over a longer time period, more robust rigorous methods start to look better. Easy direct easy methods tend to be more uncertain and context dependent, and so don’t aggregate as well. Distant others find it harder to understand your claims and reasoning, and to judge their reliability. So distant others tend more to redo such analysis themselves rather than building on your analysis. [...]
You might think their added freedom would result in amateurs contributing proportionally more to intellectual progress, but in fact they contribute less. Yes, amateurs can and do make more initial progress when new topics arise suddenly far from topics where established expert institutions have specialized. But then over time amateurs blow their lead by focusing less and relying on easier more direct methods. They rely more on informal conversation as analysis method, they prefer personal connections over open competitions in choosing people, and they rely more on a perceived consensus among a smaller group of fellow enthusiasts. As a result, their contributions just don’t appeal as widely or as long.
GiveWell is a great example of an organization that keeps track of sources so that people who are interested can figure out how they got their numbers.
(2) It’s weird and a little sketchy that there’s not a discontinuity around 80%. This could easily be attributable to Milanovic rather than CEA, but I still think it’s a problem that that wasn’t caught, or—if there turns out to be a good explanation—documented.
(3) It’s entirely appropriate to hold CEA’s CEO (the one who used this chart at the start of the controversy you’re responding to by adding helpful information) to a much higher standard than some amateur or part-time EA advocate who got excited about the implications of the chart. For this reason, while I think you’re right that it’s hard to avoid amateurs introducing large errors and substantial bias by oversimplifying things, that doesn’t seem all that relevant to the case that started this.
Here are some other thoughts off the top of my head. As I see it there are different points this figure could be used to support:
i) The social impact of someone earning, e.g. $100k a year, is potentially quite large, as they are earning more than the global average, making them unusually powerful.
ii) It’s high-impact to help people in the developing world because many people are so very poor.
iii) This high level of inequality is an indication of a deep injustice in the economic system that needs to be resolved.
It seems like some folks are particularly worried about the graph being used to support the third point. But I can’t actually recall anyone in EA circles using it to make that case (though I think one could try). Our workshop notes that some in the audience may see things that way, but then works to remain neutral on the topic as it would be a big debate in itself.
Point i) seems best measured by someone’s disposable income as a fraction of total global disposable income, or at least the average global disposable income.
Point ii) is best made by the ratio of the income of our hypothetical donor to that of someone at the 10th percentile (or whatever income percentile is the beneficiary of marginal work by GiveDirectly or AMF). Despite outstanding income growth in the middle of the distribution, IIRC the 10th percentile’s income hasn’t risen much at all. It remains around the minimum subsistence level. With graduate incomes rising in the US, this ratio has probably increased since 2008. Whether this ratio is 30, 100 or 300 is one factor relevant to how good the opportunities look in poverty reduction as a cause relative to others (what the ratio is and how much that matters is discussed in the_jaded_one’s thread). We turn to this ratio later in our career guide, and recently did a fact check on the incomes of GiveDirectly recipients.
Interestingly, the ratio of a reader’s income to the global median doesn’t seem the best measure for any of these purposes.
I’m looking forward to it whenever it’s ready.
Hi Ben, thanks for retracting the comment.
What are the answers to the snap quiz btw?