I’m glad I was mistaken about at least part of this—if the stitching-together was originally meant to avoid overstating what percentile someone was in, and originally intended for point estimates rather than to illustrate a trend, then that seems pretty reasonable.
In that context (which I didn’t have, and I hope it’s clear to you how without that context I’d have drawn the opposite conclusion), using the existing stitched-together data to make a chart seems like a neutral error, the sort of thing someone does because that’s the dataset they happen to have lying around. (Unless, of course, someone would have been more likely to notice and flag a chart with a suppressed trend than a chart with an exaggerated one. That sort of bias is very hard to overcome.)
This is why things like keeping track of sources are so important, though. Without that, a decision intended to make a tool more conservative ended up being used in a graph where it could be expected to exaggerate a trend, and no one seems to have noticed until you went digging (for which, again, thank you). I’m glad you intend to do better with your version.
The broader concern I share is the risk of data moving from experts to semi-experts to non-experts, with a loss of understanding at each stage. This is basically a ubiquitous problem, and EA is no exception. From looking into this back in 2013 I understand well where these numbers come from, the parts of the analysis that make me most nervous, and what they can and can’t show. But I think it’s fair to say that there has existed a risk of derivative works being produced by people dabbling in the topic on a tough schedule, and i) losing the full citation, or ii) accidentally presenting the numbers in a misleading way.
A classic case of this playing out at the moment is the confusion around GiveWell’s estimated ‘cost per life saved’ for AMF, vs the new ‘cost per life saved equivalent’. GiveWell has tried, but research communication is hard. I feel sorry for people who engage in EA advocacy part-time, as it’s very easy for them to get a detail wrong, or have their facts out of date (snap quiz, how probable is each of these in light of the latest research: deworming impacts i) weight, ii) school attendance, iii) incomes later in life?). This stuff should be corrected, but with love, as folks are usually doing their best, and not everyone can be expected to fully understand or keep up with research in effective altruism.
One valuable thing about this debate has been that it reminds us that people working on communicating ideas need to speak with the experts who are aware of the details and stress about getting things as accurate as they can be in practice. Ideally one individual should become the point-person who truly understands any complex data source (and gets replaced when staff move on).
The nature of the correction, I think, is that I underestimated how much individual caution there was in coming up with the original numbers. I was suggesting some amount of individual motivated cognition in generating the stitched-together dataset in the first place, and that’s what I think I was wrong about.
I still think that:
(1) The stitching-together represents a big problem and not a minor one. This is because it’s basically impossible to “sanity check” charts like this without introducing some selection bias. Each step away from the original source compounds this problem. Hugging the source data as tightly as you can and keeping track of the methodology is really the only way to fight this. Otherwise, even if there is no individual intent to mislead, we end up passing information through a long series of biased filters, and thus mainly flattering our preconceptions.
I can see the appeal of introducing individual human gatekeepers into the picture, but that comes with a pretty bad bottlenecking problem, and substitutes the bias of a single individual for the bias of the system. Having experts is great, but the point of sharing a chart is to give other people access to the underlying information in a way that’s intuitive to interpret. Robin Hanson’s post on academic vs amateur methods puts the case for this pretty clearly:
A key tradeoff in our methods is between ease and directness on the one hand, and robustness and rigor on the other. [...] When you need to make an immediate decision fast, direct easy methods look great. But when many varied people want to share an analysis process over a longer time period, more robust rigorous methods start to look better. Easy direct easy methods tend to be more uncertain and context dependent, and so don’t aggregate as well. Distant others find it harder to understand your claims and reasoning, and to judge their reliability. So distant others tend more to redo such analysis themselves rather than building on your analysis. [...]
You might think their added freedom would result in amateurs contributing proportionally more to intellectual progress, but in fact they contribute less. Yes, amateurs can and do make more initial progress when new topics arise suddenly far from topics where established expert institutions have specialized. But then over time amateurs blow their lead by focusing less and relying on easier more direct methods. They rely more on informal conversation as analysis method, they prefer personal connections over open competitions in choosing people, and they rely more on a perceived consensus among a smaller group of fellow enthusiasts. As a result, their contributions just don’t appeal as widely or as long.
GiveWell is a great example of an organization that keeps track of sources so that people who are interested can figure out how they got their numbers.
(2) It’s weird and a little sketchy that there’s not a discontinuity around 80%. This could easily be attributable to Milanovic rather than CEA, but I still think it’s a problem that that wasn’t caught, or—if there turns out to be a good explanation—documented.
(3) It’s entirely appropriate to hold CEA’s CEO (the one who used this chart at the start of the controversy you’re responding to by adding helpful information) to a much higher standard than some amateur or part-time EA advocate who got excited about the implications of the chart. For this reason, while I think you’re right that it’s hard to avoid amateurs introducing large errors and substantial bias by oversimplifying things, that doesn’t seem all that relevant to the case that started this.
Here are some other thoughts off the top of my head. As I see it there are different points this figure could be used to support:
i) The social impact of someone earning, e.g. $100k a year, is potentially quite large, as they are earning more than the global average, making them unusually powerful.
ii) It’s high-impact to help people in the developing world because many people are so very poor.
iii) This high level of inequality is an indication of a deep injustice in the economic system that needs to be resolved.
It seems like some folks are particularly worried about the graph being used to support the third point. But I can’t actually recall anyone in EA circles using it to make that case (though I think one could try). Our workshop notes that some in the audience may see things that way, but then works to remain neutral on the topic as it would be a big debate in itself.
Point i) seems best measured by someone’s disposable income as a fraction of total global disposable income, or at least the average global disposable income.
Point ii) is best made by the ratio of the income of our hypothetical donor to that of someone at the 10th percentile (or whatever income percentile the beneficiaries of marginal work by GiveDirectly or AMF fall in). Despite outstanding income growth in the middle of the distribution, IIRC the 10th percentile’s income hasn’t risen much at all. It remains around the minimum subsistence level. With graduate incomes rising in the US, this ratio has probably increased since 2008. Whether this ratio is 30, 100 or 300 is one factor relevant to how good the opportunities look in poverty reduction as a cause relative to others (what the ratio is and how much that matters is discussed in the_jaded_one’s thread). We turn to this ratio later in our career guide, and recently did a fact check on the incomes of GiveDirectly recipients.
Interestingly, the ratio of a reader’s income to the global median doesn’t seem the best measure for any of these purposes.
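To make the difference between these measures concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the income list is invented (it is not taken from Milanovic, CEA, or GiveDirectly), `income_measures` is a hypothetical helper name, and the 10th percentile uses the simple nearest-rank rule. It only shows that the share of the total, the ratio to the 10th percentile, and the ratio to the median are different quantities that can diverge widely on the same data.

```python
from statistics import mean, median


def nearest_rank_percentile(sorted_incomes, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of data at or below it."""
    n = len(sorted_incomes)
    rank = max(1, (pct * n + 99) // 100)  # integer ceil(pct * n / 100)
    return sorted_incomes[rank - 1]


def income_measures(donor_income, incomes):
    """Compute the candidate measures for one donor against a list of incomes."""
    ordered = sorted(incomes)
    p10 = nearest_rank_percentile(ordered, 10)
    return {
        # Point i): donor income relative to the total or the mean
        "share_of_total": donor_income / sum(ordered),
        "ratio_to_mean": donor_income / mean(ordered),
        # Point ii): donor income relative to the 10th percentile
        "ratio_to_p10": donor_income / p10,
        # The measure the chart invites: donor income relative to the median
        "ratio_to_median": donor_income / median(ordered),
    }


# Hypothetical annual incomes, invented purely for illustration.
example_incomes = [300, 500, 800, 1200, 2000, 3500, 6000, 12000, 30000, 100000]

if __name__ == "__main__":
    for name, value in income_measures(100000, example_incomes).items():
        print(f"{name}: {value:.3f}")
```

With real data the same function would need an actual income distribution with a documented source and PPP adjustment; the point of the sketch is only that the choice of denominator changes the headline number a lot.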
Hi Ben, thanks for retracting the comment.
What are the answers to the snap quiz btw?