“Most people make the mistake of generalizing from a single data point. Or at least, I do.”—SA
When can you learn a lot from one data point? People, especially stats- or science-brained people, are often confused about this, and frequently give answers that (imo) are the opposite of useful. E.g. they say that usually you can’t know much, but that if you know a lot about the meta-structure of your distribution (e.g. you’re interested in the mean of a distribution with low variance), a single data point can sometimes be a significant update.
On the face of it, this type of limited conclusion looks epistemically humble, but in practice it’s the opposite of correct. Single data points aren’t particularly useful when you already know a lot, but they’re very useful when you have very little knowledge to begin with. If your uncertainty about the variable in question spans many orders of magnitude, the first observation can often reduce more uncertainty than the next 2-10 observations put together.[1] Put another way, the situations where you can update massively from a single data point are exactly the ones where you know very little to begin with.
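The claim above can be sketched numerically. This is a toy Bayesian model, not anything from the post: the log-uniform prior, the hypothetical true value, and the noise level are all illustrative assumptions. Start with a prior flat across six orders of magnitude, feed in noisy measurements one at a time, and track how many bits of entropy each one removes.

```python
# Toy sketch: under a high-entropy prior, the first observation removes
# more uncertainty than the next several combined. All numbers illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Prior: flat in log-space over six orders of magnitude for some quantity x.
log_x = np.linspace(0.0, 6.0, 2001)        # grid over log10(x)
prior = np.ones_like(log_x) / len(log_x)    # uniform = high entropy

def entropy_bits(p):
    """Shannon entropy in bits of a discrete distribution."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

true_log_x = 3.0   # hypothetical true value: x = 1000
noise_sd = 0.3     # measurement noise of ~0.3 orders of magnitude

posterior = prior.copy()
drops = []
for _ in range(10):
    obs = true_log_x + rng.normal(0.0, noise_sd)        # one noisy measurement
    likelihood = np.exp(-0.5 * ((log_x - obs) / noise_sd) ** 2)
    before = entropy_bits(posterior)
    posterior = posterior * likelihood                  # Bayesian update
    posterior /= posterior.sum()
    drops.append(before - entropy_bits(posterior))

print([round(d, 2) for d in drops])
```

With these (made-up) numbers, the first measurement collapses the wide prior by more bits than measurements two through ten combined, since after the first one the posterior standard deviation only shrinks like 1/√n.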
For example, an alien seeing a human car for the first time can make massive updates on many different things regarding Earthling society, technology, biology and culture. Similarly, an anthropologist landing on an island of a previously uncontacted tribe can rapidly learn an enormous amount about the culture from a single hour of peaceful interaction.[2]
Some other examples:
Your first day at a new job.
First time visiting a country/region you previously knew nothing about. One afternoon in Vietnam tells you roughly how much things cost, how traffic works, what the food is like, what languages people speak, and how people interact with strangers.
Trying a new fruit for the first time. One bite of durian tells you an enormous amount about whether you’ll like durian.
Your first interaction with someone’s kid tells you roughly how old they are, how verbal they are, what they’re like temperamentally. You went from “I know nothing about this child” to a working model.
Far from being idiosyncratic and unscientific, these forms of “generalizing from a single data point” are perfectly normal, and very important, parts of everyday human life and street epistemology.
This is the point that Douglas Hubbard hammers home repeatedly over the course of his book How to Measure Anything: you know less than you think you do, and a single measurement can sometimes be a massive update.
[1] This is basically tautological given a high-entropy prior.
[2] I like Monolingual Fieldwork as a demonstration of the possibilities in linguistics: https://www.youtube.com/watch?v=sYpWp7g7XWU&t=2s
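To spell out footnote [1] (a sketch under an illustrative Gaussian-noise model that isn’t in the original): suppose the prior is flat over $k$ orders of magnitude and each measurement has noise $\sigma$, both in $\log_{10}$ units. Then, in bits:

```latex
% Entropy removed by the first measurement (flat prior of width k, Gaussian noise sigma):
h_{\text{prior}} - h_1 \;\approx\; \log_2 k \;-\; \tfrac{1}{2}\log_2\!\left(2\pi e\,\sigma^2\right)
% Additional entropy removed by measurements 2..n (posterior sd shrinks as sigma/\sqrt{n}):
h_1 - h_n \;\approx\; \tfrac{1}{2}\log_2 n
```

When uncertainty spans many orders of magnitude ($k \gg \sigma$), the first term dwarfs the second: the gain from observation one grows with the width of the prior, while the gain from each later observation does not.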
Seems you and Spencer Greenberg (whose piece you linked to) are talking past each other because you disagree on what the interesting epistemic question is, and/or are just writing for different audiences?
Spencer is asking “When can a single observation justify a strong inference about a general claim?”, which is about de-risking overgeneralisation, a fair thing to focus on since many people generalise too readily.
You’re asking “When does a single observation maximally reduce your uncertainty?”, which is about information-theoretic value and (as you said) is aimed more at the “stats-brained”.
Also seems a bit misleading to count something like “one afternoon in Vietnam” or “first day at a new job” as a single data point when it’s hundreds of them bundled together? Spencer’s examples seem to lean more towards actual single data points (if not all the way). And Spencer’s 4th example, on how one data point can sometimes unlock a whole bunch of other data points by triggering a figure-ground inversion that then causes a reconsideration of your view, seems perfectly aligned with Hubbard’s point.
That said I do think the point you’re making is the more practically useful one, I guess I’m just nitpicking.
Also seems a bit misleading to count something like “one afternoon in Vietnam” or “first day at a new job” as a single data point when it’s hundreds of them bundled together?
From an information-theoretic perspective, people almost never mean strictly one bit when they say “a single data point”, so whether you’re counting a single float in a database, a whole row in a structured database, or an entire conversation, we’re sort of negotiating price.
I think the “alien seeing a car” example makes the case somewhat clearer. If you already have a deep model of cars (or even a shallow one), seeing another instance of a Ford Focus tells you relatively little, but an alien coming across one will gain many bits from it, perhaps more than a human spending an afternoon in Vietnam.