(Arguably nitpicking, in the sense that I suspect this would not change the bottom line, posted because the use of stats here raised my eyebrows)
For some calibration, risk of drug abuse, which is a reasonable baseline for other types of violent behavior as well, is about 2-3x in adopted children. This is not conditioning on it being a teenager adoption, which I expect would likely increase the ratio to more something like 3-4x, given the additional negative selection effects.
Sibling abuse rates are something like 20% (or 80% depending on your definition). And is the most frequent form of household abuse. This means by adopting a child you are adding something like an additional 60% chance of your other child going through at least some level of abuse
For the benefit of those who didn’t click through the link, the rate on their chosen measure is very roughly 3.5% for adoptees versus roughly 1.5% for the general population, which I assume is where the 2-3x came from. I also buy that by adopting a teenager this number is going to be pushed up towards the foster child outcomes (~8%); a guess like 5% (“3-4x”) seems reasonable.
But you can’t directly extrapolate from the ratio on a rare outcome to a typical outcome, e.g. a 20% → 67% (67 = 20 * 5 / 1.5) change in the absolute likelihood of sibling abuse, which I think is basically what you are doing here, though do correct me if I’m wrong since there were some numbers you gave I couldn’t follow. The statistical intuition going into that is rough, but here’s a concrete, if technical, example:
A 1.5% bad tail outcome in a normal distribution means you are 2.17 standard deviations below the mean, a 5% tail outcome means you are 1.64 SDs below the mean, and so you would go 1.5% → 5% just by dropping the mean by 0.53 SDs. But this would only move a 20% likelihood outcome to 38%, well short of 67% or even your 60%. To get a 20% outcome to 60% you need a 1.1 SD move, which would be equivalent to a 1.5% outcome becoming 14%. The choice of normal distribution in the above is arbitrary, but I expect the pattern to hold among reasonable choices for this case.
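For anyone who wants to check the arithmetic above, here is a minimal Python sketch using the standard library's `statistics.NormalDist`. The helper names `shift_needed` and `shifted_rate` are made up for illustration, and the sketch inherits the same arbitrary assumption as the comment: a normally distributed underlying trait with a fixed threshold for the bad outcome.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1 (assumed shape of the underlying trait)


def shift_needed(rate_before, rate_after):
    """SDs the mean must move toward the threshold to change the tail rate."""
    return STD_NORMAL.inv_cdf(rate_after) - STD_NORMAL.inv_cdf(rate_before)


def shifted_rate(base_rate, shift_sd):
    """Rate of an outcome with the given base rate after the mean moves by shift_sd."""
    return STD_NORMAL.cdf(STD_NORMAL.inv_cdf(base_rate) + shift_sd)


# Rare outcome: 1.5% -> 5% needs a shift of roughly 0.53 SDs.
small_shift = shift_needed(0.015, 0.05)
print(f"shift for 1.5% -> 5%: {small_shift:.2f} SD")

# The same shift applied to a 20% outcome gives roughly 38%, not 67%.
print(f"20% outcome after that shift: {shifted_rate(0.20, small_shift):.0%}")

# Going the other way: 20% -> 60% needs roughly 1.1 SDs,
# which would take a 1.5% outcome to roughly 14%.
big_shift = shift_needed(0.20, 0.60)
print(f"shift for 20% -> 60%: {big_shift:.2f} SD")
print(f"1.5% outcome after that shift: {shifted_rate(0.015, big_shift):.0%}")
```

Swapping in a different distribution would change the exact figures, but this makes it easy to see how sensitive the 38% is to the choice of inputs.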
In less technical language: you don’t have to move a distribution very much to change the probability of tail outcomes by a lot, whereas almost by definition you do have to move a distribution a lot to change the probability of typical outcomes by a lot.
Thanks for this explanation. That part of Habryka’s comment also struck me as very suspicious when I read it, but it wasn’t immediately obvious what’s wrong with it exactly.
Yeah, I think this is a totally fair critique and I updated some after reading it!
I wrote the above after a long Slack conversation with Aaron at like 2AM, just trying to capture the rough shape of the argument without spending too much time on it.
I do think actually chasing this argument all the way through is interesting and possibly worth it. I think it’s pretty plausible it could make a 2-3x difference in the final outcome (and possibly a lot more!), and I hadn’t actually thought through it all the way. And while I had some gut sense it was important to differentiate between median and tail outcomes here, I hadn’t properly thought through the exact relationship between the two and am appreciative of you doing some more of the thinking.
I currently prefer your estimate of “moving it from 20% to 38%” as something like my best guess.
So, one thing I was thinking about was that people frequently use the murder rate as a proxy for the overall crime rate, and I think I remember people doing that without any adjustment of the type you are thinking about here. Is there something special about the murder rate as a fraction of violent crimes, or should we actually make the same adjustments in that case?
I think similar adjustments should be made if you are extrapolating to crimes with very different prevalence. For example, the US murder rate is 4-5x that of the UK, but I wouldn’t expect the US to have that many more bike thefts.
Proxy seems fine if you’re focused on which country/city/etc. has higher overall crime, rather than estimating magnitude.
(FWIW, an attempt at Googling the above suggests ~300k bike thefts per year in the UK versus 2m in the US; the US population is 5x bigger, so that’s only 1.33x the UK rate. A quick check on bicycle sales in the two countries does not suggest that this is because of very different cycling rates. No links because I’m on my phone, but the above is very rough anyway. I’m left with somewhat greater confidence that the gap is in fact <<4x, like 1.2x-2x, though.)
Similar comments could be made about extrapolating from the large number of US billionaires (way more per capita than any other country IIRC) to the relative rates of people earning more than $200k/$50k/etc. That case might be more intuitive.
A less important motivation/mechanism is that probabilities/ratios (unlike odds) are bounded above by one. For rare events, ‘doubling the probability’ and ‘doubling the odds’ give basically the same answer, but not so for more common events. Loosely, flipping a coin three times ‘trebles’ my risk of observing it landing tails, but the probability isn’t 1.5. (cf).
E.g.
Sibling abuse rates are something like 20% (or 80% depending on your definition). And is the most frequent form of household abuse. This means by adopting a child you are adding something like an additional 60% chance of your other child going through at least some level of abuse (and I would estimate something like a 15% chance of serious abuse). [my emphasis]
If you used the 80% definition instead of 20%, then the ‘4x’ risk factor implied by 60% additional chance (with 20% base rate) would give instead an additional 240% chance.
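To make the bounded-above point concrete, here is a small Python sketch comparing a multiplier applied to the probability with the same multiplier applied to the odds. The function names are hypothetical, and reusing the ‘4x’ figure as an odds ratio is purely an assumption for illustration, not something estimated in the thread.

```python
def apply_risk_ratio(p, ratio):
    """Multiply the probability directly; the result can exceed 1."""
    return p * ratio


def apply_odds_ratio(p, ratio):
    """Multiply the odds p / (1 - p), then convert back to a probability."""
    new_odds = (p / (1 - p)) * ratio
    return new_odds / (1 + new_odds)


FACTOR = 4  # the '4x' from the comment above, reused here as an odds ratio (assumption)

for base in (0.015, 0.20, 0.80):
    as_probability = apply_risk_ratio(base, FACTOR)
    as_odds = apply_odds_ratio(base, FACTOR)
    print(f"base {base:.1%}: x{FACTOR} on probability -> {as_probability:.1%}, "
          f"x{FACTOR} on odds -> {as_odds:.1%}")

# The coin example: three flips do not give a 150% chance of seeing tails.
p_any_tails = 1 - 0.5 ** 3
print(f"P(at least one tails in three flips) = {p_any_tails:.3f}")
```

At a 1.5% base rate the two versions nearly coincide (6% versus roughly 5.7%), while at an 80% base rate the probability version gives 320% and the odds version stays below 1.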
[Of interest, 20% to 38% absolute likelihood would correspond to an odds ratio of ~2.5, in the ballpark of the 3-4x risk factors discussed before. So maybe extrapolating extreme-event ratios to less-extreme-event ratios can do okay if you keep them in odds form. The underlying story might have something to do with the fact that logistic distributions closely resemble normal distributions (save at the tails): shifting a normal distribution along the x axis so that (non-linearly) more or less of it lies over a threshold loosely resembles adding increments to log-odds (equivalent to multiplying odds by a constant), which gives (non-linear) changes in probability when traversing a logistic CDF.
But it still breaks down when extrapolating very large ORs from very rare events. Perhaps the underlying story here has something to do with the higher kurtosis of the logistic distribution: ‘>2SD events’ are only (I think) ~5X more likely than >3SD events for logistic distributions, versus ~20X in normal distribution land. So large shifts in the likelihood of rare(r) events would imply large logistic-land shifts (which dramatically change the whole distribution, e.g. an OR of 10 makes evens --> >90%) but much more modest ones in normal-land (e.g. moving up an SD gives OR>10 for previously 3SD events, but ~2 for previously ‘above average’ ones)]
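The tail comparison above can be checked with a few lines of Python. This is only a sketch: it assumes both distributions are standardised to mean 0 and SD 1 (for the logistic that means a scale parameter of sqrt(3)/pi), and the function names are made up for illustration.

```python
import math
from statistics import NormalDist


def normal_tail(k):
    """P(X > k) for a standard normal."""
    return 1 - NormalDist().cdf(k)


def logistic_tail(k):
    """P(X > k) for a logistic distribution rescaled to have SD 1."""
    scale = math.sqrt(3) / math.pi  # a logistic with scale s has SD s * pi / sqrt(3)
    return 1 / (1 + math.exp(k / scale))


normal_ratio = normal_tail(2) / normal_tail(3)
logistic_ratio = logistic_tail(2) / logistic_tail(3)

print(f"normal:   P(>2SD) / P(>3SD) = {normal_ratio:.1f}")    # roughly 17
print(f"logistic: P(>2SD) / P(>3SD) = {logistic_ratio:.1f}")  # roughly 6

# An odds ratio of 10 applied to an even-odds (50%) event.
p_after = 10 / (1 + 10)
print(f"evens after OR of 10: {p_after:.1%}")  # just over 90%
```

Under these assumptions the ratios come out at roughly 17 for the normal and roughly 6 for the logistic, in the same ballpark as the ~20X and ~5X guessed above.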
Yep, I should have definitely kept the probabilities in log-form, just to be less confusing. It wouldn’t have made a huge difference to the outcome, but it seems better practice than the thing that I did.