More minor nitpick:
You point to skewness as the key mathematical concept. However, I think most of what you discuss (e.g. “outliers which drive up the mean”) is better captured by kurtosis or other measures of ‘heavy-tailedness’ (and sometimes we may care more about the more familiar concept of variance).
In practice, skewness and heavy-tailedness often go together, so whether we look for e.g. “highly skewed” or “heavy-tailed” distributions often won’t make a difference.
However, conceptually they are quite different. Skewness is related to the third central moment, i.e. the expected third power of deviations from the mean. Because cubing preserves sign, it largely measures whether the more extreme outliers lie to the right or to the left of the mean, and is thus a measure of ‘asymmetry’.
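A minimal sketch of the moment-based definitions, assuming the population (not bias-corrected) formulas. The helper names `central_moment`, `skewness` and `kurtosis` are illustrative, not from any particular library. Mirroring a sample flips the sign of its skewness but leaves its kurtosis unchanged, which is the ‘asymmetry’ point above.

```python
from statistics import fmean


def central_moment(xs, k):
    """k-th central moment: the mean of (x - mean)**k over the sample."""
    m = fmean(xs)
    return fmean((x - m) ** k for x in xs)


def skewness(xs):
    """Standardized third central moment (population formula, no bias correction)."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def kurtosis(xs):
    """Standardized fourth central moment (population formula; a normal distribution gives 3)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2


right_outlier = [1, 1, 1, 1, 10]            # one large value to the right of the bulk
left_outlier = [-x for x in right_outlier]  # the mirror image of that sample

print(skewness(right_outlier))  # ≈ 1.5: positive, the outlier is on the right
print(skewness(left_outlier))   # ≈ -1.5: same size, opposite sign
print(kurtosis(right_outlier), kurtosis(left_outlier))  # both ≈ 3.25: mirroring leaves kurtosis unchanged
```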
But arguably we often care more about how much weight outliers in either direction carry compared to ‘typical’ values, or by how much future outliers might exceed the maximum value we’ve seen so far. That is, we care about ‘absolute’ measures of heavy-tailedness that are not defined by comparison with the tail at the other end of the distribution. Then we may be more interested in a measure of heavy-tailedness applied to just one tail of the distribution. Alternatively, we may look at kurtosis, which is related to the fourth central moment, i.e. the expected fourth power of deviations from the mean. Since an even power discards sign, kurtosis measures how much weight outliers on either side carry compared to typical values.
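To illustrate the difference, here is a small sketch with two made-up samples that are both exactly symmetric. The helper functions (hypothetical names, population formulas assumed) are defined inline so the block runs on its own. Skewness cannot tell the two samples apart, while kurtosis flags the one with far outliers on both sides.

```python
from statistics import fmean


def central_moment(xs, k):
    """k-th central moment: the mean of (x - mean)**k."""
    m = fmean(xs)
    return fmean((x - m) ** k for x in xs)


def skewness(xs):
    """Standardized third central moment (population formula)."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def kurtosis(xs):
    """Standardized fourth central moment (population formula; normal = 3)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2


# Both samples are symmetric about 0, so the cubed deviations cancel exactly.
light = [-2, -1, 0, 1, 2]
heavy = [-10, -1, -1, 0, 1, 1, 10]  # one far outlier on each side

for name, xs in [("light", light), ("heavy", heavy)]:
    print(name, skewness(xs), kurtosis(xs))
# Skewness is 0 for both; kurtosis is ≈ 1.7 for `light` and ≈ 3.36 for `heavy`.
```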
Concretely, skewness can be unintuitive for multimodal distributions, where modes of unequal weight can drive up skewness for reasons quite different from a heavy or long tail on one side. It can also be unintuitive for distributions that have heavy/long tails on both sides but to different degrees.
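A rough sketch of both failure modes, using two made-up samples chosen only for illustration. The helpers are hypothetical and use population formulas. The bimodal sample has no long tail at all: its kurtosis is below the normal distribution’s 3. Yet its skewness (≈ 0.83) is much larger than that of the two-tailed sample (≈ 0.14). The two-tailed sample has a right tail reaching roughly 3 standard deviations above the mean and a kurtosis of about 10.

```python
from statistics import fmean


def central_moment(xs, k):
    """k-th central moment: the mean of (x - mean)**k."""
    m = fmean(xs)
    return fmean((x - m) ** k for x in xs)


def skewness(xs):
    """Standardized third central moment (population formula)."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def kurtosis(xs):
    """Standardized fourth central moment (population formula; normal = 3)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2


# Two modes of unequal weight and no long tail: 70% of points near 1, 30% near 11.
bimodal = [0, 1, 2] * 7 + [10, 11, 12] * 3

# Long tails on both sides, the right one only slightly longer than the left.
two_tails = [-30] + [0] * 18 + [31]

for name, xs in [("bimodal", bimodal), ("two_tails", two_tails)]:
    print(name, "skewness:", skewness(xs), "kurtosis:", kurtosis(xs))
```

Judged by skewness alone, the bimodal sample looks ‘more right-tailed’, even though its largest value sits close to its own mode.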