I agree that could work, although doing it is not straightforward: for technical reasons, there are few cases where you gain precision by running a convenience survey 'on top' of a random sample, although such cases do exist.
(Unfortunately, the random FB sample was small, with something like 80% non-response, making it not very helpful for estimating how far the convenience sample deviates from the 'true' population. In some sense the subgroup comparisons do provide some of this information by pointing to different sub-populations; what they cannot provide is a measure of whether these subgroups are represented proportionally or not. A priori, though, proportional representation would seem pretty unlikely.)
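To see why ~80% non-response makes a small random sample so uninformative, here is a back-of-the-envelope sketch (the invitation count of 100 is my illustrative assumption, not a figure from the thread):

```python
import math

# Illustrative numbers: the exact invitation count is an assumption.
n_invited = 100          # hypothetical random-sample invitations
response_rate = 0.20     # ~80% non-response, as described above
n_respondents = int(n_invited * response_rate)

# Worst-case standard error for an estimated proportion (p = 0.5),
# ignoring any non-response bias, which only makes things worse:
se = math.sqrt(0.5 * 0.5 / n_respondents)
print(f"respondents: {n_respondents}, ~95% margin of error: ±{1.96 * se:.0%}")
```

With only ~20 respondents, the margin of error on any proportion is over twenty percentage points before you even worry about who the non-responders were, so it gives little purchase on how the convenience sample deviates from the true population.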
As David notes, the 'EA FB group' is highly unlikely to be a representative sample. But I think it is more plausibly representative along the axes we'd likely be interested in for the survey. I'd guess EAs who are into animal rights are not hugely more likely to be on Facebook than those who are into global poverty, for example. (Could there be some effects? Absolutely: I'd guess the FB audience skews young and computer-savvy, so perhaps folks interested in AI and the like are more likely to be found there, etc.)
The problem with going to each 'cluster' of EAs is that you are effectively sampling parallel, rather than orthogonal, to your substructure: if you oversample the young and computer-literate, that may not throw off the relative proportions of who lives where, or of who cares more about poverty than the far future; you'd be much more fearful of this if you oversample a particular EA subculture like LW.
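The parallel-versus-orthogonal point can be made concrete with a toy simulation (all the population numbers and preference rates below are made up for illustration): oversampling on a trait independent of cause preference leaves the estimated proportions roughly intact, while oversampling a subculture whose membership correlates with cause preference skews them.

```python
import random

random.seed(0)

# Hypothetical population of 100,000 EAs (all rates are illustrative).
# 'young' is independent of cause preference; 'lw' (an LW-like
# subculture) is strongly correlated with preferring the far future.
population = []
for _ in range(100_000):
    young = random.random() < 0.5
    lw = random.random() < 0.2
    # Far-future preference: 30% baseline, 70% inside the subculture.
    far_future = random.random() < (0.7 if lw else 0.3)
    population.append((young, lw, far_future))

def sample_weighted(pop, weight_fn, n=2_000):
    """Draw a biased sample where weight_fn sets inclusion odds."""
    weights = [weight_fn(p) for p in pop]
    return random.choices(pop, weights=weights, k=n)

def pct_far_future(sample):
    return sum(p[2] for p in sample) / len(sample)

true_pct = pct_far_future(population)
# Oversample the young 3:1 -- orthogonal to cause preference.
young_biased = sample_weighted(population, lambda p: 3 if p[0] else 1)
# Oversample the subculture 3:1 -- parallel to cause preference.
lw_biased = sample_weighted(population, lambda p: 3 if p[1] else 1)

print(f"true far-future share:          {true_pct:.2f}")
print(f"young-oversampled estimate:     {pct_far_future(young_biased):.2f}")
print(f"subculture-oversampled estimate: {pct_far_future(lw_biased):.2f}")
```

The first biased sample stays close to the true share because the oversampled trait is orthogonal to the thing being measured; the second drifts noticeably upward, which is exactly the worry about recruiting through a cluster like LW.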
I'd be more inclined to 'trust' the proportion data (% male, % x-risk, etc.) if the survey were 'just' of the EA Facebook group, whether probabilistically or convenience sampled. Naturally, that is still very far from perfect, and not for all areas (age, for example). (Unfortunately, you cannot simply filter the survey down to those who clicked through via the FB link to construct this data: there are plausibly many people who clicked through via LW but would have clicked through via FB had there been no LW link, so discarding all those responses likely inverts the anticipated bias.)