As you discuss, this survey suffers from selection bias. At the time, I suggested[1] looking at the SlateStarCodex survey results instead, filtering by self-reported EA affiliation. I can’t find the results for 2021 or 2022, but using the results from 2020:
## Helpers
formatAsPercent <- function(float) {
  return(sprintf("%0.1f%%", float * 100))
}

## Body
data <- read.csv("2020ssc_public.csv", header = TRUE, stringsAsFactors = FALSE)
data_EAs <- data[data$EAID == "Yes", ]
n <- nrow(data_EAs)
n ## 993 EAs answered the survey.
mental_illnesses <- colnames(data)[56:68] ## the mental-health condition columns
cat("Illness, rate diagnosed, rate suspected\n")
for (x in mental_illnesses) {
  rate_diagnosed <- sum(data_EAs[[x]] == "I have a formal diagnosis of this condition") / n
  rate_suspected <- sum(data_EAs[[x]] == "I think I might have this condition, although I have never been formally diagnosed") / n
  cat(paste(x, formatAsPercent(rate_diagnosed), formatAsPercent(rate_suspected), sep = ", "), "\n")
}
This returns, for each of the thirteen conditions, the rate of formal diagnoses and the rate of suspected but undiagnosed cases. Plotting this:

[plot of diagnosed and suspected rates by condition]
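A minimal sketch of how such a plot could be produced in base R, first collecting the rates computed in the loop above into a matrix (the collection step is mine, not necessarily how the original plot was made):

## Gather the per-condition rates into a matrix: one row per condition,
## with columns for the diagnosed and suspected rates.
rates <- t(sapply(mental_illnesses, function(x) {
  c(diagnosed = sum(data_EAs[[x]] == "I have a formal diagnosis of this condition") / n,
    suspected = sum(data_EAs[[x]] == "I think I might have this condition, although I have never been formally diagnosed") / n)
}))

## Paired bars per condition; las = 2 rotates the axis labels so they fit.
barplot(t(rates), beside = TRUE, las = 2,
        legend.text = c("Diagnosed", "Suspected"),
        ylab = "Rate among EAs")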
Code to replicate this here, source for the data here[2]. There are other things one could look at, like whether self-assessed EAs are more likely to be mentally ill than the rest of the sample, or whether this is mediated by income, student status, etc.; a rough sketch of that comparison follows.
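A minimal sketch of what that comparison might look like, under loud assumptions: the condition column is taken from the list above, but Income and Student are hypothetical column names that would have to be checked against the survey codebook:

## Compare the diagnosis rate for one condition between self-identified EAs
## and the rest of the sample, then test whether any gap survives adding
## possible mediators. ("Income" and "Student" are hypothetical column names.)
ill <- mental_illnesses[1]
data$diagnosed <- data[[ill]] == "I have a formal diagnosis of this condition"
data$is_EA <- data$EAID == "Yes"

## Raw rates: diagnosis rate among non-EAs (FALSE) vs. EAs (TRUE).
tapply(data$diagnosed, data$is_EA, mean)

## Logistic regression with the hypothetical mediators as controls.
summary(glm(diagnosed ~ is_EA + Income + Student, data = data, family = binomial))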
[1]: This point was originally suggested by David Moss in the first iteration of the survey; I just remembered.
[2]: I downloaded the .xlsx data and converted it to a .csv, because the data doesn’t seem to be available as a .csv directly.
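For what it’s worth, the conversion can also be done in R itself, e.g. with the readxl package (the .xlsx file name here is my guess):

## Convert the published .xlsx to the .csv the script above expects.
library(readxl)
raw <- read_excel("2020ssc_public.xlsx") ## hypothetical file name
write.csv(raw, "2020ssc_public.csv", row.names = FALSE)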