Roughly speaking, there seem to be two main benefits and two main costs to making an anonymised dataset public. The main costs: i) time and ii) people being put off taking the EA Survey because they believe their data will be publicly available and identifiable. The main benefits: iii) the community being able to access information which isn’t included in our public reports and iv) transparency and validation from people being able to replicate our results.
Unfortunately, in order to reduce cost (ii), the dataset is anonymised so heavily (which itself increases cost (i)) that it seems impossible for people to replicate many of our analyses even with the public dataset, essentially vitiating (iv). We have considered, and are still considering, other options, such as producing a simulated dataset for future surveys so that people can run their own analyses if there is sufficient demand, but this would come at an even higher time cost. Conversely, it seems benefit (iii) can largely be attained without releasing a public dataset, just by producing additional aggregate analyses on request (where possible).
Of course, we’ll see how this system works this year and may revisit it in the future.
Thanks!