Hey everyone, I’m working on a data science project where I’ll be analyzing and visualizing correlations in datasets of my choosing, using some ML algorithms. I’ve found Our World in Data, but I’m wondering if anyone knows of other places to find EA-related datasets?
Thanks for your time and help :)
(If you have any specific ideas about which datasets to look into, I’d be really interested in hearing them! Right now I’m thinking about looking at correlations between the aspects of life we associate with well-being around the world and measured well-being. I’d also be especially interested in working on a dataset related to AI safety or animal welfare.)
Hi! I work at OWID – here are a few things that could help you. Their relevance will depend a lot on the volume of data that you need and on what you mean by EA-related (for example, there are many datasets available on global health and wellbeing, but very few on longtermism and x-risks).
IHME’s Global Burden of Disease, used for most disease burden estimates. This is an extremely dense dataset with many dimensions, so if you find the subject relevant, you probably won’t run out of ideas with this one.
World Happiness Report. Not a ton of data here, but some tables that could be interesting to explore.
Crop and livestock data from the FAO
Livestock counts—HYDE & FAO (2017)
Data on CO2 and Greenhouse Gas Emissions by Our World in Data
I’m aggregating and visualising EA datasets on https://www.effectivealtruismdata.com/.
I haven’t yet implemented data download links, but they should be done within a week.
The first thing to do with safety that jumps out at me is the correlation of gradients between different uses of scaling laws.
Has anyone made progress on this? I might start an Airtable with ‘lists of (lists of) EA-relevant datasets’ … including the ones below and more.
Thanks for your question, Harry! As it happens, I am also looking for a potentially EA-related dataset for a data science project, so I figured I’d bandwagon on.
Thank you for suggesting OWID as a source of datasets. I’ll check that out. I’ll also post back if I find anything responsive to your interests.
Follow-up question: Do you care about how large the dataset is? Like, are we talking hundreds of KB, many GB, or some other order of magnitude?
That’s great! I have to decide by Thursday, so I’ll let you know what we’re working on :).
Definitely nothing larger than a few gigabytes, I would say. I’m pretty new to data science and we’re using pretty simple methods in this project, so I’m guessing we’ll also want to do a relatively simple regression or classification analysis on a relatively simple (and maybe small) dataset.
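To give a sense of scale, here is a minimal sketch of what a "relatively simple" correlation and regression pass could look like. It uses only the Python standard library. The numbers are placeholders invented for illustration, not real country data, and the names `rows`, `pearson` and `fit_line` are hypothetical. On a real table, pandas' `DataFrame.corr()` would give the same Pearson coefficients with less code.

```python
from math import sqrt

# Placeholder values invented for illustration only; NOT real data.
# Each row: (hypothetical indicator value, hypothetical well-being score).
rows = [
    (1.0, 2.1),
    (2.0, 3.9),
    (3.0, 6.2),
    (4.0, 7.8),
    (5.0, 10.0),
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


indicator = [row[0] for row in rows]
wellbeing = [row[1] for row in rows]

r = pearson(indicator, wellbeing)
slope, intercept = fit_line(indicator, wellbeing)
print(f"r = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
```

With a dataset like the World Happiness Report tables, the same two functions could be run once per indicator column against the well-being column. A high correlation here would only say that the two move together, not that one causes the other.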
I recently found a huge public database of databases, and I figured I’d share it here.
A bit stale now, but a good source I found for datasets relevant to EA-adjacent topics (particularly climate change and global health) is https://www.drivendata.org/. They have competitions similar to Kaggle, which I think makes it a pretty ideal source of datasets for people who are new to data science and machine learning. Somebody else posted in a different thread on this forum about DrivenData IIRC.