A search engine for micro-level data
Macro-level data is easy to find these days. If you want to know the historical GDP of China or carbon emissions of the U.S., you can find the information on many non-profit and for-profit sites via Google.
But suppose you want to quickly look up micro-level data such as “people’s satisfaction with their daily lives” or “the amount they spend on food.” You’d have to read dozens of papers, locate the names of the datasets they used, find out where the survey data is hosted (if it’s available at all), create an account on the hosting site, download the data, and check whether the variable matches what you were looking for. This process wastes researchers’ time and stifles novel, cross-disciplinary use of existing data.
I’d like to see/build a search engine that catalogs the variable names and other pieces of metadata for every dataset humans have ever created. (Google’s product https://datasetsearch.research.google.com fails to catalog many important datasets and doesn’t allow variable-level search, which I think is the main value proposition of this hypothetical search engine.)
Using this hypothetical search engine, researchers could quickly look up datasets that contain the variable they want and filter by relevant parameters such as age, country, and year of data collection. Many academic journals now require authors to make their data public (e.g. https://dataverse.harvard.edu), so we should build on this momentum to further increase the value of open data. Re-use of existing data is very limited today because researchers have no tool for discovery: knowledge of “what data is available on X topic” largely lives in experts’ heads and spreads by word of mouth.
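To make the idea concrete, here is a minimal sketch (in Python) of what a variable-level catalog entry and a filtered search might look like. The field names, dataset names, and URLs are invented for illustration; they are not a real schema or real data.

```python
from dataclasses import dataclass

# Hypothetical sketch of one variable-level catalog entry.
# All fields and example values are illustrative assumptions.
@dataclass
class VariableRecord:
    dataset: str            # name of the source dataset
    variable: str           # variable name as it appears in the codebook
    description: str        # human-readable label from the codebook
    country: str            # where the data was collected
    years: tuple[int, int]  # first and last year of data collection
    url: str                # landing page of the hosting repository

# A toy in-memory "index" standing in for the search engine's catalog.
catalog = [
    VariableRecord("Example Household Survey", "life_sat",
                   "Satisfaction with daily life (1-10 scale)",
                   "DE", (2005, 2020), "https://example.org/datasets/ehs"),
    VariableRecord("Example Expenditure Panel", "food_exp",
                   "Monthly household spending on food",
                   "US", (2010, 2022), "https://example.org/datasets/eep"),
]

def search(query: str, country: str | None = None,
           year: int | None = None) -> list[VariableRecord]:
    """Return entries whose description matches the query,
    optionally filtered by country and year of data collection."""
    hits = []
    for rec in catalog:
        if query.lower() not in rec.description.lower():
            continue
        if country is not None and rec.country != country:
            continue
        if year is not None and not (rec.years[0] <= year <= rec.years[1]):
            continue
        hits.append(rec)
    return hits

# e.g. find datasets that measure life satisfaction collected in 2015
print(search("satisfaction", year=2015))
```

The real service would of course need full-text and synonym matching rather than substring search, but the core object is the same: a row per variable, not per dataset.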
Another reason this search engine should be funded philanthropically is that it lacks commercial viability: the amount of manual labor doesn’t decrease with scale. The datasets to be catalogued come in all sorts of formats, and their codebooks don’t follow a fixed machine-readable template. (I assume large language models won’t be of much help either.) Thus, if we think such a search engine ought to exist, it will have to be funded by philanthropy.