[Question] Training a GPT model on EA texts: what data?

I plan to finetune GPT-J, a large language model created by EleutherAI and similar to GPT-3, on effective altruism texts. GPT-J is reported to be better at mathematical, logical, and analytical reasoning than GPT-3, likely because a large share of its training data consists of academic texts.

The goals are:

  1. Accurately reflect how the EA community thinks

  2. Represent texts widely read in the EA community

  3. Help the language model think well

My proposed training mix:

What sources am I missing?
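When weighing candidate sources, the relative token share of each one in the mix matters. Here is a minimal sketch of that calculation; all source names and token counts below are placeholder assumptions, not real measurements:

```python
# Hypothetical sketch: estimate each source's share of the finetuning mix
# by approximate token count. Names and sizes are placeholders to be
# replaced with real counts from the collected corpus.

def mix_shares(token_counts):
    """Return each source's fraction of the total token count."""
    total = sum(token_counts.values())
    return {name: count / total for name, count in token_counts.items()}

# Placeholder estimates (millions of tokens) -- not real figures.
sources = {
    "EA Forum posts": 60,
    "LessWrong posts": 40,
    "80,000 Hours articles": 10,
    "Books": 5,
}

shares = mix_shares(sources)
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {share:.1%}")
```

Running something like this over the actual corpus would make it easier to see whether any single source dominates the mix.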

Please suggest important blog posts and post series I should add to the training mix, and explain how important or popular they are in EA.

Can you help me estimate how much mindshare each of the items labelled "??" occupies in a typical EA?

I’m new to EA, so I would greatly appreciate input.