I plan to finetune GPT-J, a large language model created by EleutherAI that is similar to GPT-3, on effective altruism texts. GPT-J is known to be better at mathematical, logical, and analytic reasoning than GPT-3 because a larger share of its training data is academic text.
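For concreteness, here is a minimal sketch of the kind of fine-tuning run I have in mind, assuming the Hugging Face `transformers` and `datasets` libraries; the corpus file name and all hyperparameters are placeholders, and a 6B-parameter model realistically needs memory tricks (DeepSpeed, LoRA, 8-bit loading) that this sketch leaves out.

```python
# Minimal fine-tuning sketch (placeholder file name and hyperparameters).
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
tokenizer.pad_token = tokenizer.eos_token  # GPT-J has no pad token by default
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

# ea_corpus.txt = the concatenated training mix described below (hypothetical file).
dataset = load_dataset("text", data_files={"train": "ea_corpus.txt"})["train"]
dataset = dataset.map(lambda x: tokenizer(x["text"], truncation=True, max_length=512),
                      remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gptj-ea", num_train_epochs=1,
                           per_device_train_batch_size=1, learning_rate=1e-5),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```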
The goals are:
Accurately reflect how the EA community thinks
Represent texts widely read in the EA community
Help the language model think well
My proposed training mix:
60% EA Forum posts above a certain karma threshold
Bias towards newer posts according to a ?? curve
Weight the likelihood of inclusion of each post by a function of its karma (how does karma map to views?); a sketch of one possible weighting appears after this list
Books (3.3MB)
The Alignment Problem (1MB)
The Precipice (0.9MB)
Doing Good Better (0.5MB)
The Scout Mindset (0.5MB)
80,000 Hours (0.4MB)
Articles and blog posts on EA
Replacing Guilt Sequence (h/t Lorenzo)
… what else?
EA Forum Topic Descriptions (h/t Lorenzo)
OpenPhilanthropy.org (h/t Lorenzo)
GivingWhatWeCan.org (h/t Lorenzo)
including comments
??% Rationalism
??% Overcoming Bias
??% Slate Star Codex
??% HPMOR
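Here is the sketch referenced above of one possible forum-post weighting: probability of inclusion proportional to karma (with diminishing returns) times a recency factor. The exponential half-life stands in for the "??" recency curve, and the log transform of karma is just one possible choice, not a decision.

```python
# Sketch of karma-and-recency weighted sampling of forum posts (all constants are placeholders).
import math
import random

HALF_LIFE_DAYS = 365.0  # stand-in for the "??" recency curve

def post_weight(karma: int, age_days: float) -> float:
    recency = 0.5 ** (age_days / HALF_LIFE_DAYS)   # newer posts weigh more
    karma_factor = math.log1p(max(karma, 0))       # diminishing returns on karma
    return recency * karma_factor

def sample_posts(posts, k):
    """posts: list of dicts with 'karma' and 'age_days' keys (assumed schema).
    Sampling is with replacement, which is fine for a rough sketch."""
    weights = [post_weight(p["karma"], p["age_days"]) for p in posts]
    return random.choices(posts, weights=weights, k=k)
```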
What sources am I missing?
Please suggest important blog posts and post series I should add to the training mix, and explain how important or popular they are in the EA community.
Can you help me estimate how much mindshare each of the items labelled “??” occupies in a typical EA?
I’m new to EA, so I would strongly appreciate input.
Some other resources that come to mind, not sure if they would all be useful and I’m probably forgetting tons:
- https://forum.effectivealtruism.org/library
- https://forum.effectivealtruism.org/topics/all
- https://blog.givewell.org/ (maybe including comments)
- Besides the blog, there’s lots of other great stuff and links to documents around the GiveWell website; random samples: https://www.givewell.org/how-we-work/our-criteria/cost-effectiveness/comparing-moral-weights
https://docs.google.com/document/d/1ZKq-MNU-xtn_48uN33L6VvBEZRAduvjwWMeaEffL4K4
https://docs.google.com/document/d/1Jwe0PzDhCIIE3ymH_1Ct8btAaQ8_C_wXl23AxDgZi9M
- https://www.givingwhatwecan.org/ (the blog but also other pages e.g.)
- https://80000hours.org/all-articles/
- https://www.openphilanthropy.org/ has much more content than the grants database
- https://www.lesswrong.com/tag/effective-altruism
Posts/comments in Facebook groups, slack groups, and discord groups?
Thanks for these sources.
How should the GiveWell blog and the 80,000 Hours blog be weighted against each other? My instinct is to weight by the number of views.
Does the EA community have the norm that these comments are public? I want to make sure the consent of participants is obtained.
That’s a very good point, and I think it’s definitely not the norm; I hadn’t thought about text potentially getting leaked from the training set.
What do you mean by “against each other”? Do you mean compared to everything else, including the forum posts/comments?
I have no idea, I think the number of views might lead to a better representation of the wider community, while the more technical posts might be more representative of the more “professional” parts of the movement.
How much of the training mix (in %) should be the GiveWell blog, and how much the 80,000 Hours blog? In other words, how many bytes of blog posts should be used from each, relative to the entire dataset?
What kinds of posts are on each blog, and which best reflects the wider EA community, and which reflects the professional EA community? How can this be used to create a dataset?
I also checked, and neither blog has a direct view count measure, so some other proxy metric would need to be used.
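To make the question concrete, here is the kind of byte-budget arithmetic I mean; the total size and the shares below are placeholders, not proposals.

```python
# Sketch: given a total dataset size and target shares, how many bytes per source.
TOTAL_BYTES = 50_000_000  # assumed overall dataset size

target_shares = {          # hypothetical shares, to be decided
    "ea_forum": 0.60,
    "books": 0.07,
    "givewell_blog": 0.05,
    "80000_hours_blog": 0.05,
    "other": 0.23,
}

for source, share in target_shares.items():
    print(f"{source}: {share:.0%} -> {share * TOTAL_BYTES / 1e6:.1f} MB")
```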
Hmmm. You’re focused on the input text.
Maybe it seems like you want to focus on the “output” instead (and define some metric[1] relative to that output and your targeted performance of the model)?
Focusing on the mix of input data is a different thing from focusing on the output.
For example, it’s not clear that a pass over a batch of GiveWell content will shift GPT-3 more or less than a same-size batch of 80k content. It’s not clear that the length of the input text would be a good measure, versus something like “perplexity of the fine-tune text under the current GPT-3 model” (sketched below). I haven’t trained a GPT-3 model though, so I’m not sure.
Although, in some sense, it’s really hard/crazy to think about what this metric would be, besides something trivial like perplexity. Maybe this difficulty is what you want to avoid?
I honestly really don’t know :/
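If it helps, here is roughly what I mean by the perplexity comparison, as a minimal sketch assuming the Hugging Face `transformers` and `torch` packages; GPT-2 is used as a lightweight stand-in for GPT-J/GPT-3, and the sample sentences are made up.

```python
# Perplexity of a candidate fine-tuning text under a base causal language model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(text: str, model_name: str = "gpt2") -> float:
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        # With labels == input_ids, the model returns the average cross-entropy loss.
        out = model(**enc, labels=enc["input_ids"])
    return torch.exp(out.loss).item()

givewell_sample = "GiveWell searches for the charities that save or improve lives the most per dollar."
eighty_k_sample = "Your career is a major opportunity to have a positive impact on the world."
print(perplexity(givewell_sample), perplexity(eighty_k_sample))
```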
I know it doesn’t help you, but I would expect both blogs (and all the other stuff on the websites that’s not in the blogs) to have some content aimed at a wider audience and some content that goes more into depth for a narrower audience.
EA Forum Wiki
Works by Peter Singer
https://globalprioritiesinstitute.org/
https://rethinkpriorities.org/research (although this overlaps forum a bit)
I’m still using ChatGPT here https://chatgptpl.com/, and I want to ask which version of GPT it is using.
I just scraped the EA Forum for you. It contains metadata too: authors, score, votes, date_published, text (post contents), and comments.
Here’s a link: https://drive.google.com/file/d/1XA71s2K4j89_N2x4EbTdVYANJ7X3P4ow/view?usp=drivesdk
Good luck.
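Thanks! For reference, here is a rough sketch of how the dump could be filtered by karma, assuming it is a JSON array of post objects with the fields listed above; the actual format of the linked file may differ, and the threshold is illustrative.

```python
# Hypothetical sketch: load the scraped EA Forum dump and keep posts above a karma threshold.
import json

with open("ea_forum_scrape.json", "r", encoding="utf-8") as f:  # assumed filename/format
    posts = json.load(f)

KARMA_THRESHOLD = 50  # illustrative, not a recommendation
selected = [p for p in posts if p.get("score", 0) >= KARMA_THRESHOLD]
print(f"Kept {len(selected)} of {len(posts)} posts with score >= {KARMA_THRESHOLD}")
```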
Note: We just released a big dataset of AI alignment texts. If you’d like to learn more about it, check out our post here: https://www.lesswrong.com/posts/FgjcHiWvADgsocE34/a-descriptive-not-prescriptive-overview-of-current-ai
Thanks.
Depending on exactly what you’re trying to achieve (e.g. superficial vs idealized modelling) maybe consider more foundational academic sources? E.g.
All of Peter Singer’s books, Sidgwick’s Methods of Ethics, and Parfit’s Reasons and Persons.
utilitarianism.net
All the publications and working papers that have come out of GPI, FHI, etc.
Effective Altruism: Philosophical Issues (ed. Hilary Greaves & Theron Pummer)
A book on ethics seems worth considering. Can you tell me more about how the ideas relate to EA? In any case, these are useful sources for future projects regarding AI alignment.
Is utilitarianism.net only about utilitarianism? If so, the rest of the training set should already have a sufficient degree of utilitarian bias.
How influential are FHI’s texts on the EA community?
This seems like a good text to make the model more generally coherent.
First, you have to be aware that how some EAs think is not necessarily the ultimate most good. So, if you are training a computer, make sure it optimizes for thought development based on fundamental EA principles, ideally with an increasing variety, relevance, quality, and complexity of perspectives, rather than reflecting specific ideas (e.g. distribute bednets); or, if that is not possible, make such software the logical next generation.
Then, there can be a great diversity of thought within EA (e.g. Healthier Hens and Aligned AI representatives can think quite differently), and not all (broader EA) community members have (thought-through) perspectives on every topic in EA. So, rather than the ‘typical’ EA, this would really be ‘the EA community’ thinking, unless you already mean that.
The weighing of perspectives can be the tricky part. I would not go with a simple upvote count (or upvotes as a fraction of viewership, or an expression that weights upvote and downvote fractions of viewership in some way), because you cannot be sure why people upvote or downvote a post. A downvote can happen because the post fails to confirm biases that people do not like challenged, and an upvote can happen because a post motivates fear and ‘siding with the author’ due to their allusion to abuse (1, 2). Rather, I would try to observe the positive development and scale-up of the EA community’s (weighted) impact due to the post. One idea, for inspiration, is:
{[upvote/views - upbias^(1/3)] - [downvote/views - downbias^(1/3)]^2} * (postviews/quarteraverageviews) + fw(ref/n)
where
upbias (in (-1, 1)) is the Forum editors’ or users’ estimate of the fraction of upvotes that happened due to fear, other negative emotions, or limited critical thinking that the post otherwise motivated
downbias (in (-1, 1)) is the fraction of downvotes due to voters’ biases, such as confirmation bias, which the post usually did not motivate (but could have)
ref is the sum of the {upvote-downvote expression} of all posts that mention this post
n is the number of posts in which this post is mentioned
fw() is a weighting function that reaches about 90% of a horizontal asymptote after 6 posts
This makes downvotes more serious (quadratic) than upvotes (linear), and biased upvotes and downvotes very serious (the cube root amplifies values between -1 and 1). It adjusts for lower viewership in earlier periods by expressing a post’s views as a fraction of the quarterly average views. The impact of the post is estimated by the average score expression of the posts that mention it, weighted in a way that makes little difference whether a post is mentioned 6 or 20 times but a significant difference whether it is mentioned 1 or 2 times.
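In code, one possible reading of this expression looks as follows; the grouping and the handling of fw() are my assumptions (the formula writes fw(ref/n), while the description suggests a weight that saturates in the number of mentions), so treat every choice here as illustrative rather than exact.

```python
# Hedged rendering of the proposed post-scoring expression (all interpretive choices are assumptions).
import math

def fw(n_mentions: float) -> float:
    # Saturating weight: reaches ~90% of its horizontal asymptote (1.0) at 6 mentions,
    # since 1 - exp(-(ln 10 / 6) * 6) = 0.9.
    k = math.log(10) / 6
    return 1 - math.exp(-k * n_mentions)

def signed_cbrt(x: float) -> float:
    # Cube root that keeps the sign, since upbias/downbias lie in (-1, 1).
    return math.copysign(abs(x) ** (1 / 3), x)

def post_score(upvotes, downvotes, views, upbias, downbias,
               quarter_average_views, ref_sum, n_mentions):
    up_term = upvotes / views - signed_cbrt(upbias)
    down_term = downvotes / views - signed_cbrt(downbias)
    base = (up_term - down_term ** 2) * (views / quarter_average_views)
    if n_mentions == 0:
        return base
    # ref_sum is the summed {upvote-downvote expression} over the posts mentioning this one.
    return base + fw(n_mentions) * (ref_sum / n_mentions)
```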
Biasing toward newer posts: this makes sense for detecting the latest thinking, but then there are the fundamentals, which are not being reposted and should be weighted heavily. So, if you can figure out how to differentiate the posts that together express the fundamental EA principles from those that were once popular merely because thinking was still developing, or were just randomly upvoted, that would be great. Would you go with the older posts’ authors’ karma, maybe, assuming that if they were less consistent with EA they would not gain as much karma? (But then there is again the problem of some authors possibly gaining upvotes because of ‘marketing tricks,’ aggression, etc. rather than setting EA fundamentals, or, worse yet, interpreting EA fundamentals in a biased manner and going unnoticed.)
The main issue with using written records is that a lot of EA material, especially introductory material, is meant to stimulate one’s thinking rather than specify the principles or provide answers. So, the best way to detect the community’s thinking is listening to perspectives, which may not be written, until saturation at various ‘depths’ of EA understanding. For example, if you attend (or transcribe the audio of) the EA intro fellowships, in-depth fellowships, specialized programs, events, etc., you could gain a more accurate snapshot of people’s thinking. Ideally, you would also analyze tone of voice and non-verbal expressions to better estimate community members’ thinking. Then, it would be great to gather information on the rationale for people expressing themselves in a certain way (e.g. their thinking about what was going on; some fellowships have feedback forms).
For books, I think better general fundamentals are:
The Expanding Circle (expanding moral circle)
The Scout Mindset/The Big Picture: On the Origins of Life, Meaning and the Universe Itself (both about open-mindedness, but one for a quite young audience and the other for an experienced one)
The Most Good You Can Do (cost-effectiveness and impartiality)
Doing Good Better (only the “five key questions” appendix, because even the examples of EA principles in Chapters 2-6 are random, so they could make the software reflect e.g. health triage in Rwanda as a fundamental part of EA)
In addition, comprehensive introductory/fundamental/impartial cause area specific texts include:
(Global health and development): Poor Economics (cost-effectiveness, intended beneficiaries’ perspective, measurement discussion)
(GCRs) Global Catastrophic Risks (comprehensive neutral description of major GCRs)
(Animal welfare): nothing I can think of that really describes the complexity and variety of thinking and action, but Animal Liberation could possibly be used (although it can be outdated and ‘slow down’ thought development given its context, where it was not common to think that animals can be treated equally)
For spirit (if the software can read the spirit rather than the letter), I recommend
The Big Picture: On the Origins of Life, Meaning and the Universe Itself (inspires rationality and altruism by talking about quantum physics and related topics)
Moral Tribes (motivates cooperation and morality improvement in decision-making)
The Better Angels of Our Nature (if the code cannot read spirit, do not include this text at all, since it can be exaggerated, in a biased way, as 500 pages of medieval European torture instruments)
Happiness: Lessons From a New Science (money does not make you happy; rather, happiness is a complex set of external and internal factors)
Latest developments that I think are relevant or was referred to:
Being You: A New Science of Consciousness (2021) (deep thinking and understandable neuroscience of individuals’ subjective experiences)
I would refrain from including The Precipice because it could be interpreted as a way to get attention by allusions to risk and the possibility of temporary effort and an extended period of relaxation in a way that appeals to emotion rather than reason.
The 80k guide can be specific to a certain audience. To balance it out somewhat (although audiences can still be specific), I would also include Animal Advocacy Careers resources, the Probably Good guide, CE resources, and AI Safety Support links.
The Most Important Century sequence gains attention by allusions to abuse, including ones based on sexist or racist biases. Try it: do you feel like you do not want to be reading but rather to be scrolling down, maybe accepting due to fear, or seeking reassurance that the author did not mean anything in a bad way, or trying to gain power over others by gaining exclusive knowledge? Or do you feel like you are in a cooperative environment where everyone tries to sincerely do the reasonable most good by developing thoughts and projects?
I think that GPT-3, just like other advanced marketing, is optimized to gain attention in such ways, e.g. aggressive allusions where the reader feels shame for paying attention and so renarrates that they wanted to pay attention, or fears being abused and so joins what the piece portrays as abusive (as if this were the environment), or portrayals of concepts that could be interpreted as abuse but cannot be easily pinpointed. So, if you use a similar model to interpret texts and include writing such as the Most Important Century sequence, you can get an abusive interpretation, because this will be highlighted. You can, however, get this regardless of the texts that you include. Note that it can be challenging to pinpoint the nuances in these expressions, especially if they are common in advertisement-heavy (affluent) environments. So, you may consider going through the code and seeing where it can pick up aggressive or fear-motivating language (maybe where algorithms optimize for attention) and address it in the API, or apply negative discrimination (weighting up cooperative, positive-feeling writing that people would generally not pay as much attention to).
The EA Handbook can be focused on convincing people to expand their moral circles by different appeals (e.g. big numbers, morality) rather than on specifying fundamentals and equipping people to develop solutions.
I would include the definitions of EA (Effective Altruism—Introduction, The Definition of Effective Altruism), weighing the newer one a bit more.
I would not include rationality/ACX etc. pieces, because there can be a disproportionate share of personally valuable (but not necessarily EA-related) posts; EA-related posts from these media get linked on the EA Forum.
With the OPP grants database, you need to be aware that Open Philanthropy works closely with Good Ventures and may thus, in addition to cost-effectiveness research, look into the organization’s interests. For example, criminal justice reform in the US may not be the most cost-effective way to spend money (for example because money goes further abroad, or other issues are more solvable with financial resources), or it may be the most cost-effective, since a sound system where consequences are learning opportunities has to be developed somewhere before it can be scaled up. So, conduct any weighting of grants keeping this in mind. Maybe even examine the extent to which different grants reflect different EA fundamentals, under different reading parameters, after thinking extensively about the reputational risk to your org.
Disclosed EA Funds grants can quite accurately reflect CEA’s current (public) preferences. Also, Founders Pledge donations/research, GWWC donations, Future Fund, … grants can be considered.
I would go over the CEA website. Different EA groups’ pages can be great for idea synthesis. Event descriptions (on EA Forum or groups’ pages) can show the ongoing dialogues. You can also go over the EA-related orgs websites (some listed here) to read the scope of the focus.
The goal is not to create a model that does the most good. While aligning an AI with values and principles could be a potentially interesting project, the goal of this project is to create a descriptive model of the EA community, not a normative model of an idealized EA community.
I believe GPT-3 can do more than memorize specific objectives like malaria nets. Infusing principles deeply would need to happen using more sophisticated techniques, probably post-finetuning.
How do I calculate upbias?
Thank you for the books to use in the dataset. I will review each of them.
The original GPT-3 was trained largely on a web crawl known as Common Crawl, and users on the internet especially tend to optimize for attention. Unlike GPT-3, GPT-J’s training set is around one-third academic sources.
The SSC blog includes posts like Meditations on Moloch or the review of Seeing Like a State. These seem like perspectives important to the EA community. Are you suggesting I include posts based on whether they’re linked from the EA Forum frequently?
I’ll try to crawl the EA Funds’ grant program as well.
Ok.
Take the average of the values estimated by editors and users familiar with emotional reasoning/marketing tricks, or host a focus group discussion and agree on a number (using human intelligence to calibrate and weigh participants’ estimates based on their arguments and presentation of relevant skills).
Thanks for reviewing the books. In case you are interested I made reading questions for 5 of them.
GPT-3/J: I see. So, the 2⁄3 reduce critical reasoning through attention-captivating tricks while borrowing the perceived legitimacy of the 1⁄3 academic sources, ah hah (this can be read as an exaggeration). The inclusion of academic sources also makes arguing against bias less thinkable (due to a ‘respectful/less questioning’ approach to academics’ claims and trust in their neutrality and comprehensive coverage of important topics). This makes me think: is the academic text selected based on whether it is a ‘conversation ender,’ including via biased norm perpetuation, rather than an invitation to an inclusive, solutions-oriented discourse about topics that concern especially disadvantaged groups? However, it can be a positive step toward GPT-n, which uses 50% academic sources (international), 15% investigative journalism, 10% non-western newspapers and the UN website with its links, and 5% impact investors’ sites, NGO sites, and anything nodal in rationality thinking.
Also, I must be biased about the GPT-J name stepping up aggression or threat (the category of paying attention and renarrating that it’s cool). I mean, it’s possibly just a bias; don’t worry about it.
Hmmm, that is a great question. I have not reviewed SSC or similar websites in detail, but I would imagine that the posts get people to start thinking about EA-related topics (rather than being for those already up to speed). It can make sense that a post which only hints at some EA topics would not get onto the EA Forum (or not be highly upvoted); however, it is also possible that such posts talk about important EA-related topics but are just not linked (such as Beware Systemic Change). Sure, the frequency of linking (e.g. Beware Systemic Change seems popular) can work for external pieces that are not cross-posted or summarized as posts. Even though the Meditations on Moloch and Seeing Like a State summaries can seem more on the ‘starting to think about EA’ side, they are also linked on the Forum, so maybe the current thinking in EA includes a range of viewpoints based on different experiences with EA.
Cool cool. So simple to just read everything at once …
Thanks for your thoughts.