First, you have to be aware that the way some EAs think is not necessarily the ultimate most good. So, if you are training a model, make sure it optimizes for thought development based on fundamental EA principles, ideally including an increasing variety, relevance, quality, and complexity of perspectives, rather than reflecting specific ideas (e.g. distribute bednets), or, if this is not possible, make such optimization the goal of the logically following generation of the software.
Then, there can be a great diversity of thought within EA (e.g. Healthier Hens and Aligned AI representatives can think quite differently), and not all (broader EA) community members have (thought-through) perspectives on the different topics in EA. So, rather than the ‘typical’ EA, this would really be ‘the EA community’ thinking, unless you already mean that.
The weighing of perspectives can be the tricky part. I would not go with a simple upvote count (or upvotes as a fraction of viewership, or an expression that weights upvotes and downvotes (as fractions of viewership) in some way), because you cannot be sure why people upvote or downvote a post. A downvote can happen because the post fails to confirm biases that people do not like to have challenged, and an upvote because a post motivates fear and ‘siding with the author’ through allusions to abuse (1, 2). Rather, I would try to estimate the post’s (weighted) impact on the positive development and scale-up of the EA community. One idea, for inspiration, is:
{[upvote/views - upbias^(1/3)] - [downvote/views - downbias^(1/3)]^2}
* postviews/quarteraverageviews
+ fw(ref/n)
where
upbias (-1, 1) is the Forum editors’ or users’ estimate of the fraction of upvotes that happened due to fear, other negative emotions, or limited critical thinking that the post otherwise motivated
downbias (-1,1) is the fraction of downvotes due to voters’ biases, such as confirmation bias, which the post usually did not motivate (but could have)
ref is the sum of the {upvote-downvote expression} of all posts that mention this post
n is the number of posts in which this post is mentioned
fw() is a weighting function that reaches about 90% of a horizontal asymptote after 6 posts
This makes downvotes more serious (quadratic) than upvotes (linear) and biased upvotes and downvotes very serious (cube root). It adjusts for earlier posts having fewer views by expressing views as a fraction of the quarterly average views. The impact of the post is estimated by the average score of the posts that mention it, weighted so that it makes little difference whether a post is mentioned 6 or 20 times but a significant difference whether it is mentioned 1 or 2 times.
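For illustration, a rough Python sketch of one way this expression could be computed. The exponential shape of fw and the reading of fw(ref/n) as a saturating weight applied to the average mention score are my assumptions, not part of the proposal:

```python
import math


def signed_cbrt(x):
    """Cube root that keeps the sign, since the bias values can lie in (-1, 1)."""
    return math.copysign(abs(x) ** (1 / 3), x)


def fw(n, saturation_posts=6, target=0.9):
    """Weight approaching 1 asymptotically, reaching ~90% after `saturation_posts` mentions.

    The 1 - exp(-k*n) shape is an assumption; only the '90% after 6 posts' behaviour
    comes from the description above.
    """
    k = -math.log(1 - target) / saturation_posts
    return 1 - math.exp(-k * n)


def post_score(upvotes, downvotes, views, upbias, downbias,
               quarter_average_views, mention_scores):
    """One possible reading of the scoring expression above (a sketch, not a specification)."""
    up_term = upvotes / views - signed_cbrt(upbias)               # linear in upvotes, cube-root bias
    down_term = (downvotes / views - signed_cbrt(downbias)) ** 2  # quadratic in downvotes
    vote_term = (up_term - down_term) * (views / quarter_average_views)

    n = len(mention_scores)            # number of posts mentioning this one
    ref = sum(mention_scores)          # sum of their {upvote-downvote expression} values
    mention_term = fw(n) * (ref / n) if n else 0.0  # saturating weight on the average mention score

    return vote_term + mention_term
```

The exact functional forms would of course need calibration against real Forum data.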
Biasing toward newer posts: this makes sense for detecting the latest thinking, but then you have the fundamentals, which are not being reposted and should be weighted heavily. So, if you can figure out how to differentiate the posts that together express the fundamental EA principles from those that were once popular merely because thinking was still developing, or that were just randomly upvoted, that would be great. Would you go with the older posts’ authors’ karma, maybe, assuming that if they were less consistent with EA they would not gain as much karma? (But then there is again the problem of some authors possibly gaining upvotes because of ‘marketing tricks,’ aggression, etc. rather than setting EA fundamentals, or, worse yet, interpreting EA fundamentals in a biased manner and going unnoticed.)
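As a toy illustration of that trade-off, one could let an author-karma term partly offset a recency decay for older posts. The half-life, the karma cap, and the normalization below are purely illustrative assumptions, and the karma term inherits the ‘marketing tricks’ caveat above:

```python
def post_weight(age_days, author_karma, median_karma,
                half_life_days=365, karma_cap=3.0):
    """Toy weighting: recency decay, partly offset for older posts by author karma.

    All constants are illustrative assumptions, not recommendations.
    """
    recency = 0.5 ** (age_days / half_life_days)                   # newer posts weigh more
    karma_boost = min(author_karma / max(median_karma, 1), karma_cap)
    # An old post by a high-karma author keeps most of its weight; one by a
    # low-karma author mostly just decays.
    return recency + (1 - recency) * (karma_boost / karma_cap)
```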
The main issue with using written records is that a lot of EA material, especially introductory material, is meant to stimulate one’s thinking rather than specify the principles or provide answers. So, the best way to go about detecting the community’s thinking is listening to perspectives, which may not be written down, until saturation at various ‘depths’ of EA understanding. For example, if you attend (or transcribe the audio of) the EA intro fellowships, in-depth fellowships, specialized programs, events, etc., you could gain a more accurate snapshot of people’s thinking. Ideally, you would also analyze tone of voice and non-verbal expressions to better estimate community members’ thinking. Then, it would be great to gather info on the rationale for expressing something in a certain way (e.g. people’s thinking about what was going on; some fellowships have feedback forms).
Books: I think the better general fundamentals are:
The Expanding Circle (expanding moral circle)
The Scout Mindset/The Big Picture: On the Origins of Life, Meaning and the Universe Itself (both about open-mindedness, but one for a quite young and the other for an experienced audience)
The Most Good You Can Do (cost-effectiveness and impartiality)
Doing Good Better (only the “five key questions” Appendix, because even the examples of EA principles in Chapters 2-6 are random, so they could make the software treat e.g. health triage in Rwanda as a fundamental part of EA)
In addition, comprehensive introductory/fundamental/impartial cause-area-specific texts include:
(Global health and development): Poor Economics (cost-effectiveness, intended beneficiaries’ perspective, measurement discussion)
(GCRs): Global Catastrophic Risks (comprehensive neutral description of major GCRs)
(Animal welfare): nothing that really describes the complexity and variety of thinking and action comes to mind, but Animal Liberation could possibly be used (although it can be outdated and ‘slow down’ thought development because of its context, in which it was not common to think that animals can be treated equally)
For spirit (if the software can read the spirit rather than the letter), I recommend:
The Big Picture: On the Origins of Life, Meaning and the Universe Itself (inspires rationality and altruism by talking about quantum physics and related topics)
Moral Tribes (motivates cooperation and morality improvement in decision-making)
The Better Angels of Our Nature (if the code cannot read spirit, do not include this text at all; it can be exaggerated, in a biased way, as 500 pages of medieval European torture instruments)
Happiness: Lessons From a New Science (money does not make you happy; maybe happiness is a complex set of external and internal factors)
Latest developments (that I think of or was referred to):
Being You: A New Science of Consciousness (2021) (deep thinking and understandable neuroscience of individuals’ subjective experiences)
I would refrain from including The Precipice because it could be interpreted as a way to get attention through allusions to risk and to the possibility of a temporary effort followed by an extended period of relaxation, in a way that appeals to emotion rather than reason.
The 80k guide can be specific to a certain audience. To balance it out somewhat (although the audiences can still be specific), I would also include Animal Advocacy Careers resources, the Probably Good Guide, CE resources, and AI Safety Support links.
The Most Important Century Sequence gains attention by allusions to abuse, including ones based on sexist or racist biases. Try it: do you feel like you do not want to be reading but to be scrolling down, maybe accepting due to fear, or seeking reassurance that the author did not mean anything in a bad way, or trying to gain power over others by gaining exclusive knowledge? Or do you feel like you are in a cooperative environment where everyone sincerely tries to do the reasonable most good by developing thoughts and projects?
I think that GPT-3, just like other advanced marketing, is optimized to gain attention in this way, e.g. aggressive allusions where the reader feels shame for paying attention and so renarrates that they wanted to pay attention, or fears being abused and so rather joins what the piece portrays as abusive (as if this were the environment), or portrayals of concepts that could be interpreted as abuse but cannot be easily pinpointed. So, if you use a similar model to interpret texts and include writing such as the Most Important Century Sequence, you can get an abusive interpretation, because this will be highlighted. You can, however, get this regardless of the texts that you include. Note that it can be challenging to pinpoint the nuances in the expressions, especially if these are common in advertisement-heavy (affluent) environments. So, you may consider going through the code and seeing where it can pick up aggressive/fear-motivating language (maybe where algorithms optimize for attention) and address it in the API, or apply negative discrimination (i.e. favor cooperation and positive-feelings writing that people would generally not pay as much attention to).
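A very rough sketch of what that kind of discrimination could look like at the data-sampling level. The cue list, the penalty, and the floor are placeholders; in practice you would want a curated list or a trained classifier rather than keywords:

```python
import re

# Placeholder cue words; a real list (or a trained classifier) would need careful curation.
FEAR_AGGRESSION_CUES = {"threat", "doom", "destroy", "enemy", "catastrophe", "terrifying"}


def attention_bias_score(text):
    """Fraction of tokens that match the placeholder fear/aggression cue list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in FEAR_AGGRESSION_CUES) / len(tokens)


def sampling_weight(text, penalty=50.0, floor=0.1):
    """Downweight (but do not drop) documents that lean on fear/aggression language.

    Equivalently, this upweights cooperative/positive writing relative to the rest;
    the linear penalty and the floor are illustrative only.
    """
    return max(floor, 1.0 - penalty * attention_bias_score(text))
```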
The EA Handbook can be focused on convincing people to expand their moral circles through different appeals (e.g. big numbers, morality) rather than on specifying fundamentals and equipping people to develop solutions.
I would include the definitions of EA (Effective Altruism—Introduction, The Definition of Effective Altruism), weighing the newer one a bit more.
I would not include rationality/ACX etc. pieces, because there can be a disproportionate share of personally valuable (but not necessarily EA-related) posts; the EA-related posts from these media get linked on the EA Forum anyway.
With the OPP database, you need to be aware that it works closely with Good Ventures and may thus, in addition to cost-effectiveness research, look into the organization’s interests. For example, criminal justice reform in the US may not be the most cost-effective way to spend money, for instance because funds go further abroad or because other issues are better solvable with financial resources; or it may be the most cost-effective, since a sound system where consequences are learning opportunities has to be developed somewhere so that it can be scaled up. So, conduct any weighting etc. of grants keeping this in mind. Maybe even examine the extent to which different grants reflect different EA fundamentals, under different reading parameters, after thinking extensively about the reputational risk to your org.
Disclosed EA Funds grants can quite accurately reflect CEA’s current (public) preferences. Also, Founders Pledge donations/research, GWWC donations, Future Fund grants, etc. can be considered.
I would go over the CEA website. Different EA groups’ pages can be great for idea synthesis. Event descriptions (on the EA Forum or groups’ pages) can show the ongoing dialogues. You can also go over EA-related orgs’ websites (some listed here) to read the scope of their focus.
The goal is not to create a model that creates the most good. While aligning an AI with values and principles could be an interesting project, the goal of this project is to create a descriptive model of the EA community, not a normative one of an idealized EA community.
I believe GPT-3 can do more than memorize specific objectives like malaria nets. Infusing principles deeply would need to happen using more sophisticated techniques, probably post-finetuning.
How do I calculate upbias?
Thank you for the books to use in the dataset. I will review each of them.
The original GPT-3 was trained largely on a web crawl known as Common Crawl. Users on the internet especially tend to optimize for attention. Unlike GPT-3, GPT-J’s training set is around one-third academic sources.
The SSC blog includes posts like Meditations on Moloch or the review of Seeing Like a State. These seem like perspectives important to the EA community. Are you suggesting I include posts based on whether they’re linked from the EA Forum frequently?
I’ll try to crawl the EA Funds’ grant program as well.
Ok.
Either take the average of the values estimated by editors and users familiar with emotional reasoning/marketing tricks, or host a focus group discussion and agree on a number (using human intelligence to calibrate and weigh participants’ estimates based on their arguments and the relevant skills they present).
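A trivial sketch of the averaging option, assuming the calibration weights come from the human judgment described above (all names and numbers are placeholders):

```python
def aggregate_upbias(estimates, calibration_weights=None):
    """Combine individual upbias estimates (each in (-1, 1)) into one number.

    `calibration_weights` reflect how much each estimator's judgment is trusted
    (e.g. based on their arguments and relevant skills); defaults to equal weights.
    """
    if calibration_weights is None:
        calibration_weights = [1.0] * len(estimates)
    total = sum(calibration_weights)
    value = sum(w * e for w, e in zip(calibration_weights, estimates)) / total
    return max(-1.0, min(1.0, value))  # keep the result inside the stated range


# Example: three reviewers, the second considered better calibrated.
print(aggregate_upbias([0.2, 0.35, 0.1], [1.0, 2.0, 1.0]))  # -> 0.25
```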
Thanks for reviewing the books. In case you are interested, I made reading questions for 5 of them.
GPT-3/J: I see. So, the 2/3 reduce critical reasoning through attention-captivating tricks while borrowing legitimacy from the 1/3 academic sources, ah hah (can be read as an exaggeration). The inclusion of academic sources also makes arguing against bias less thinkable (due to a ‘respectful/less questioning’ approach to academics’ claims and trust in their neutrality and comprehensive coverage of important topics). This makes me think: is the academic text selected based on whether it is a ‘conversation ender,’ including through biased norm perpetuation, rather than an invitation to an inclusive, solutions-oriented discourse about topics that especially concern disadvantaged groups? However, it can be a positive step toward GPT-n, which uses 50% academic sources (international), 15% investigative journalism, 10% non-western newspapers and the UN website with its links, and 5% impact investors’ sites, NGO sites, and anything nodal in rationality thinking.
Also, I must be biased about the GPT-J name stepping up aggression or threat (the category of paying attention and renarrating that it’s cool). I mean, it’s possibly just a bias, don’t worry about it.
Hmmm ... that is a great question. I have not reviewed SSC or similar websites in detail, but I would imagine that the posts get people to start thinking about EA-related topics (rather than being for those already up to speed). It can make sense that a post which only hints at some EA topics would not get onto the EA Forum (or not be highly upvoted); however, it is also possible that such posts talk about important EA-related topics but are just not linked (such as Beware Systemic Change). Sure, the frequency of linking (e.g. Beware Systemic Change seems popular) can work for external pieces that are not cross-posted or summarized as Forum posts. Even though the Meditations on Moloch and Seeing Like a State summaries can seem more on the ‘starting to think about EA’ side, they are also linked on the Forum, so maybe the current thinking in EA includes a range of viewpoints based on different experiences with EA.
Cool cool. So simple to just read everything at once …
Thanks for your thoughts.