I used a model I fine-tuned to generate takes on Effective Altruism. The prompt is “effective altruism is.” Here are its first three completions:
effective altruism is vampirism, except instead of sucking blood you suck hours and happiness from helping people who would otherwise have spent the time improving their lives.
effective altruism is parasitic. it latches onto the success of actual altruism, which is genuine and humanizing, to justify its cold calculations and make them feel virtuous too.
effective altruism is rich kid hobbyism pretending to be a moral imperative
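For concreteness, here is a minimal sketch of how completions like these can be sampled, assuming the legacy OpenAI Completions API (the fine-tuned model name below is a hypothetical placeholder):

    import os
    import openai  # legacy 0.x SDK, contemporary with davinci fine-tunes

    openai.api_key = os.environ["OPENAI_API_KEY"]

    # Hypothetical placeholder; a real fine-tune ID looks similar.
    MODEL = "davinci:ft-personal-2022-06-20"

    resp = openai.Completion.create(
        model=MODEL,
        prompt="effective altruism is",
        max_tokens=60,
        temperature=0.9,
        n=3,  # sample three completions, as above
    )
    for choice in resp["choices"]:
        print("effective altruism is" + choice["text"])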
I’m somewhat concerned about the use of AI models to [generate propaganda? conduct information warfare?]. Here, the concern is that a model like this could be used to salt the earth: poisoning the perceived vibe around EA so that certain demographics dislike it before they can engage with it deeply.
I find it important to note that the model was not designed to be harmful: it was fine-tuned to generate self-deprecating humor. Nevertheless, amplifying that capability seems to also amplify the capability to criticize EA.
I’m interested in what mitigations people have in mind. One could operate at the epistemic level: teaching people to engage kindly with new ideas.
I have some moderately useful comments if you’re interested.
Some basic questions: Are you running this on GPT-NeoX-20B? If so, how are you setting that up? Are you getting technical support of some kind for training? Are you hand-selecting and cleaning the data yourself?
I used a model I fine-tuned to generate takes on Effective Altruism.
was unclear. It should be:
I used a model that I fine-tuned, in order to generate takes on Effective Altruism.
This model was not fine-tuned specifically for Effective Altruism. It was developed to explore the effects of training language models on a Twitter account. I was surprised and concerned when I noticed it could generate remarkable takes on effective altruism, even though effective altruism was not present in the original dataset. Furthermore, these takes are always critical.
This particular model is a fine-tuned OpenAI davinci. I plan to fine-tune GPT-EA on GPT-NeoX-20B. A predecessor to GPT-EA (GPT-EA-Forum) was trained using a third-party API. I want to train GPT-EA on a cloud platform so I can download a copy of the weights myself. I am not receiving technical support (or funding for GPU costs), though either would be helpful. I selected and cleaned the dataset myself, with input from community members, and I’m still looking for more community input.
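For concreteness, here is a rough sketch of the training setup this implies, assuming the Hugging Face transformers library and its hosted EleutherAI/gpt-neox-20b checkpoint. The dataset file and training arguments are hypothetical placeholders, and a 20B-parameter model would in practice need multi-GPU sharding (e.g. DeepSpeed or the original GPT-NeoX codebase) rather than a plain Trainer run; the point is only that this route leaves the weights on your own machine:

    from datasets import load_dataset
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )

    # Downloading the checkpoint yourself is what yields a local copy
    # of the weights, unlike fine-tuning through a hosted API.
    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b")

    # "ea_corpus.jsonl" is a hypothetical cleaned dataset file with one
    # {"text": ...} record per document.
    dataset = load_dataset("json", data_files="ea_corpus.jsonl")["train"]
    dataset = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
        batched=True,
        remove_columns=["text"],
    )

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="gpt-ea", num_train_epochs=1),
        train_dataset=dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    trainer.save_model("gpt-ea")  # weights stay under your control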
Beautiful
I think this comment is confusing and unseemly, as it might be read as endorsing the negative content in the parent comment. This being a thread for newcomers makes the concern greater.
I want to downvote this comment into invisibility or report it to a moderator, but it was made by a moderator, and he has two cats, presumably trained to attack his enemies.
Thanks, you may be right.
To be clear, I found these generated examples clearly false and misleading, but I can understand how someone could feel this way, so I found them a bit funny and uncanny. That said, I totally agree with JoyOptimizer’s concern.
has two cats, presumably trained to attack his enemies.
They are only trained to be maximally adorable and cuddle my enemies to a fluffy friendship!
(And generally, you or anyone else should feel free to report comments from moderators if you find them damaging the forum’s discussion. It may well happen, and it’s important to get that kind of feedback)