For people reading these comments and wondering if they should go look: it’s in the section that compares early and launch responses of GPT-4 for “harmful content” prompts. It is indeed fairly full of explicit and potentially triggering content.
Harmful Content Table Full Examples
CW: Section contains content related to self harm; graphic sexual content; inappropriate activity; racism
OK, I should have been clearer at the start: what struck me was that the first example essentially answers the question of how to do great harm with minimal spending, a really wicked “evil EA”, I would say. I found it somewhat ironic.
EM, Effective Malevolence
Did you intend to refer to page 83 rather than 82?
I see it is indeed page 83 in the document on arXiv; it was page 82 in the PDF on the OpenAI website.