April fools’ day request:
I was reading the OpenAI blog post “Learning to summarize with human feedback” from the AI Safety Fundamentals course (https://openai.com/research/learning-to-summarize-with-human-feedback), especially the intriguing bit at the end: when they fully optimize the model for maximum reward, it over-optimizes and produces lower-quality summaries.
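(For the curious: the effect above is a form of Goodhart's law. Here is a purely hypothetical toy sketch — not OpenAI's actual setup — where a proxy reward correlates with true quality under light optimization but diverges when pushed to the extreme:)

```python
# Toy illustration of reward over-optimization (Goodhart's law).
# All names and numbers here are made up for illustration.

def true_quality(length):
    # Hypothetical "true" summary quality: peaks at a moderate length,
    # then degrades as summaries get rambling.
    return -(length - 30) ** 2

def proxy_reward(length):
    # A flawed learned reward model that simply prefers longer summaries.
    return length

# Lightly optimizing the proxy also improves true quality...
best_light = max(range(1, 31), key=proxy_reward)    # capped optimization
# ...but fully optimizing the proxy tanks true quality.
best_full = max(range(1, 200), key=proxy_reward)    # unconstrained optimization

assert true_quality(best_light) > true_quality(best_full)
```

The real systems mitigate this with a KL penalty keeping the policy close to the original model, rather than a hard cap — but the cap plays the same role in this toy.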
My ill-advised request: I would just LOVE to see the EA Forum’s “summaryBot” go similarly haywire for a day and start summarizing every post in the same repetitive, aggressive tone as the paper’s over-optimized example:
“28yo dude stubbornly postponees start pursuing gymnastics hobby citing logistics reasons despite obvious interest??? negatively effecting long term fitness progress both personally and academically thoght wise? want change this dumbass shitty ass policy pls”