an overview of simple AI safety concepts and their easily explainable real-life demonstrations
For instance, to explain sycophancy, I tend to mention the one random finding from this paper that hallucinations are more frequent, if a model deems the user uneducated
more empirical posts on near-term destabilization (concentration of power, super-persuasion bots, epistemic collapse)
Iād like to see
an overview of simple AI safety concepts and their easily explainable real-life demonstrations
For instance, to explain sycophancy, I tend to mention the one random finding from this paper that hallucinations are more frequent, if a model deems the user uneducated
more empirical posts on near-term destabilization (concentration of power, super-persuasion bots, epistemic collapse)