Executive summary: This exploratory post develops a first-principles framework for understanding sadism as a driver of suffering risks (s-risks), analyzing when reducing the relative power of sadistic preferences might lower or unintentionally raise the likelihood of astronomical disvalue, and outlining key open research questions.
Key points:
- Sadism is defined here as an intrinsic preference for more suffering, covering both unconditional cases and conditional “deserved suffering,” and is distinguished from related traits such as psychopathy or spite.
- The author identifies sub-factors influencing the “relative power of sadism,” such as how many agents hold sadistic preferences, how such values spread, and whether powerful AI systems or humans might develop or amplify them.
- A central challenge is the “saliency hazard”: interventions targeting sadism, or even research on it, may inadvertently increase its visibility and appeal, strengthening it rather than weakening it.
- Reducing sadism does not straightforwardly reduce s-risks, because interventions may shift other factors (e.g., the number of agents with long-term control capacity) in ways that could worsen overall risk.
- The post distinguishes sadism from spite, noting that spite targets preference frustration rather than suffering per se and may sometimes serve adaptive or even desirable functions, underscoring the need to disentangle the two carefully.
- The post concludes with a research agenda focused on measuring sadism, studying its interaction with spite, and assessing whether and how interventions can be both implementation-robust and outcome-robust in reducing s-risks.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.