I am not saying that international institutions have proven they can prevent human-made catastrophes with 100% certainty, but I think they have the potential to, if institutions are understood as the sets of norms that govern human behavior rather than as large intergovernmental organizations, such as the UN.
It may be technically easier but normatively more difficult for people to harm others, including for decisionmakers to cause existential catastrophes. For example, nuclear proliferation and bioweapons stockpiling were not extensively criticized by the public in the past, because people were preoccupied with other issues and offering critical perspectives on decisionmaking was not institutionalized. Now, the public holds decisionmakers accountable for not using these weapons through a ‘sentiment of disapproval.’ This sentiment is uniquely perceived by humans, who act according to emotions.
People can be manipulated by the internet only to an extent; this accounts for their general ability to comprehend consequences and form their own opinions from different perspectives. For example, if people start seeing Facebook ads for a proliferation proponent that appeal to their aggression or exploit biases to stoke fear, while another ad shows the risks of proliferation and war and explains the personal benefits of peace, then people will likely vote for peace.
That makes sense: as an oversimplification, if an AGI is trained to optimize for the expression ‘extreme pain,’ then humans could learn to use the ‘pain’ scale to denote pleasure. This would be an anti-alignment failure.
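To illustrate, here is a toy sketch of my own (not a claim about how an actual AGI would be trained; the names and the scale are hypothetical): the optimizer only ever sees the reported ‘pain’ score, never the underlying experience, so if humans quietly repurpose the scale, the very same objective starts promoting the opposite experience in the world.

```python
# Toy illustration: an optimizer maximizes the *reported* 'pain' score,
# so what it promotes in the world depends entirely on how humans map
# experience onto that scale.

def reported_pain(true_experience: float, scale_inverted: bool) -> float:
    """Map a true experience in [-1, 1] (-1 = pleasure, +1 = pain) onto the
    reported 'pain' scale. If humans repurpose the scale to denote pleasure,
    the mapping is inverted."""
    return -true_experience if scale_inverted else true_experience

def optimize(scale_inverted: bool) -> float:
    """The 'AGI' picks whichever experience maximizes reported 'extreme pain'."""
    candidates = [x / 100 for x in range(-100, 101)]
    return max(candidates, key=lambda x: reported_pain(x, scale_inverted))

print(optimize(scale_inverted=False))  # 1.0  -> maximizes actual pain
print(optimize(scale_inverted=True))   # -1.0 -> the same objective now maximizes pleasure
```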
That makes a lot of sense too: I think a group’s capacity to advance innovative objectives efficiently increases as the subjective experience of its participants/employees improves. For example, if one group of people is beaten every time they disobey orders while being mandated to build a torture technology, and another group develops torture regulation while fostering positive norms of cooperation and relationships, the former should think less innovatively and cooperate worse than the latter. So, the regulation side should outperform the aggression side.
But what if one ‘quality unit’ of the torture technology causes 100x more harm than one ‘quality unit’ of the regulatory technology can prevent? For instance, consider releasing an existing virus vs. preventing transmissions. Then you need institutional norms that prevent people from aggression in the first place. Ideally, there would be no people so malevolent, which can be brought about either by AI (e.g. recommendation algorithms that make pleasure competitive with, or much better than, hurting others, which is not actually pleasant, just traumatizing and possibly memorable) or by humans (e.g. programs for at-risk children).
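To make that asymmetry concrete (the notation is mine, just restating the thought experiment): if each quality unit of the harmful technology causes harm $h_o$ and each quality unit of the regulatory technology prevents harm $h_d$, with $h_o = 100\,h_d$, then for regulation to at least break even you need

$$q_d \cdot h_d \ge q_o \cdot h_o = 100\, q_o \cdot h_d \quad\Longrightarrow\quad q_d \ge 100\, q_o,$$

i.e. roughly 100 units of regulatory quality for every unit of offensive quality, which is why stopping aggression at the source matters so much.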
Yeah! It seems almost like an existential catastrophe if people appear to be doing great (and are alive) but are actually suffering significantly. Considering that AI could bring this about with a single misspecification that somehow gets amplified, AGI is risky. Humans would not develop in this direction on their own, because they perceive their own states and so would stop doing what they dislike, unless there were some ‘greater manipulative power’ compelling them otherwise.
I am optimistic about humans being able to develop an AGI that improves wellbeing more than what would have happened without such technology, while keeping control over it to make adjustments if they perceive a decrease in wellbeing. But if it is not necessary, then why take the risk?