In the following, I consider strong cognitive enhancement as a form of AGI.
AGI not being developed is a catastrophically bad outcome, since humans will still be able to develop bio and nuclear weapons and other things we don't know about yet. I therefore put a rather small probability on us surviving the next 300 years without AGI, and an extremely small probability on us surviving the next 1000 years. This means, in particular, no expansion throughout the galaxy, so not developing AGI implies that we kill almost all the potential people.
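As a rough illustration of the compounding logic behind those survival estimates, here is a minimal sketch; the 1% annual risk is a number I am assuming purely for illustration, not a figure from this discussion:

```python
# Toy model: a constant annual probability of a human-made catastrophe,
# compounded over long horizons. The 1% figure is an illustrative assumption.
annual_risk = 0.01

for years in (100, 300, 1000):
    survival = (1 - annual_risk) ** years
    print(f"P(survive {years} years) ~ {survival:.4g}")

# Prints roughly 0.366, 0.049 and 4.3e-05: even a modest constant annual
# risk leaves a small survival probability over 300 years and an extremely
# small one over 1000 years, which is the shape of the claim above.
```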
However, if I could stop AGI research for 30 years, I would do it, so that alignment research can perhaps catch up.
But if human institutions ensure that such weapons are never deployed, wouldn't that be equivalent to an AGI 'code' of safety? Also, if AGI is deployed by malevolent humans (or people who have known mostly abuse rather than pleasure), this could be worse than no AGI.
International institutions cannot ensure that weapons are never deployed. They failed to control nuclear weapons, which was multiple orders of magnitude easier than controlling bioweapons, and those are only the technologies we know of. Each year, it becomes easier for a small group of smart people to destroy humanity. Moreover, today's advances in AI make it easier to manipulate and control people at scale via the internet, so I put a probability of at least 30% on the world becoming less stable, not more, even without considering anything new beyond bio/nuclear.
On the risk of AGI being used to torture people, I'm not entirely sure of my position, but I think that anti-alignment (creating an AGI to torture people) is as hard as alignment, because it faces the same problems. Moreover, my guess is that people who want to torture others will be a lot less careful than good people, so an AGI torturing people because it was developed by malevolent humans is a lot less probable than an AGI being good because it was developed by good humans, and so the expected value is still positive. However, there is a similar risk that I find more worrying: the uncanny valley of almost-alignment. It is possible that near misses are a lot worse than complete misses, because we would get an AGI keeping us alive and conscious, but in a really bad way, and it is possible that ending up in the uncanny valley is more probable than solving alignment, which would mean that AGI has a negative expected value.
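To make that expected-value worry concrete, here is a minimal toy calculation; the probabilities and values are placeholders I am assuming for illustration, not estimates from this exchange:

```python
# Toy expected-value comparison of AGI outcomes (all numbers are
# illustrative assumptions, in arbitrary units of value).
outcomes = {
    # outcome: (probability, value)
    "aligned":       (0.5,   100),   # good future
    "complete miss": (0.4,  -100),   # extinction-level loss
    "near miss":     (0.1, -1000),   # kept alive and conscious, but in a really bad way
}

expected_value = sum(p * v for p, v in outcomes.values())
print(expected_value)  # 0.5*100 + 0.4*(-100) + 0.1*(-1000) = -90

# Even when alignment is the single most likely outcome, a near miss that
# is much worse than a complete miss can drive the overall expected value
# negative, which is exactly the 'uncanny valley' concern above.
```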
I am not saying that international institutions have proven they can prevent human-made catastrophes with 100% certainty, but I think they have the potential to, if institutions are understood as the sets of norms that govern human behavior rather than as large intergovernmental organizations such as the UN.
It may be becoming technically easier but normatively more difficult for people to harm others, including for decisionmakers to cause existential catastrophes. For example, nuclear proliferation and bioweapons stockpiling were not extensively criticized by the public in the past, because people were preoccupied with other issues and offering critical perspectives on decisionmaking was not institutionalized. Now, the public holds decisionmakers accountable for not using these weapons through a 'sentiment of disapproval,' which is uniquely perceived by humans, who act according to emotions.
People can be manipulated by the internet only to an extent; this view takes into account their general ability to comprehend consequences and form their own opinions from different perspectives. For example, if people start seeing Facebook ads urging them to vote for a proliferation proponent, ads that appeal to their aggression and exploit biases to provoke fear, while another ad shows the risks of proliferation and war and explains the personal benefits of peace, then people will likely vote for peace.
That makes sense: as an oversimplification, if an AGI is trained to optimize for the expression 'extreme pain,' then humans could learn to use the 'pain' scale to denote pleasure. This would be an anti-alignment failure.
That makes a lot of sense too: I think a group's capacity to advance innovative objectives efficiently increases as the subjective wellbeing of its participants/employees improves. For example, if one group is beaten every time they disobey orders while being made to build a torture technology, and another group builds torture regulation while fostering positive norms of cooperation and relationships, the former should think less innovatively and cooperate worse than the latter. So, the regulation side should do better than the aggression side.
But what if one 'quality unit' of the torture technology causes 100x more harm than one 'quality unit' of the regulatory technology can prevent? For instance, consider releasing an existing virus vs. preventing transmissions. Then you need institutional norms to keep people from aggression. Ideally, there would be no such malevolent people, which could be brought about either by AI (e.g. recommendation algorithms that make pleasure competitive with, or much better than, hurting others, which is not actually pleasant, just traumatizing and possibly memorable) or by humans (e.g. programs for at-risk children).
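To spell out the arithmetic of that asymmetry, here is a minimal sketch with made-up unit counts; only the 100x ratio comes from the hypothetical above:

```python
# Toy offense/defence asymmetry: harm per 'quality unit' of harmful tech
# vs. harm prevented per 'quality unit' of regulatory tech.
HARM_PER_OFFENSE_UNIT = 100    # from the 100x hypothetical above
PREVENTED_PER_DEFENSE_UNIT = 1

offense_units = 1
defense_units = 50             # arbitrary assumption: a sizeable defensive effort

net_harm = (offense_units * HARM_PER_OFFENSE_UNIT
            - defense_units * PREVENTED_PER_DEFENSE_UNIT)
print(net_harm)  # 100 - 50 = 50: still net harm

# Under this asymmetry, defence needs ~100x the output of offence just to
# break even, which is why the argument falls back on institutional norms
# and on reducing the number of malevolent actors in the first place.
```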
Yeah! That seems almost like an existential catastrophe: people seemingly doing great and alive, but actually suffering significantly. Considering that an AI could bring that about through a single misspecification that somehow gets amplified, AGI is risky. Humans on their own would not develop in this direction, because they perceive their own suffering and so would stop doing what they dislike, unless a 'greater manipulative power' compelled them otherwise.
I am optimistic about humans being able to develop an AGI that improves wellbeing more than what would have happened without such technology, while keeping control over it to make adjustments if they perceive a decrease in wellbeing. But if it is not necessary, then why take the risk?
OK, thank you, you prompted a related question.