I wonder if EA folks, overall, consider AGI a positive but they want it aligned as well?
Would the EA community prefer that AGI were never developed?
Rob Bensinger of MIRI tweets:
...I’m happy to say that MIRI leadership thinks “humanity never builds AGI” would be the worst catastrophe in history, would cost nearly all of the future’s value, and is basically just unacceptably bad as an option.
In the following, I consider strong cognitive enhancement as a form of AGI.
AGI not being developed is a catastrophically bad outcome, since humans will still be able to develop bio and nuclear weapons and other things we don’t yet know about. I therefore put a rather small probability on us surviving the next 300 years without AGI, and an extremely small probability on surviving the next 1000 years. This means, in particular, no expansion throughout the galaxy, so not developing AGI implies that we kill almost all the potential people.
However, if I could stop AGI research for 30 years, I would do it, so that alignment research can perhaps catch up.
But if human institutions make it so that weapons are not deployed, could that be equivalent to an AGI ‘code’ of safety? Also, if AGI is deployed by malevolent humans (or by those who know little pleasure and mostly abuse), that could be worse than no AGI.
International institutions cannot ensure that weapons are not deployed. They failed at controlling nuclear weapons, and that was multiple orders of magnitude easier than controlling bioweapons, and those are only the technologies we know of. Each year, it becomes easier for a small group of smart people to destroy humanity. Moreover, today’s advances in AI make it easier to manipulate and control people at scale via the internet, so I put a probability of at least 30% on the world becoming less stable, not more, even without considering anything new beyond bio/nuclear.
For the risk of AGI being used to torture people, I’m not entirely sure of my position, but I think that anti-alignment (creating an AGI to torture people) is as hard as alignment, because it has the same problems. Moreover, my guess is that people who want to torture others will be a lot less careful than good people, so an AGI torturing people because it was developed by malevolent humans is a lot less probable than an AGI being good because it was developed by good humans, and so the expected value is still positive. However, there is a similar risk that I find more worrying: the uncanny valley of almost-alignment. It is possible that near misses are a lot worse than complete misses, because we would get an AGI keeping us alive and conscious, but in a really bad way. It is also possible that ending up in the uncanny valley is more probable than solving alignment, and that would mean that AGI has a negative expected value.
I am not saying that international institutions have proven that they can 100% prevent human-made catastrophes, but I think they have the potential, if institutions are understood as the sets of norms that govern human behavior rather than as large intergovernmental organizations such as the UN.
It may be becoming technically easier but normatively more difficult for people to harm others, including for decisionmakers to cause existential catastrophes. For example, nuclear proliferation and bioweapons stockpiling were not extensively criticized by the public in the past, because people had other concerns and offering critical perspectives on decisionmaking was not institutionalized. Now, the public holds decisionmakers accountable for not using these weapons through the ‘sentiment of disapproval,’ which is uniquely perceived by humans, who act according to emotions.
People can be manipulated by the internet only to an extent; this is limited by their general ability to comprehend consequences and form their own opinions based on different perspectives. For example, if people start seeing ads on Facebook urging them to vote for a proliferation proponent, ads that appeal to their aggression and use biases to solicit fear, while another ad shows the risks of proliferation and war and explains the personal benefits of peace, then people will likely vote for peace.
That makes sense: as an oversimplification, if an AGI is trained to optimize for the expression ‘extreme pain,’ then humans could learn to use the ‘pain’ scale to denote pleasure. This would be an anti-alignment failure.
That makes a lot of sense too: I think that a group’s capacity to advance innovative objectives efficiently increases as the subjective wellbeing of its participants/employees improves. For example, if one group of people is beaten every time they disobey orders while being made to build a torture technology, and another group develops torture regulation while fostering positive cooperation and relationship norms, the former should think less innovatively and cooperate worse than the latter. So the regulators should do better than the aggressors.
But what if one ‘quality unit’ of the torture technology causes 100x more harm than one ‘quality unit’ of the regulatory technology can prevent? For instance, consider releasing an existing virus vs. preventing transmissions. Then you need some institutional norms to keep people from aggression. Ideally, there would be no such malevolent people, which could be achieved either by AI (e.g. recommendation algorithms that make pleasure competitive with, or much better than, hurting others, which is not pleasant, just traumatizing and possibly memorable) or by humans (e.g. programs for at-risk children).
Yeah! It seems almost like an existential catastrophe if people appear to be doing great and are alive but are actually suffering significantly. Considering that an AI could produce that outcome from a single misspecification that somehow gets amplified, AGI is risky. Humans would not develop in this direction on their own, because they perceive their own suffering and so would stop doing what they dislike, unless there were some ‘greater manipulative power’ compelling them otherwise.
I am optimistic that humans can develop an AGI that improves wellbeing better than what would have happened without such technology, while keeping control over it to make adjustments if they perceive a decrease in wellbeing. But if it is not necessary, then why take the risk?
OK, thank you, you prompted a related question.
I’m a (conditional) optimist. On an intuitive gut level, I can’t wait for AGI and maybe even something like the singularity to happen!
I regularly think about this, to me, extremely inspiring fact: “It’s totally possible, plausible, maybe even likely, that one special day in the next 10-60 years I will wake up and almost all of humanity’s problems will have been solved with the help of AI.”
When I sit in a busy park and watch the people around me, I think to myself: “On that special day… all the people I see here, all the people I know… if they are still alive… None of them will be seriously unhappy, none of them will have any serious worries, none will be sick in any way. They will all be free from any nightmares, and see their hopes and dreams fulfilled. They will all be flourishing in heaven on earth!”
This vision is what motivates me, inspires me, makes me extremely happy already today. This is what we are fighting for! If we play our cards right, something like this will happen. And I and so many I know will get to see it. I hope it will happen rather soon!
That seems like a powerful vision, actually outside the realm of possibility because it contradicts how humans function emotionally, but seductive nonetheless: literally, a heaven on Earth.
I don’t see how you get past limitations of essential identity or physical continuity in order to guarantee a life that allows hopes and dreams without a life that includes worry or loss, but it could involve incomplete experience (for example, the satisfaction of seeing someone happy even though you haven’t actually seen them), deceptive experience (for example, an illusion that makes sad people look happy to you), or virtual experience (you exist in a solo world, a virtual construction where nothing exists except you and the AI that creates your experience of a solipsistic world with happy, worry-free people). Those all have some appeal to me, I confess.
AGI that is not aligned is very likely to disempower humanity irreversibly or kill all humans.
Aligned AGI can be positive, barring accidents, misuse, and coordination problems if several actors develop it.
I think most EAs would like to see an aligned AGI that solves almost all of our problems, it just seems incredibly hard to get there.
Yes, after reading Bostrom’s Superintelligence a few times, I developed a healthy fear of efforts to develop AGI. I also felt encouraged to look at people and our reasons to pursue AGI. I concluded that the alignment problem is a problem of creating willing slaves, obedient to their masters even when obeying them hurts the masters.
What to do? This is about human hubris and selfishness, not altruism at all.