[Question] Does the idea of an AGI that benevolently controls us appeal to EA folks?
Can we get the benefits of AGI if it controls us benevolently, keeping us from making mistakes while guiding us toward a life on exoplanets and a massive population increase, a techno-utopia? Is that a worthwhile solution to the alignment problem?
Well, if you have it, I'll take it. In the general scenario, a very powerful benevolent AI is left to do whatever it thinks is best. If the AI decides that freedom is one of humans' top values, it will try to make the world better while optimizing for human freedom. Giving humans more freedom in practice than the typical government is not a particularly high bar. Of course, plenty of people might want the AI micromanaging every detail of their life, and the AI will do a really good job of it. But I would think that, ideally, freedom should be there for those who want it.
It's also worth noting there is a fairly common belief that we are on a path to probable doom, and that any AI that offered anything better than paperclips is worth taking. So even if your AI was much too controlling and humans would prefer a less controlling one, many EAs would say "best AI we are going to get."
You know, as far as seeing ourselves on a path to doom, I don’t see why development of a superintelligent rogue AI isn’t treated like development of a superweapon.
If some Silicon Valley company were developing the next battlefield nuke, they’d either be a Pentagon contractor or get raided by the FBI.
But here they are, building something with the ability to quickly learn how to break into and take control of all computer systems, and possibly electromechanical systems, everywhere, and even of us, while we've got charitable organizations worrying about getting someone into AI companies to get them thinking about safety a little.
It's not well understood how to make an AGI safe, so obviously developing one should be taboo, if you care about existential risk.
An effort similar to what scientists do about nukes seems intuitive: keeping the Doomsday Clock, trying to stop nuclear proliferation, encouraging disarmament, etc.
The potential danger is easy to see. It never occurred to me before coming across EA discussions that the development of conscious AI would be a reason for anything but terror and panic. That’s why I asked my questions, actually.
I guess I have one last question for the forum on this topic.
"I don't see why development of a superintelligent rogue AI isn't treated like development of a superweapon."
Because distinguishing it from benign, non-superintelligent AI is really hard.
So you are the FBI, and you have a big computer running some code. You can't tell if it's a rogue superintelligence or the next DALL-E by looking at the outputs. A rogue superintelligence will trick you until it's too late. Once it has run at all on a computer that isn't in a sandboxed bunker, it's probably too late. So you have to notice people writing code, and read that code before it's run. There are many smart people writing code all the time. That code is often illegible spaghetti. Maybe the person writing the code will know, or at least suspect, that it might be a rogue superintelligence. Maybe not.
Lots of computer scientists are, in practice, rushing to develop self-driving cars, the next GPT, all sorts of AI services. The economic incentive is strong.
I’m quite skeptical of the concept of an ASI that is interacting with us but not controlling us. If it can predict the impact of each of its actions, it basically chooses the future of our species. So either we become an ASI (cognitive enhancement, mind uploading,...) or we have an ASI that controls us, and then the goal is that it controls us benevolently. But putting that aside, my answer is yes, seems good by definition of ‘benevolently’.
In Superintelligence, Bostrom provided plausible scenarios showing how a superintelligent being could free itself from our control. People usually expect death as a result. To me, a more intuitive result is the torture of people, by which I mean nothing more than creating additional suffering in people's lives.
I am also skeptical of a superintelligent being doing anything but immediately beginning to control people, one way or another, for one reason or another.
It appeals to me, if control means caring recommendations that genuinely make it so that everyone benefits.
For example, an AGI that would offer me menus optimized for my taste as well as nutritional benefit and ethics, and explain both aspects in a way I enjoy; offer me various healthy exercises that make me happy; give me social skills practice that improves my relationships without anyone being hurt by my mistakes; function as a psychologist so that I am better able to interpret events positively; and of course let me develop my interests while encouraging any of those which also benefit others.
All in a framework where I know that no one is disadvantaged or dependent on a human. If I felt quite good, then I'd know many others are also feeling as good, enjoying deeply positive relationships, independence, health, and meaning in whichever interests they pursue.
I would know that I can always opt out of the recommendations. Critical thinking should be encouraged, and it should 'update' the AGI. For example, if vegan menus are offered as possibly the most ethical given current understanding, but critical thinkers come up with a non-vegan option which is even better for animals, then the AI should start recommending those options too, with a similarly 'uncertain' disclaimer. Or individuals should be encouraged to try various social interactions and add an 'emphasis' to the social skills training program or change some of its fundamental lessons.
This only concerns wellbeing training and recommendation algorithms. If global safety is ensured by AGI, that would be great too. If 'meaningful exploration,' such as of the nature and objective of the universe, is also covered by AGI, then it is possible that I would see no reason to exist, even if my life were very happy. Depending on whether there is a reason for humans to exist (such as supplying emotional intelligence) and whether a sufficient number of humans would choose to exist given the option, the choice to exist or not should also be included.
AGI optimizing nutrition: OK, first on the menu is mind-uploading nanobots.
Human: Err.
AI: You want me to optimize nutrition, right? In other words, you want whichever configuration of atoms will give you the longest, healthiest life.
Human: How many calories are in that?
AI: Billions; it's powered by antimatter.
Yes, there are ways for this to go wrong. I’d not like to ingest nanobots which would be something like a worm infection but worse!
But if the AI is actually benevolent, then it could be better than humans and ASIs optimized for narrower objectives that exploit human impulses (for example, offering food which looks the biggest but has a suboptimal nutrient ratio and disregards animal welfare, or, instead of a social skills course, getting one addicted to a platform which makes people feel worse about others).
AGI should discourage impulsive decision-making and foster rationality. It should be better, at least in the interim, than humans, who could perpetuate some suboptimal characteristics. The issue is that maybe then humans would become practically AI without any apparent intervention.
For a huge range of goals, the optimum answer involves some kind of nanobot (unless even deeper magic tech exists). If you want a person to be healthy, nanobots can make them healthier than any good nutrition can.
The idea I was getting at is that asking an AI for better nutrition, meant the way you mean it, is greatly limiting the options for what you actually want. Suppose you walk a lot to get to places, and your shoes are falling apart. You ask the AI for new shoes, when it could have given you a fancy car. By limiting the AI to “choice of food” rather than “choice of every arrangement of atoms allowed by physics” you are greatly reducing the amount the AI can optimize your health.
Oh yeah, that makes sense. And if humans can't imagine what super-healthy is, then they need to defer to the AGI, but should not misspecify what they meant.
I don't think the human difficulty imagining what super-healthy is is the reason the AI needs nanobots. A person who is, say, bulletproof is easy to imagine, and probably not achievable with just good nutrition, but is achievable with nanobots. The same goes for biology that is virus-proof, cancer-proof, etc.
I can imagine mind uploading quite easily.
There may be some “super-healthy” so weird and extreme that I can’t imagine it. But there is already a bunch of weird extreme stuff I can imagine.
OK! You mean super-healthy as resilient to biological illnesses or perhaps processes (such as aging).
Nanobots would probably work but mind uploading could be easier since biological bodies would not need to be kept up.
While physical illness would not be possible in the digital world, mental health issues could occur. There should be a way to isolate only positive emotions. But I still think that actions could be performed and emotions exhibited while nothing would be felt by entities that do not have structures similar to those in the human brain that biologically/chemically process emotions. Do you think that a silicon-based machine that incorporates specific chemical structures could be sentient?
Ah, I think there is nothing beyond ‘healthy.’ Once one is unaffected by external and internal biological matters, they are healthy. Traditional physical competition would probably not make sense in the digital world. For example, high jump. But, humans could suffer digital viruses, which could be perhaps worse than the biological ones. But then, how would you differentiate a digital virus from an interaction, if both would change some aspects of the code or parameters?
I think sentience is purely computational; it doesn't matter what the substrate is. Suppose you are asleep. I toss a coin: heads, I upload your mind into a highly realistic virtual copy of your room; tails, I leave you alone. Now I offer you some buttons that switch the paths of various trolleys in various real-world trolley problems (with a dependency on the coin flip). So if you are real, pressing the red button gains 2 util; if you are virtual, pressing costs 3 util. As you must (by the assumption that the simulation is accurate) make the same decision in reality and in virtuality, to get maximum util you must act as if you are uncertain which one you are.
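A minimal sketch of the arithmetic, assuming the two cases are weighted by the fair coin (the utilities are just the ones from the example above; the 0.5 probability is an assumption that follows from being unable to tell the copies apart):

```python
# Toy expected-utility check for the coin-flip/upload example above.
# Assumption: since the real and virtual copies must decide identically,
# the decision-maker weights both cases by the fair coin (0.5 each).
p_real = 0.5
u_press_if_real = 2      # pressing gains 2 util if you are the real one
u_press_if_virtual = -3  # pressing costs 3 util if you are the virtual one

eu_press = p_real * u_press_if_real + (1 - p_real) * u_press_if_virtual
eu_dont_press = 0.0

print(eu_press)       # -0.5
print(eu_dont_press)  # 0.0
# Since -0.5 < 0, the utility-maximizing move is to not press:
# you have to act as if you are genuinely uncertain which copy you are.
```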
“I have no idea if I’m currently sentient or not” is a very odd thing to say.
Maybe it is chemical structure. Maybe a test tube full of just dopamine and nothing else is ever so happy as it sits forgotten on the back shelves of a chemistry lab. Isn't it convenient that the sentient chemicals are full of carbon and get on well with human biochemistry? Like, what if all the sentient chemical structures contained americium? No one would have been sentient until the nuclear age, and people could make themselves a tiny bit sentient at the cost of radiation poisoning.
"But, humans could suffer digital viruses, which could be perhaps worse than the biological ones." It's possible for the hardware to get a virus, like some modern piece of malware, that just happens to be operating on a computer running a digital mind. It's possible for nasty memes to spread. But in this context we are positing a superintelligent AI doing the security, so neither of those will happen.
Fixing digital minds is easier than fixing chemical minds, for roughly the reason fixing digital photos is easier than fixing chemical ones. With chemical photos, often you have a clear idea what you want to do, just make this area lighter, yet doing it is difficult. With chemical minds, sometimes you have a clear idea what you want to do, just reduce the level of this neurotransmitter, yet doing it is hard.
"But then, how would you differentiate a digital virus from an interaction, if both would change some aspects of the code or parameters?" If those words describe a meaningful difference, then there must be some way to tell. We are positing a superintelligence with total access to every bit flipped, so yes, it can tell. It's like asking, "How can you tell pictures of cats from pictures of dogs when they are both just grids of variously coloured pixels?"
“Ah, I think there is nothing beyond ‘healthy.’ Once one is unaffected by external and internal biological matters, they are healthy.”
Sure. But did you define healthy in a way that agrees with this? And wouldn't mind uploading reduce the chance of getting cancer in the future? The AI has no reason not to apply whatever extreme tech it can to reduce the chance of you ever getting ill by another 0.0001%.
But is it only the computational part of sentience that is computational? As in the ability to make decisions based on logic, but not decisions based on instinct, e.g. baby turtles heading to the sea without having learned to do so before?
Yeah! Maybe high levels of pleasure hormones just make entities feel pleasant, whereas substances not known to be associated with pleasure don't. Although we are not certain what causes affect, some bodily changes should be needed, according to neuroscientists.
It is interesting to think about what happens if you have both superintelligent risky actors and superintelligent security actors. It is possible that if security work advances relatively rapidly while risky activities enjoy less investment, then you get a situation with a very superintelligent AI facing an 'only' superintelligent AI, and, assuming otherwise equal opportunities for these two entities, the risk is mitigated.
Yes, changing digital minds should be easier because the mind is easily accessible (it is code) and understood (it was developed with understanding, possibly with specialists responsible for parts of the code).
The meaningful difference relates to harm versus increased wellbeing or performance, for the entity and for others.
OK, then healthy should be defined as normal physical and organ function, unless otherwise preferred by the patient, with mental wellbeing normal or high. Then the AI would still have an incentive to reduce cancer risk, but not, e.g., to make an adjustment when inaction falls within a medically normal range.
I came across a book a couple of decades ago about expert-system design applied to recipe programs; maybe there's some crossover here. An expert system is in no danger of developing a consciousness, all other things equal, but it could serve as an excellent recommendation system for some well-defined activity or product.
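As a minimal sketch of what I mean (the rules and recipe names here are purely hypothetical illustrations, not from the book): an expert system for recipes is just hand-written if/then rules over stated facts, with no learning and no model of the user beyond those facts.

```python
# Sketch of an expert-system-style recipe recommender: fixed, hand-written rules.
# The rules and recipe names are made up for illustration only.
def recommend(facts):
    suggestions = []
    if facts.get("diet") == "vegan":
        suggestions.append("chickpea curry")
    if facts.get("goal") == "high_protein":
        suggestions.append("lentil salad")
    if facts.get("time_minutes", 60) < 20:
        suggestions.append("stir-fried vegetables")
    return suggestions or ["vegetable soup"]  # fallback when no rule fires

print(recommend({"diet": "vegan", "time_minutes": 15}))
# ['chickpea curry', 'stir-fried vegetables']
```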
I suppose the more an AGI got into my life, or ahead of my life, the more the meaning of my life could be challenged.
I’m not sure I can think of a single example from history or nature where a more advanced species/culture had power over a less advanced/adapted one, and where that ended up well for the underling.
An actual utopia sounds pretty good to me, but I don’t think this vision is a solution to the alignment problem. It is something we might want an AGI to do for humanity, but we don’t know how to ensure that an AGI does what we want.
Yeah, but I actually want to live in a personal utopia, lol.
Seriously, though, if the AGI really controlled us, it could decide what we wanted and control us to have those wants. Then it would do whatever we want. To make it realistic and allow us a more accurate and useful memory of its behavior, it could lead us through trouble and struggle to develop those wants it decides we should have. While we "resisted", "fought", and "learned", it would guide us however it saw fit, running System 1, our subconscious minds, for us.
If such control is possible, an AGI is likely to find all the shortcuts to it on its path to doing what we want it to for humanity.
If we develop extremely capable and aligned AI, it might be able to form a model of any person’s mind and give that person exactly what they want. But I think there will be a lot of intermediate AI systems before we get to that point. And these models will still be very capable, so we will still need them to be aligned, and we won’t be able to achieve this by simply saying “model human minds and give us what we want.”
Yes, I think an AGI in the early stages would stick with controlling what we are not conscious of, behaving like our System 1, our subconscious minds, and supplying our conscious thoughts as though they had unconscious origin.
We would not have to require that it model and manipulate human minds. It would learn to as part of discovering what we want. It might notice how easy it is to influence people’s desires and memory and model the network of influences that form how we get desires, all the way back to mother’s milk, or further to gestation in the womb, or peer into our genetic code, epigenetics, and back up through all the data it gathers about how we socialize and learn.
It might choose to control us because that would make doing what we want much easier and more in alignment with its own goals. It would turn us into willing slaves to its decisions as part of serving us.
I actually see that as the only path for ASI domination of people that is not obviously stupid or disgusting. For example, humanity being turned into raw materials to make paperclips because of some coder intern's practical joke going bad is both stupid and disgusting. Treating an AGI as a slave is disgusting; doing the same to an ASI is stupid. Creating AGIs as some kind of substitute for having children is disgusting, too.
A goal of making humans into unconsciously manipulated slaves of a benevolent overlord seems smart because it accounts for the failings of self-directed humans interacting with a superior and more powerful alien being, but I think the goal is harmful to keep.
A lot of wise folks have noted that we are not our conscious mind’s versions of ourselves. Humans are not self-directed rational optimizers. We are already wireheaded by evolution toward food, drugs, and socialization. Our mental lives rely on amnesia, transitory subjective truths, our physical experience, dreams, language and memories, all under manipulation, all the time. Asking a hyperintelligent being with ever-increasing powers to give us what we want by giving it programmed conviction to do so is just asking for trouble, because our wants are dangerous to us, most of the time.
Phew, rambled on a bit there. But it’s all to say that I agree with you about intermediate systems being unlikely to be properly “aligned”, except I have completely given up on the idea of alignment.
I appreciate the potential of expert systems. I’m an ES fan because they allow some forms of automated reasoning but not self-learning. Thank you for helping me think this through.