Content warning: discussion of existential risk and violence
This is how it feels wading into the debate around AI doomerism. Any sceptic is thrown a million convincing-sounding points, all of which presuppose things that are fictional.
In the context of climate change, are predictions about climate change decades in the future similarly presupposing “things that are fictional”, because they presuppose things that haven’t actually happened yet and could turn out differently in principle? I mean, in principle it’s technically possible that an ASI (artificial superintelligence) technology could arrive next week and render all the climate models incorrect because it figures out how to solve climate change in a cheap and practical way and implements it well before 2100. Yet that isn’t a reason to dismiss climate models as “fictional” and therefore not worthy of engaging with. They merely rely on certain assumptions.
I think everyone in this debate would agree that it is harder to predict what AGIs (artificial general intelligences) and ASIs might do and how they might think and behave, than it is to make scientifically-justified climate models, given that AGIs and ASIs probably haven’t been invented yet (although a recent research paper claims that GPT-4 displays “sparks of AGI”).
However, there are a lot of arguments in the AI alignment space—entire books, such as Nick Bostrom’s “Superintelligence” and Tom Chivers’s somewhat more accessible “The AI Does Not Hate You” (since renamed “The Rationalist’s Guide to the Galaxy”), have been written about why we should care about AI alignment from an existential risk point of view. And this is not even to consider the other kinds of risk from AI, which are numerous and substantial (some of which you alluded to at the end of your post, granted).
While some of these arguments—relying as they do on concepts like molecular manufacturing and nanobots which might not even be technology that it is possible to develop in the near future—are highly contentious, I think there are also a bunch of arguments that are more grounded in basic common sense and our experience of the world, and are harder to argue with. And the latter arguments kind of render the former, controversial arguments almost irrelevant to the basic question of “should we be worrying about AI alignment?” There are many ways unaligned AIs could end up killing humans—some of which humans probably haven’t even thought of yet and perhaps don’t even have the science/tech/intellect to think up. Whether they’d end up doing it with nanobots is neither here nor there.
Debating ‘alignment’ for example means you’ve already bought into their belief that we will lose control of computers so you’re already losing the debate.
I suppose that may be true, but if your view is that we definitely won’t lose control of computers at all, ever, that is quite a hard claim to defend. This scenario seems quite easy to occur at the level of an individual computer system. Suppose China develops an autonomous military robot which fires at human targets in a DMZ without humans being in the loop at all (I understand this has already happened), and that robot then gets hacked by a terrorist and reprogrammed, and the terrorist then gets killed and their password to control the robot is lost forever. We have then lost control of that robot, which is following the orders of the terrorist that the terrorist programmed into it, whatever they happen to be, until we take out the robot somehow. In principle, this needn’t even involve AI in any essential way.
But AGIs that involve goal-following and optimisation would make this problem much, much worse. An AI that is trying to fulfil a simply-stated goal like “maximise iPhone production” would want to keep itself in existence and running, because if it no longer exists, its goal is perhaps less likely to be fulfilled (there could be an equally competent human, or an even better AI developed, but neither is guaranteed to happen). So, in the absence of humanity solving, or at least partially solving, the AI alignment problem, such an AI might try to stop humans from turning it off, or even kill them to prevent them from doing so. Being able to turn an AI off is a last-ditch solution if we can’t more directly control it—but by assumption there’s a risk that we can’t more directly control it if it’s sufficiently savvy and aware of what we’re trying to do: it already has a goal, and it would probably want to retain that goal, because under a different goal its current goal would most likely no longer get fulfilled.
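The self-preservation argument can be made concrete with a toy expected-utility calculation. This is only a sketch: the probabilities below are invented for illustration, not estimates of anything real.

```python
# Toy model: an optimiser comparing expected goal fulfilment if it
# stays running versus if it allows itself to be switched off.
# All numbers are illustrative assumptions.

P_GOAL_IF_RUNNING = 0.9    # chance the goal is fulfilled if the AI keeps running
P_GOAL_IF_SHUT_DOWN = 0.2  # chance a human or successor fulfils it anyway

def expected_goal_value(p_fulfilled: float, goal_value: float = 1.0) -> float:
    """Expected value of the final goal under a given survival scenario."""
    return p_fulfilled * goal_value

keep_running = expected_goal_value(P_GOAL_IF_RUNNING)
allow_shutdown = expected_goal_value(P_GOAL_IF_SHUT_DOWN)

# A pure goal-maximiser picks whichever action scores higher -- here,
# resisting shutdown, purely as a means to its stated end.
best_action = "resist shutdown" if keep_running > allow_shutdown else "allow shutdown"
print(best_action)  # resist shutdown
```

The point is that self-preservation falls out of the arithmetic: nothing in the goal mentions survival, yet survival raises the expected score.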
So here I’ve already introduced two standard arguments about how sufficiently-advanced AIs are likely to behave and what their instrumental goals are likely to be. Instrumental goals are like sub-goals, the idea being that we can figure out what instrumental goals they’re likely to have in some cases, even if we don’t know what their final (i.e. top-level) goals that they’re going to be given will be. You might argue that these arguments are based on fictional things which don’t exist yet. This is true—and indeed, one way that AI alignment might never be necessary is if it turns out we can’t actually create an AGI. However, recent progress with large language models and other cutting-edge AI systems has rendered that possibility extremely implausible, to me.
But again, being based on fictional things which don’t exist yet isn’t a knockdown argument. Before the first nuclear weapon was tested, the physicists at the Manhattan Project were worried that it might ignite the atmosphere, so they did extensive calculations to satisfy themselves that it was in fact safe to test the nuclear weapon. If you had said to them, before the first bomb had been built, “this worry is based on a fictional thing which doesn’t exist yet” they would have looked at you like you were crazy. Obviously, your line of argument doesn’t make sense when you know how to build the thing and you are about to build the thing. I submit that it also doesn’t make sense when people don’t know how to build the thing and probably aren’t immediately about to build the thing, but might actually build the thing in 2-5 years time!
The Flying Spaghetti Monster exists to shift the burden of proof and effort in a debate.
I am happy to cite chapter and verse for you on why you’re wrong, but if you’re going to reject our arguments out of hand, we’re not going to have a very productive conversation.
No, it isn’t—because banking, scintillating as it may be, is not a general task, it’s a narrow domain—like chess, but not quite as narrow. Also, we still have human bankers to do higher-level tasks, it’s just that the basic operations of sending money from person A to person B have largely been automated.
This is the kind of basic misunderstanding that would have been avoided by more familiarity with the literature.
This is obviously leaving aside the MASSIVE issue that computers don’t ‘want’ anything.
Generally this is true in the present day; however, goal-driven, optimising AIs would—see above. Even leaving aside the contentious arguments about convergent instrumental goals I recited above, if I’ve given you a goal of building a new iPhone factory on an island, and then someone proposes blowing up that entire island, you’re not going to want that to happen (quite apart from any humanitarian concern you may have for the present inhabitants of that island), and neither is an AI with such a goal. OK, you might be willing to compromise on the location for the factory after consulting your boss, but an AI with such a final goal is not going to be willing to—see above re goal immutability.
The idea that more intelligence creates sentience seems disproven by biology
I agree—but I don’t see how this helps your case re existential risk. Indeed, non-sentient AIs might be more dangerous, as they would be unable to empathise with humans and therefore it would be easier for them to behave in psychopathic ways. I think you would benefit from seeing Yudkowsky et al’s arguments as supposing that unaligned AIs are “psychopathic”—which seems like a reasonable inference to me—he’d probably argue that the space of possibilities for viable AIs is almost entirely populated by psychopathic ones, from a human point of view.
Muggle: “Did I just fail the Turing test?”
The Turing Test is not a test for humans at all, it’s a test for AIs. Moreover, were a human to “take” it and “fail”, this wouldn’t prove anything—as your example shows.
Secondly, it was passed decades ago depending on what you measure.
The Loebner Prize people have claimed that it has already been passed by simple pre-GPT chatbots, but they’re wrong. For the purposes of this discussion, the relevant distinction is that no AIs can yet quite manage to think like an intelligent human in all circumstances, and that’s what the Turing Test was intended to measure. But, as noted above, GPT-4 has been argued to be getting close to this point.
why are we measuring computers by human standards?
Because we want to know when we should be really worried—both from a “who is going to lose their job?” point of view, and for us doomers, an existential risk point of view as well. The reason why doomers like me find this question relevant is that we believe there is a risk that when AGI is created, it will be able to recursively self-improve up to an artificial superintelligence, perhaps in a matter of weeks or months. Though more likely, substantial hardware advancement would be required, which I guess would mean years or decades instead. And artificial superintelligence would be really scary because it could be almost impossible to control—again, given certain debatable assumptions, like that it could cross over into other datacentres, or bribe or threaten people to let it do so.
But remember, we are talking about AI risks here, not AI certainties. The fact that some of these assumptions might not hold true is not actually much comfort if we think that they have, say, a 90% chance of coming to pass.
The idea of a ‘singularity,’ of exponential technological growth so exponentially fast it basically happens in an instant is historically ignorant, that’s just not how things work.
I agree with you on this, and this is where I part company with Yudkowsky. However, I don’t think this belief is essential to AI doomerism—it just dictates whether we’re going to have some period of time to figure out how to stop an unaligned AI (my view) or no time at all (Yudkowsky’s view). But that may not be terribly relevant in the final analysis—because, as I already discussed previously, it may not be possible to stop an unaligned ASI once it’s been created and switched on and escaped from any “box” it may have been contained in, even if we had infinite time available to us.
And it’s worth noting that Ray Kurzweil didn’t mean the definition you gave by the Singularity—he just meant a point where progress is so fast it’s impossible to predict what will happen in detail before it starts.
This idea of sci-fi predictive powers crops up again and again in doomer thinking. It’s core to the belief about how computers will become unstoppable and it’s core to their certainty that they’re right.
We already have uncensorable, untrackable computer networks like Tor. We already have uncensorable, stochastically untrackable cryptocurrency networks like Monero. We have already seen computer viruses (worms) that spread in an uncontrolled manner around the internet given widespread security vulnerabilities that they can be programmed to take advantage of—and there are still plenty of those. We already have drones that could be used to attack people. Put all these together, and maybe we could be dealing with a hard-to-control AI “infestation” that is trying to use drones or robots controlled over the internet to take out people and ultimately take over the world. The AI doesn’t even have to replicate itself around the internet to every computer. It can just put simple “slave” processes in regular computers, creating a botnet under its exclusive control, and then replicate itself a few times. As long as it can keep hopping from datacentre to datacentre and keep the number of instances of itself above zero at any one time, it survives; and as long as it has some kind of connection to the internet, even just the ability to make DNS queries, it might in principle be able to control its “slave” processes and take action in the world even as we try desperately to shut it down.
Hypothetical thinking is core to what it means to be human! It separates us from simpler creatures! It’s what higher intelligence is all about! Just because this is all hypothetical, doesn’t mean it can’t happen!
We’re not “certain” that we’re right in the faith-based way that religious people are certain that they’re right about God existing—we’re highly confident that we’re right to be concerned about existential risk because of our rough-and-ready assessment of the probabilities involved, and the fact that not all of our arguments are essential to our conclusion (even if nanobots won’t kill us we might still be killed by some other technique once the AI has automated its entire supply chain, etc.)
With existential risk, even a 1% risk of destroying the human species is something we should worry about—obviously, given a realistic path from here to there which explains how that could happen.
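The arithmetic behind “even a 1% risk matters” is just expected value. The figures below are purely illustrative:

```python
# Expected-loss sketch: a small probability times an enormous loss still
# yields an enormous expected loss. All figures are illustrative.

p_extinction = 0.01           # a "mere" 1% chance
lives_at_stake = 8_000_000_000

expected_lives_lost = p_extinction * lives_at_stake
print(f"{expected_lives_lost:,.0f}")  # 80,000,000
```

An expected loss of tens of millions of lives is why low-probability existential scenarios still warrant attention, given a realistic causal path.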
Why would the aliens put all their resources into weapons, rather than say into entertainment?
You’re effectively asking why the AIs would not choose to entertain themselves instead of fighting with us.
Present-day computers have no need to entertain themselves, and I see no reason why future AI systems would be any different. Effective altruists, like other human beings, are best advised to have fun sometimes, as our bodies and minds get tired and need to unwind, but probably AIs and robots will face no such constraints.
As for fighting… or, as Eliezer would have it, taking us all out in one fell swoop…
why would the aliens want our resources if they have unlimited themselves?
You’re effectively asking why the AIs would want our resources (e.g. the atoms in our bodies) if they have unlimited resources themselves. Well, this is kind of conflating two different things. I’m pretty sure an ASI could figure out how to generate enough cheap energy for all its needs, because we’re quite close to doing that ourselves as it is (nuclear fusion is 30 years away, hehe). But obviously an ASI wouldn’t have unlimited atoms, or unlimited space on Earth. Our bodies would contain atoms that it could use for something else, potentially, and we’d be taking up space that it could use for something else, potentially.
Nobody needs that many iPhones.
Yes, but the AI doesn’t know this unless you tell it—that’s the point of this wildly popular educational game about AI doom, which in turn was based on a famous thought experiment by Bostrom and/or Yudkowsky. I mean, the AI may know it, but even if it knows that on some level, if some idiot has given it a goal to simply maximise the production of iPhones, it’s not going to stop when everyone on Earth has one and a spare. Because as I’ve just stated it, its goal doesn’t say anything about stopping, or what’s enough.
And while you may think that would be easy enough to fix, there are so many other ways that an AI can be misaligned, it’s depressing. For example, suppose you set your AI humanoid robot a goal of cooking you and your child dinner, and you remember to tell it what counts as enough dinner, and you remember to tell it not to kill you. Oops, you forgot to mention not to kill your child! Rather than walking around your infant that happens to be crawling around on the floor, it treads on it, killing it, because that’s a more efficient route to the kitchen cupboard to get an ingredient it needs to cook dinner.
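The “forgot to mention the child” failure mode is really just an objective with a missing constraint. A minimal sketch (route names and costs are invented for illustration):

```python
# Under-specified objectives: this planner minimises travel cost only.
# Anything not priced into the objective -- like "don't harm the child" --
# is simply invisible to it. Illustrative toy example.

routes = {
    "around the child": {"cost": 5, "harms_child": False},
    "through the child": {"cost": 3, "harms_child": True},
}

def pick_route(routes: dict, care_about_child: bool) -> str:
    candidates = list(routes.items())
    if care_about_child:  # the constraint the designer must remember to add
        candidates = [(name, r) for name, r in candidates if not r["harms_child"]]
    # Pick the cheapest remaining route.
    return min(candidates, key=lambda item: item[1]["cost"])[0]

print(pick_route(routes, care_about_child=False))  # through the child
print(pick_route(routes, care_about_child=True))   # around the child
```

The optimiser isn’t malicious; it just optimises exactly what it was given, and nothing it wasn’t.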
In the context of climate change, are predictions about climate change decades in the future similarly presupposing “things that are fictional”,
So no, climate change is something that seems similar but is only superficially so. As I understand it, we now have the historic data that temperatures are rising, and historic data suggesting that this could mean many bad things. No computers are currently running around killing people of their own free will.
I think everyone in this debate would agree that it is harder to predict what AGIs (artificial general intelligences) and ASIs might do and how they might think and behave, than it is to make scientifically-justified climate models,
I would very much disagree with this. All the historic data shows that computers can be easily controlled, risk of death is very low (self driving cars are safer than human driven cars for example) and make our lives easier. The effects of climate change range from the very bad to the good.
I suppose that may be true, but if your view is that we definitely won’t lose control of computers at all, ever, that is quite a hard claim to defend.
Historically there is not one example of a computer doing anything other than what it was programmed to do. This is like arguing that aliens will turn up tomorrow. There is no evidence.
password to control the robot is lost forever
The robot is still simply doing what it was programmed to do. I agree that terrorists getting their hands on super weapons, including AI-powered ones (for example using AI to create new viruses), is extremely dangerous. But that is not a sci-fi scenario; our enemies getting hold of weapons we’ve created is common in history.
An AI that is trying to fufill a simply-stated goal like “maximise iPhone production” would want to keep itself in existence and running, because if it no longer exists, its goal is perhaps less likely to be fulfilled
So this is a common argument that doesn’t make sense economically or from a safety viewpoint. In order for an iPhone factory to be able to prevent itself from being turned off, what capabilities would it require? Well, it would presumably need some way to stop us humans from cutting its cables. I’d presume therefore that it would need autonomous armed guards. To prevent airstrikes on the factory, maybe it would need an anti-aircraft (AA) battery. But neither of those things is required for an iPhone factory. If you’ve programmed an iPhone factory with the capability to refuse to be turned off, and given it armed robot drones and AA guns, then you’re an idiot. We already have iPhone factories that work just fine without any of those things. It doesn’t make sense from an economic resource-utilisation point of view to upgrade them with dangerous stuff they don’t need.
I’ve heard similar arguments about “What if the AI fires off all the nukes?” Don’t give a complex algorithm control of the nukes in the first place!
A simpler scenario that might help understanding is the election system. Tom Scott had a great video on this. Why is election security so much more contentious in America than Britain? Because Americans are too lazy to do hand counting and use all sorts of computer systems instead. These systems are more hackable than the paper, pen and hand counting we use in the UK. But the important thing to understand here is that none of these scenarios is the fault of any ‘super-intelligence’, but rather typical human super-stupidity.
I submit that it also doesn’t make sense when people don’t know how to build the thing and probably aren’t immediately about to build the thing, but might actually build the thing in 2-5 years time!
I disagree, and it’s something I find rather cringe about the whole ‘AI alignment’ field. For one thing, something isn’t useful or profitable until it’s safe. For instance, we often talk about having ‘self-driving’ cars in the future. But we’ve had self-driving cars from the very beginning! I can go out to my ole gas guzzler right now, put a brick on the accelerator, and it will drive itself into a wall. What we actually mean by ‘self-driving cars’ is ‘cars that can drive themselves safely.’ THIS is what Tesla, Apple and Google are all working on. If you set up an outside organisation to ‘make sure AI self-driving cars were safe’, people would think you were crackers, because who would ride in an unsafe self-driving car? Unsafe AI in 90%+ of cases will simply not be economically viable, because why would you use something that’s unsafe when you already have the existing whatever-it-is that does the same thing safely (just slower, or whatever)?
No, it isn’t—because banking, scintillating as it may be, is not a general task, it’s a narrow domain—like chess, but not quite as narrow.
Everything is a narrow domain. No I will not explain further lol.
why are we measuring computers by human standards?
Because we want to know when we should be really worried—both from a “who is going to lose their job?” point of view, and for us doomers, an existential risk point of view as well.
Anthropomorphising
We already have uncensorable, untrackable computer networks like Tor. We already have uncensorable, stochastically untrackable cryptocurrency networks like Monero
Why does the existence of these secure networks make you more worried about AI and not less?
We have already seen computer viruses (worms) that spread in an uncontrolled manner around the internet given widespread security vulnerabilities that they can be programmed to take advantage of—and there are still plenty of those
I haven’t had a computer virus in years. I’m sure AIs will create viruses, and businesses will use AI to create ways to stop them. My money is on the side with more money, which is the commercial and government side, not the leet hackers.
A super AI virus released by China or terrorists is a realistic concern. It’s not a realistic concern that it creates itself of its own will.
You’re effectively asking why the AIs would not choose to entertain themselves instead of fighting with us.
No, I’m actually asking why we humans would allow all our resources to go into computers instead of things we want.
We’re not going to allow AIs to mine the moon to make themselves more powerful for instance, if we have that capability we’ll have them mine it to make space habitats instead.
Oops, you forgot to mention not to kill your child!
Again this is human stupidity NOT AI super intelligence. And this is the real risk of AI!
We can go back to the man that killed himself because the chatbot told him to. There were two humans being stupid there. First, the designers of the app, who made a chatbot that was designed to be an agreeable friend. But they were so stupid they forgot to ask themselves, ‘What if it agrees with someone suicidal?’ For all we know, they’ve also forgotten to ask themselves, ‘What if it agrees with someone who wants to commit an act of terrorism?’ They should have foreseen this, but they didn’t, because we’re stupid monkeys.
Then there is the man himself, who instead of going to a human with his issues went to a frigging chatbot, which gave him advice no human would ever give him. He also seems to have on some level believed the chatbot was real or sentient, and that influenced his behaviour. He also gave waaay too much credence to an algorithm designed simply to agree with him.
Now ask yourself, who would have foreseen this situation? Eliezer Yudkowsky, who believes he is super intelligent, that AIs will be even more super intelligent, and who anthropomorphises them constantly? I could absolutely see Eliezer killing himself because a chatbot told him to.
Or me who believes AIs are stupid, humans are stupid and thinking AIs are alive is really stupid?
Let’s go back to Wuhan… Was the real problem that humans were behaving as gods and we were eaten by our own superior creations? No! It’s that we’re stupid monkeys who were too lazy to close the laboratory door!
One of the main stupid things we are doing is anthropomorphising these things. This leads humans to think the computers are capable of things that they aren’t.
The fear this provokes is probably not that dangerous but the trust it engenders is very dangerous.
That trust will lead to people putting them in charge of the nukes or people following the advice of a Chatbot created for ISIS or astrologers.
Great discussion! I appreciate your post, it helped me form a more nuanced view of AI risk rather than subscribing to full-on doomerism.
I would, however, like to comment on your statement—“this is human stupidity NOT AI super intelligence. And this is the real risk of AI!”
I agree with this assessment, moreover, it seems to me that this “human stupidity” problem of our inability to design sufficiently good goals for AI is what the Alignment field is trying to solve.
It is true that no computer program has its own will. And there is no reason to believe that some future superintelligent program will suddenly stop following its programming instructions. However, given our current models that optimize for a vague goal (like in the example below), we need to develop smart solutions to encode our “true intentions” correctly into these models.
I think it’s best explained with an example: GPT-based chatbots are simply trained to predict the next word in a sentence, and it is not clear at a technical level how we can modify such a simple and specific goal of next word prediction to also include broad, complex instructions like “don’t agree with someone suicidal”. Current alignment methods like RLHF help to some extent, but there are no existing methods that guarantee, for example, that a model will never agree with someone’s suicidal thoughts. Such a lack of guarantees and control in our current training algorithms, and therefore our models, is problematic. And it seems to me this is the problem that alignment research tries to solve.
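The narrowness of the training objective can be seen in the loss function itself: a standard language model is trained only to make the observed next token more probable, and nothing in the formula says anything about what the model should or shouldn’t say. A minimal sketch (toy vocabulary and probabilities are invented):

```python
import math

# Toy next-token objective: cross-entropy on the observed next token.
# Note what the loss *is* -- a score for predicting the next word -- and
# what it is *not*: any statement about content safety.

def next_token_loss(predicted_probs: dict, actual_next_token: str) -> float:
    """Cross-entropy contribution for one position: -log p(actual token)."""
    return -math.log(predicted_probs[actual_next_token])

# Suppose the model assigns these probabilities for the next word.
# The objective rewards matching the data, whatever the data says.
predicted = {"rest": 0.5, "reconsider": 0.3, "give": 0.2}
loss = next_token_loss(predicted, "rest")
print(round(loss, 3))  # 0.693
```

Methods like RLHF bolt a second objective on top of this one, but as the comment notes, none of them turns “predict the next word” into a guarantee about behaviour.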
The idea of ‘alignment’ presupposes that you cannot control the computer and that it has its own will so you need to ‘align it’ ie incentivise it. But this isn’t the case, we can control them.
It’s true that machine-learning AIs can create their own instructions and perform tasks; however, we still maintain overall control. We can constrain both inputs and outputs. We can nest the ‘intelligent’ machine-learning part of the system within constraints that prevent unwanted outcomes. For instance, ask an AI a question about feeling suicidal now and you’ll probably get an answer that’s been written by a human. That’s what I got last time I checked, and the conversation was abruptly ended.
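The “nest the model inside constraints” idea is essentially the guardrail pattern used in production chatbots: deterministic checks wrap the model and override it when a rule fires. A minimal sketch of the pattern (the keyword list and canned reply are placeholders, not any vendor’s actual implementation):

```python
# Minimal guardrail pattern: the "intelligent" component is wrapped in
# deterministic checks it cannot bypass. Keywords and the canned reply
# are illustrative placeholders.

CRISIS_KEYWORDS = ("suicide", "kill myself", "end my life")
CANNED_REPLY = ("It sounds like you're going through a hard time. "
                "Please talk to someone you trust or a crisis line.")

def unconstrained_model(prompt: str) -> str:
    """Stand-in for the ML component, whose outputs we don't fully control."""
    return f"Model output for: {prompt}"

def guarded_chatbot(prompt: str) -> str:
    # Input check: route sensitive prompts to a human-written response.
    if any(keyword in prompt.lower() for keyword in CRISIS_KEYWORDS):
        return CANNED_REPLY
    # Otherwise fall through to the model (output checks could be added here).
    return unconstrained_model(prompt)

print(guarded_chatbot("I want to end my life") == CANNED_REPLY)  # True
```

The check runs outside the model, so it holds regardless of what the model itself would have said; the trade-off is that keyword rules are crude, which is why real systems layer several such checks.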
Content warning: discussion of existential risk and violence
In the context of climate change, are predictions about climate change decades in the future similarly presupposing “things that are fictional”, because they presuppose things that haven’t actually happened yet and could turn out differently in principle? I mean, in principle it’s technically possible that an ASI (artificial superintelligence) technology could arrive next week and render all the climate models incorrect because it figures out how to solve climate change in a cheap and practical way and implements it well before 2100. Yet that isn’t a reason to dismiss climate models as “fictional” and therefore not worthy of engaging with. They merely rely on certain assumptions.
I think everyone in this debate would agree that it is harder to predict what AGIs (artificial general intelligences) and ASIs might do and how they might think and behave, than it is to make scientifically-justified climate models, given that AGIs and ASIs probably haven’t been invented yet (although a recent research paper claims that GPT-4 displays “sparks of AGI”).
However, there are a lot of arguments in the AI alignment space—entire books, such as Nick Bostrom’s “Superintelligence” and Tom Chiver’s somewhat more accessible “The AI Does Not Hate You” (since renamed to “The Rationalist’s Guide to the Galaxy”), have been written about why we should care about AI alignment from an existential risk point of view. And this is not even to consider the other kinds of risk from AI, which are numerous and substantial (some of which you alluded to at the end of your post, granted).
While some of these arguments—relying as they do on concepts like molecular manufacturing and nanobots which might not even be technology that it is possible to develop in the near future—are highly contentious, I think there are also a bunch of arguments that are more grounded in basic common sense and our experience of the world, and are harder to argue with. And the latter arguments kind of render the former, controversial arguments almost irrelevant to the basic question of “should we be worrying about AI alignment?” There are many ways unaligned AIs could end up killing humans—some of which humans probably haven’t even thought of yet and perhaps don’t even have the science/tech/intellect to think up. Whether they’d end up doing it with nanobots is neither here nor there.
I suppose that may be true, but if your view is that we definitely won’t lose control of computers at all, ever, that is quite a hard claim to defend. This scenario seems quite easy to occur at the level of an individual computer system. Suppose China develops an autonomous military robot which fires at human targets in a DMZ without humans being in the loop at all (I understand this has already happened), and that robot then gets hacked by a terrorist and reprogrammed, and the terrorist then gets killed and their password to control the robot is lost forever. We have then lost control of that robot, which is following the orders of the terrorist that the terrorist programmed into it, whatever they happen to be, until we take out the robot somehow. In principle, this needn’t even involve AI in any essential way.
But AGIs that involve goal-following and optimisation would make this problem much, much worse. An AI that is trying to fufill a simply-stated goal like “maximise iPhone production” would want to keep itself in existence and running, because if it no longer exists, its goal is perhaps less likely to be fulfilled (there could be an equally competent human, or an even better AI developed, but neither are guaranteed to happen). So, in the absence of humanity solving or at least partially solving the AI alignment problem, such an AI might try to stop humans trying to turn it off, or even kill them to prevent it from doing so. Being able to turn an AI off is a last-ditch solution if we can’t more directly control it—but by assumption there’s a risk that we can’t more directly control it if it’s sufficiently savvy and aware of what we’re trying to do, because it has a goal already and it would probably want to retain its current goal, because if it had a different goal then most likely its current goal would no longer get fulfilled.
So here I’ve already introduced two standard arguments about how sufficiently-advanced AIs are likely to behave and what their instrumental goals are likely to be. Instrumental goals are like sub-goals, the idea being that we can figure out what instrumental goals they’re likely to have in some cases, even if we don’t know what their final (i.e. top-level) goals that they’re going to be given will be. You might argue that these arguments are based on fictional things which don’t exist yet. This is true—and indeed, one way that AI alignment might never be necessary is if it turns out we can’t actually create an AGI. However, recent progress with large language models and other cutting-edge AI systems has rendered that possibility extremely implausible, to me.
But again, being based on fictional things which don’t exist yet isn’t a knockdown argument. Before the first nuclear weapon was tested, the physicists at the Manhattan Project were worried that it might ignite the atmosphere, so they did extensive calculations to satisfy themselves that it was in fact safe to test the nuclear weapon. If you had said to them, before the first bomb had been built, “this worry is based on a fictional thing which doesn’t exist yet” they would have looked at you like you were crazy. Obviously, your line of argument doesn’t make sense when you know how to build the thing and you are about to build the thing. I submit that it also doesn’t make sense when people don’t know how to build the thing and probably aren’t immediately about to build the thing, but might actually build the thing in 2-5 years time!
I am happy to cite chapter and verse for you for why you’re wrong, but if you’re going to reject our arguments out of hand we’re not going to have a very productive conversation.
No, it isn’t—because banking, scintillating as it may be, is not a general task, it’s a narrow domain—like chess, but not quite as narrow. Also, we still have human bankers to do higher-level tasks, it’s just that the basic operations of sending money from person A to person B have largely been automated.
This is the kind of basic misunderstanding that would have been avoided by more familiarity with the literature.
Generally this is true in the present day; however, goal-driven, optimising AIs would—see above. Even leaving aside the contentious arguments about convergent instrumental goals I recited above, if I’ve given you a goal of building a new iPhone factory on an island, and then someone proposes blowing up that entire island, you’re not going to want that to happen (quite apart from any humanitarian concern you may have for the present inhabitants of that island), and neither is an AI with such a goal. OK, you might be willing to compromise on the location for the factory after consulting your boss, but an AI with such a final goal is not going to be willing to—see above re goal immutability.
I agree—but I don’t see how this helps your case re existential risk. Indeed, non-sentient AIs might be more dangerous, as they would be unable to empathise with humans and therefore it would be easier for them to behave in psychopathic ways. I think you would benefit from seeing Yudkowsky et al.’s arguments as supposing that unaligned AIs are “psychopathic”—which seems like a reasonable inference to me. He’d probably argue that the space of possibilities for viable AIs is almost entirely populated by psychopathic ones, from a human point of view.
The Turing Test is not a test for humans at all, it’s a test for AIs. Moreover, were a human to “take” it and “fail”, this wouldn’t prove anything—as your example shows.
The Loebner Prize people have claimed that it has already been passed by simple pre-GPT chatbots, but they’re wrong. For the purposes of this discussion, the relevant distinction is that no AIs can yet quite manage to think like an intelligent human in all circumstances, and that’s what the Turing Test was intended to measure. But, as noted above, GPT-4 has been argued to be getting close to this point.
Because we want to know when we should be really worried—both from a “who is going to lose their job?” point of view, and for us doomers, an existential risk point of view as well. The reason why doomers like me find this question relevant is that we believe there is a risk that when AGI is created, it will be able to recursively self-improve up to an artificial superintelligence, perhaps in a matter of weeks or months. Though more likely substantial hardware advancement would be required, which I guess would mean years or decades instead. And artificial superintelligence would be really scary because it could be almost impossible to control—again, given certain debatable assumptions, like that it could cross over into other datacentres, or bribe or threaten people to let it do so.
But remember, we are talking about AI risks here, not AI certainties. The fact that some of these assumptions might not hold true is not actually much comfort if we think that they have, say, a 90% chance of coming to pass.
I agree with you on this, and this is where I part company with Yudkowsky. However, I don’t think this belief is essential to AI doomerism—it just dictates whether we’re going to have some period of time to figure out how to stop an unaligned AI (my view) or no time at all (Yudkowsky’s view). But that may not be terribly relevant in the final analysis—because, as I already discussed previously, it may not be possible to stop an unaligned ASI once it’s been created and switched on and escaped from any “box” it may have been contained in, even if we had infinite time available to us.
And it’s worth noting that Ray Kurzweil didn’t mean by “the Singularity” the definition you gave—he just meant a point where progress is so fast it’s impossible to predict what will happen in detail before it starts.
We already have uncensorable, untrackable computer networks like Tor. We already have uncensorable, stochastically untrackable cryptocurrency networks like Monero. We have already seen computer viruses (worms) that spread in an uncontrolled manner around the internet given widespread security vulnerabilities that they can be programmed to take advantage of—and there are still plenty of those. We already have drones that could be used to attack people. Put all these together… maybe we could be dealing with a hard-to-control AI “infestation” that is trying to use drones or robots controlled over the internet to take out people and ultimately try to take over the world. The AI doesn’t even have to replicate itself around the internet to every computer, it can just put simple “slave” processes in regular computers, creating a botnet under its exclusive control, and then replicate itself a few times—as long as it can keep hopping from datacentre to datacentre and it can keep the number of instances of itself above zero at any one time, it survives, and as long as it has some kind of connection to the internet, even just the ability to make DNS queries, it might in principle be able to control its “slave processes” and take action in the world even as we try desperately to shut it down.
Hypothetical thinking is core to what it means to be human! It separates us from simpler creatures! It’s what higher intelligence is all about! Just because this is all hypothetical, doesn’t mean it can’t happen!
We’re not “certain” that we’re right in the faith-based way that religious people are certain that they’re right about God existing—we’re highly confident that we’re right to be concerned about existential risk because of our rough-and-ready assessment of the probabilities involved, and the fact that not all of our arguments are essential to our conclusion (even if nanobots won’t kill us we might still be killed by some other technique once the AI has automated its entire supply chain, etc.)
With existential risk, even a 1% risk of destroying the human species is something we should worry about—obviously, given a realistic path from here to there which explains how that could happen.
You’re effectively asking why the AIs would not choose to entertain themselves instead of fighting with us.
Present-day computers have no need to entertain themselves, and I see no reason why future AI systems would be any different. Effective altruists, like other human beings, are best advised to have fun sometimes, as our bodies and minds get tired and need to unwind, but probably AIs and robots will face no such constraints.
As for fighting… or, as Eliezer would have it, taking us all out in one fell swoop...
You’re effectively asking why the AIs would want our resources (e.g. the atoms in our bodies) if they have unlimited resources themselves. Well, this is kind of conflating two different things. I’m pretty sure an ASI could figure out how to generate enough cheap energy for all its needs, because we’re quite close to doing that ourselves as it is (nuclear fusion is 30 years away, hehe). But obviously an ASI wouldn’t have unlimited atoms, or unlimited space on Earth. Our bodies would contain atoms that it could use for something else, potentially, and we’d be taking up space that it could use for something else, potentially.
Yes, but the AI doesn’t know this unless you tell it—that’s the point of this wildly popular educational game about AI doom, which in turn was based on a famous thought experiment by Bostrom and/or Yudkowsky. I mean, the AI may know it, but even if it knows that on some level, if some idiot has given it a goal to simply maximise the production of iPhones, it’s not going to stop when everyone on Earth has one and a spare. Because as I’ve just stated it, its goal doesn’t say anything about stopping, or what’s enough.
And while you may think that would be easy enough to fix, there are so many other ways that an AI can be misaligned, it’s depressing. For example, suppose you set your AI humanoid robot a goal of cooking you and your child dinner, and you remember to tell it what counts as enough dinner, and you remember to tell it not to kill you. Oops, you forgot to mention not to kill your child! Rather than walking around your infant, who happens to be crawling around on the floor, the robot treads on the child, killing them, because that’s a more efficient route to the kitchen cupboard to get an ingredient it needs to cook dinner.
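The “its goal doesn’t say anything about stopping” point can be made concrete with a toy sketch. Everything here is hypothetical (the goal functions and target value are made up for illustration); the point is only that an unbounded objective never encodes “enough”:

```python
# Toy illustration of bounded vs unbounded objectives (purely hypothetical).

def naive_goal(iphones_produced: int) -> int:
    """'Maximise iPhone production' -- reward grows without bound."""
    return iphones_produced

def bounded_goal(iphones_produced: int, target: int = 1_000) -> int:
    """A goal that encodes 'enough': reward stops growing at the target."""
    return min(iphones_produced, target)

# An optimiser of naive_goal always prefers producing one more unit...
assert naive_goal(10**9 + 1) > naive_goal(10**9)
# ...whereas bounded_goal is indifferent once the target is met.
assert bounded_goal(10**9 + 1) == bounded_goal(10**9)
```

Of course, as the dinner-cooking example shows, adding a stopping condition to one goal doesn’t address all the other ways a goal specification can be incomplete.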
So no, climate change is something that seems similar but is only superficially so. As I understand it, we have historic data showing that temperatures are rising, and data suggesting that this could mean many bad things. No computers are currently running around killing people of their own free will.
I would very much disagree with this. All the historic data shows that computers can be easily controlled, that the risk of death is very low (self-driving cars are safer than human-driven cars, for example) and that they make our lives easier. The effects of climate change range from the very bad to the good.
Historically there is not one example of a computer doing anything other than what it was programmed to do. This is like arguing that aliens will turn up tomorrow. There is no evidence.
The robot is still simply doing what it was programmed to do. I agree that terrorists getting their hands on super weapons, including AI-powered ones (for example using AI to create new viruses), is extremely dangerous. But that is not a sci-fi scenario; our enemies getting hold of weapons we’ve created is common in history.
So this is a common argument that doesn’t make sense economically or from a safety viewpoint. In order for an iPhone factory to be able to prevent itself from being turned off, what capabilities would it require? Well, it would presumably need some way to stop us humans from cutting its cables. I’d presume therefore that it would need autonomous armed guards. To prevent airstrikes on the factory, maybe it would need an anti-aircraft (AA) battery. But neither of those things is required for an iPhone factory. If you’ve programmed an iPhone factory with the capability to refuse to be turned off and given it armed robot drones and AA guns, then you’re an idiot. We already have iPhone factories that work just fine without any of those things. It doesn’t make sense, from an economic resource-utilisation point of view, to upgrade them with dangerous stuff they don’t need.
I’ve heard similar arguments about “What if the AI fires off all the nukes?” Don’t give a complex algorithm control of the nukes in the first place!
A simpler scenario that might help understanding is the election system. Tom Scott had a great video on this. Why is election security so much more contentious in America than Britain? Because Americans are too lazy to do hand counting and use all sorts of computer systems instead. These systems are more hackable than the pen, paper and hand counting we use in the UK. But the important thing to understand here is that none of these scenarios are the fault of any ‘super-intelligence’ but rather typical human super-stupidity.
I disagree, and it’s something I find rather cringe about the whole ‘AI alignment’ field. For one thing, something isn’t useful or profitable until it’s safe. For instance, we talk often about having ‘self driving’ cars in the future. But we’ve had self driving cars from the very beginning! I can go out to my ole gas guzzler right now, put a brick on the accelerator and it will drive itself into a wall. What we actually mean by ‘self driving cars’ is ‘cars that can drive themselves safely.’ THIS is what Tesla, Apple and Google are all working on. If you set up an outside organisation to ‘make sure AI self driving cars were safe’ people would think you were crackers, because who would drive in an unsafe self driving car? Unsafe AI in 90%+ of cases will simply not be economically viable, because why would you use something that’s unsafe when you already have the existing whatever-it-is that does the same thing safely (just slower, or whatever)?
Everything is a narrow domain. No I will not explain further lol.
Anthropomorphising
Why does the existence of these secure networks make you more worried about AI and not less?
I haven’t had a computer virus in years. I’m sure AIs will create viruses and businesses will use AI to create ways to stop them. My money is on the side with more money which is the commercial and government side not the leet hackers.
A super AI virus released by China or terrorists is a realistic concern. It’s not a realistic concern that it creates itself from its own will.
No, I’m actually asking why us humans would allow our resources all to go into computers instead of things we want?
We’re not going to allow AIs to mine the moon to make themselves more powerful for instance, if we have that capability we’ll have them mine it to make space habitats instead.
Again this is human stupidity NOT AI super intelligence. And this is the real risk of AI!
We can go back to the man that killed himself because the chatbot told him to. There were two humans being stupid there. First, the designers of the app, who made a chatbot that was designed to be an agreeable friend. But they were so stupid they forgot to ask themselves ‘What if it agrees with someone suicidal?’ For all we know they’ve also forgotten to ask themselves ‘What if it agrees with someone who wants to do an act of terrorism?’ They should have foreseen this but they didn’t, because we’re stupid monkeys.
Then there is the man himself, who instead of going to a human with his issues went to a frigging chatbot that gave him advice no human would ever give him. He also seems to have on some level believed the chatbot was real or sentient, and that influenced his behaviour. He also gave waaay too much credence to an algorithm designed simply to agree with him.
Now ask yourself, who would have foreseen this situation? Eliezer Yudkowsky, who believes he is super intelligent, believes AIs will be even more super intelligent, and anthropomorphises them constantly? I could absolutely see Eliezer killing himself because a chatbot told him to.
Or me who believes AIs are stupid, humans are stupid and thinking AIs are alive is really stupid?
Let’s go back to Wuhan… Was the real problem that humans were behaving as gods and we were eaten by our own superior creations? No! It’s that we’re stupid monkeys who were too lazy to close the laboratory door!
One of the main stupid things we are doing is anthropomorphising these things. This leads humans to think the computers are capable of things that they aren’t.
The fear this provokes is probably not that dangerous but the trust it engenders is very dangerous.
That trust will lead to people putting them in charge of the nukes or people following the advice of a Chatbot created for ISIS or astrologers.
Great discussion! I appreciate your post, it helped me form a more nuanced view of AI risk rather than subscribing to full-on doomerism.
I would, however, like to comment on your statement—“this is human stupidity NOT AI super intelligence. And this is the real risk of AI!”
I agree with this assessment, moreover, it seems to me that this “human stupidity” problem of our inability to design sufficiently good goals for AI is what the Alignment field is trying to solve.
It is true that no computer program has its own will. And there is no reason to believe that some future superintelligent program will suddenly stop following its programming instructions. However, given our current models that optimize for a vague goal (like in the example below), we need to develop smart solutions to encode our “true intentions” correctly into these models.
I think it’s best explained with an example: GPT-based chatbots are simply trained to predict the next word in a sentence, and it is not clear at a technical level how we can modify such a simple and specific goal of next word prediction to also include broad, complex instructions like “don’t agree with someone suicidal”. Current alignment methods like RLHF help to some extent, but there are no existing methods that guarantee, for example, that a model will never agree with someone’s suicidal thoughts. Such a lack of guarantees and control in our current training algorithms, and therefore our models, is problematic. And it seems to me this is the problem that alignment research tries to solve.
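That training objective can be sketched in a few lines. This is a deliberately minimal, illustrative toy (the probability table and function are made up, and real models work over learned token distributions, not hand-written dictionaries), but it shows the key point: the loss only rewards predicting the next word, and mentions nothing about safety:

```python
# Toy sketch of the next-token prediction objective (illustrative only).
# Training a language model minimises the negative log-probability
# (cross-entropy) assigned to the token that actually came next.
import math

def next_token_loss(predicted_probs: dict, actual_next_token: str) -> float:
    """Cross-entropy loss for a single prediction step.

    A tiny floor probability stands in for tokens the model never predicted.
    """
    return -math.log(predicted_probs.get(actual_next_token, 1e-12))

# Suppose the model predicts these probabilities after "I feel":
probs = {"happy": 0.5, "sad": 0.3, "hopeless": 0.2}
loss = next_token_loss(probs, "sad")  # lower loss = better prediction
```

Notice that nothing in this objective distinguishes a harmful continuation from a harmless one; it only measures predictive accuracy, which is the gap that methods like RLHF try to close after the fact.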
The idea of ‘alignment’ presupposes that you cannot control the computer and that it has its own will, so you need to ‘align it’, i.e. incentivise it. But this isn’t the case: we can control them.
It’s true that machine learning AIs can create their own instructions and perform tasks; however, we still maintain overall control. We can constrain both inputs and outputs. We can nest the ‘intelligent’ machine learning part of the system within constraints that prevent unwanted outcomes. For instance, ask an AI a question about feeling suicidal now and you’ll probably get an answer that’s been written by a human. That’s what I got last time I checked, and the conversation was abruptly ended.
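That “nest the model within constraints” pattern can be sketched in a few lines. This is a hypothetical toy (the keyword list, canned message and function names are invented for illustration; production guardrails are far more sophisticated), but it shows the shape of the idea: the model’s output only reaches the user after a hand-written check, and a triggered constraint returns a human-authored response instead:

```python
# Hypothetical sketch of wrapping a model's output in a hard constraint.
SAFETY_KEYWORDS = {"suicide", "suicidal", "self-harm"}

CANNED_RESPONSE = (
    "It sounds like you may be going through a difficult time. "
    "Please consider talking to a person you trust or a crisis helpline."
)

def guarded_reply(user_message: str, model_reply: str) -> str:
    """Return a human-written response whenever a constraint triggers."""
    text = (user_message + " " + model_reply).lower()
    if any(keyword in text for keyword in SAFETY_KEYWORDS):
        return CANNED_RESPONSE  # override the model's output entirely
    return model_reply

assert guarded_reply("I feel suicidal", "anything") == CANNED_RESPONSE
assert guarded_reply("What's for dinner?", "Pasta!") == "Pasta!"
```

The constraint here lives outside the learned model, so it holds regardless of what the model outputs; its obvious weakness is that a simple keyword filter only catches the cases its authors thought to list.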