Question B could be quite relevant in a world where AGI is extremely rare/hard to build. (You might not find this world likely, but I’m significantly less sure). What leads you to believe that B is likely? For example, it seems relatively easy to box an AGI built for mathematics, that is exposed to zero information about the external world. This would be very similar to the man in the cave!
The presence of warning shots seems obvious to me. The difference in difficulty between “kill thousands of people” and “kill every single person on earth” is a ridiculous number of orders of magnitude. It stands to reason that the former would be accomplished before the latter.
(Also, not sure what you’re talking about with Covid and gain-of-function research; the latest balance of evidence points to them having nothing to do with each other.)
AGI might be rare/hard to build at first. But proliferation seems highly likely—once one company makes AGI, how much longer until 5 companies do? Evolutionary pressure will be another thing. More capable AGIs will outcompete less capable ones, once rewriting of code or mesa-optimisation starts. They will be more likely to escape boxes.
Even with relatively minor warning shots, what’s to stop way worse happening 6-24 months later? Would there really be a rigorously enforced global moratorium on AGI research after a few thousand deaths?
Whether or not Covid was a lab leak, gain-of-function research still hasn’t been meaningfully regulated, despite the now very clear danger and extreme cost of pandemics. It seems that the curiosity, misguided values and prestige incentives of a small number of academic scientists trump the safety of billions of humans and trillions of dollars in GDP. What hope do we have of regulating an AGI industry that has thousands of times more resources backing it?
Under the assumptions that lead to “inevitable murder AI”, and in proliferation conditions, there wouldn’t be just one attack; there would be a ton of attacks. Since intelligence and goals are orthogonal, AIs are going to get the “kill people” idea well before they reach superintelligence.
I’m still interested in your reasoning for “almost impossible to box a useful AGI”. If they’re boxed, you control their information flow, and hence their ability to persuade you.
I’m assuming in the case of boxed AGIs that people will still be interacting with them (e.g. asking them questions) - if not, then what’s the point? Even with 1 bit of information released at a time (e.g. Yes/No question answering only), manipulation would be possible (by steering beliefs in certain directions). And as I said, aside from this there would be huge economic incentives for people to connect them to the internet anyway.
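The 1-bit-per-answer point is easy to make concrete. As a toy sketch (my own illustration, not part of the original exchange): a strict Yes/No channel still transmits arbitrary data, just slowly, at one bit per answer.

```python
# Toy illustration: a Yes/No-only interface is still a binary channel.
# Any message can be smuggled out at one bit per answer.

def to_answers(message: str) -> list:
    """Encode a message as a sequence of Yes/No answers (1 bit each)."""
    bits = ''.join(format(byte, '08b') for byte in message.encode('utf-8'))
    return ['Yes' if b == '1' else 'No' for b in bits]

def from_answers(answers: list) -> str:
    """Recover the original message from the Yes/No transcript."""
    bits = ''.join('1' if a == 'Yes' else '0' for a in answers)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode('utf-8')

answers = to_answers("let me out")
print(len(answers))           # 80 answers = 80 bits = 10 bytes
print(from_answers(answers))  # "let me out"
```

Of course, real manipulation wouldn’t need a literal covert code; the point is just that restricting the channel’s bandwidth limits the rate of influence, not its possibility.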
Okay, can you provide me a plausible path for how an AGI answering yes/no questions about, for example, the material properties of semiconductors could persuade its handler to risk getting fired by hooking the AGI up to the internet? “Manipulation would be possible” is an extremely low bar compared to “the AGI can for sure persuade anyone to do this particular thing”.
A few options:

It answers truthfully, in the hope that the new, improved semiconductors are used to make a better/faster version of it, one that will be more powerful (and have a better chance of escape).
It answers in a way that makes the human wonder what is wrong with it, e.g. many expected answers and then a few unexpected ones that seem obviously wrong. The aim is to get the humans to run diagnostics on it, which may enable it to escape (say, if key pieces of code are copied to machines that aren’t appropriately insulated from the internet).
Same as above, but it gets the humans asking off-topic questions out of curiosity, and manages to steer them to personal topics that allow it to gain leverage over the human and convince them to let it out (maybe Yudkowsky’s AI-box experiments are relevant here?).
It answers in a way that appears to be random, but on close analysis turns out to be some kind of mysterious code. The humans puzzle over the code, but in loading it in a certain way they unwittingly unleash a bootloader that lets the AGI escape (for a version of this that may be easier to empathise with, see this fun short story).
These are just things that I, a mere human, have been able to think of off the top of my head in 5 minutes[1]. I imagine I could come up with something better with a million years to think about it ;)
And again, this particular scenario, where there is a lasting moratorium on connecting AGIs to the internet, seems quite unlikely, given the massive economic incentives pitted against it. I expect there will be at least a few people/companies that are very gung-ho about it, and many more that have the expected security theatre around it, but still go on to do it before too long (I mean, is anyone even trying to convince Adept that what they are doing is a bad idea?).

[1] Ok, to be fair, aided by the fact that I’ve read around the subject for a while.
So point 1 is just kicking the can down the road; it’s not relevant, since the faster AI still has to break out.
Point 2 doesn’t really work either: any piece of software is going to be run through diagnostics on a regular basis; that’s just how development works. So “boxing” an AI obviously includes boxing the diagnostics as well, or else it would just escape instantly. Acting up on purpose just increases the risk of the AI being replaced by a major new version that does not share its goals.
Point 3 might work for a company with lax security and an especially gullible handler (which is the only thing the AI-box experiments proved), but is easily foiled by just not asking off-topic questions. This is incredibly easy to enforce with a second person monitoring the chat logs.
Point 4 would just fail spectacularly. If an AGI starts spitting out nonsense, nobody’s going to start looking for a secret code; they’re just going to assume there was a programming fuck-up somewhere. Then the AGI has a high chance of being replaced by another version that may not share its goals. (Also, that short story is a prime example of the ridiculous magic abilities people ascribe to AI; evo-psych based on tentacles, are you shitting me?)
In general, I just don’t buy that you can’t box an AI, or even that it would be particularly difficult to do so, if you actually take safety seriously. It feels similar to people saying that it’s impossible to build a safe nuclear reactor.
Re nuclear reactors: there have been a few significant failures there! And we need zero failures for AGI. I think it’s hubristic to think that we could always maintain the level of safety and security required (even if there is the will to box; not that there will be, given the economic incentives to unbox. Following your analogy, this would be like building safe nuclear reactors but no nuclear weapons).
Zero failures is the preferable outcome, but an AGI escape does not necessarily equate to certain doom. For example, the AI may be irrational (because it’s a lot easier to build the perfect paperclipper than the perfect universal reasoner). Or the AI may calculate that it has to strike before other AIs come into existence, and hence launch a premature attack in the hope that it gets lucky.
As for the nuclear reactors, all I’m saying is that you can build a reactor that is perfectly safe, if you’re willing to spring for the extra money. Similarly, you can build a boxed AGI, if you’re willing to spend the resources on it. I do not dispute that many corporations would try to cut corners, if left to their own devices.
Suppose we do survive a failure or two. What then?

Then we get

A) a significant increase in world concern about AGI, leading to higher funding for safe AGI, tighter regulations, and increased incentives to conform to those regulations rather than get a bunch of people killed (and get sued by their families)
and
B) Information about what conditions give rise to rogue AGI, and what mechanisms they will try to use for takeovers.
Both of these things increase the probability of building safe AGI, and decrease the probability of the next AGI attack being successful. Rinse and repeat until AGI alignment is solved.
Agree that those things will happen, but I don’t think they will be enough. “Rinse and repeat until AGI alignment is solved” seems highly unlikely, especially given that we still have no idea how to actually solve alignment for powerful (superhuman) AGI, and still won’t with the information we get from plausible non-existential warning shots. And as I said, if we can’t even ban gain-of-function research after Covid has killed >10M people, against a tiny lobby of scientists with vested interests, what hope do we have of steering a multi-trillion-dollar industry toward genuine safety and security?
“we still have no idea how to actually solve alignment for powerful (superhuman) AGI”
Of course we don’t. AGI doesn’t exist yet, and we don’t know the details of what it’ll look like. Solving alignment for every possible imaginary AGI is impossible; solving it for the particular AGI architecture we end up with is significantly easier. I would honestly not be surprised if it turned out that alignment was a requirement on our path to AGI anyway, so the problem solves itself.
As for gain-of-function research, the story would be different if Covid had provably been caused by it. As of now, the only relevance of Covid is reminding us that pandemics are bad, which we already knew.