Thanks for the article! Even though I have a few disagreements, let me start off by saying that I really, sincerely hope you are right. If you’re right, and I’m wrong, the world is better for everyone.
With that said, I think you’re staking your argument on two key assumptions:
We would have any substantial insight into the workings of a superintelligence
The burden of proof is on ASI skeptics to prove x-risk exists, rather than on those who claim alignment of an ASI is possible
I’d have to challenge both of these assumptions.
First of all, we could have no direct insight into—let alone control over—the mental processes of any being more intelligent than us. We could make hypotheses based on our own best interpretations of the evidence available to us, but make no mistake that an actually existing ASI (aligned or otherwise) is a black box. And because of this, we could never be certain that it didn’t pose an x-risk to us. For example, a below-human-intelligence* AGI could genuinely possess what you’ve termed CEV values, but it could drift away from them as it increases in intelligence, with no human awareness of the change. Even if its neural circuits are rewired, it can store multiple copies of its plan in various places to avoid such a goal interruption, and if it is more intelligent than us, it’s safe to assume it has already done this. Such an AGI would be indistinguishable from an aligned model, but it would in fact be misaligned. Humans just aren’t intuitively good at eliminating specific thoughts in the same way we could find, for example, a bad line of code in a traditional program. Therefore, I kind of have to conclude that the only thing that could reliably align an AGI is an equally intelligent AGI—and if this intelligence level is superhuman, the actions of both systems (the misaligned and the aligner) would be an absolute black box to us. I’m sorry, but this just doesn’t seem safe to me. (epistemic status: ~80%)
Secondly, the burden of proof rests with the person making a positive claim, with skepticism being the default position. It seems more than rational to assume that the existence of a superintelligent artificial system anywhere in our lightcone poses some x-risk to us, which means the assumption of x-risk with ASI is the default position. Therefore, the burden lies with pro-AGI advocates such as yourself to demonstrate, with hard data, that alignment is not only possible but the most likely outcome. (epistemic status: ~90%)
I really hope this doesn’t come off as too harsh; that’s really not my intent at all!
*I don’t refer to any potentially conscious system as “subhuman”; all sentient beings (human/animal, or artificial) are intrinsically valuable
I am somewhat confused by your comment. Are you replying to my EA Forum post, the linked article I was replying to, or both? I am certainly not a pro-AGI advocate in the sense you seem to imply: while I think we ought to create AGI eventually, after there’s a scientific consensus that it’d be safe to, I’m certainly not suggesting we do this now. I’m the author of moratorium.ai, a resource advocating for an AI moratorium.
I am not making an assumption that we’d have any substantial insight into the workings of a superintelligence. The way we develop AI now, we don’t know or understand what the billions to trillions of numbers that make up these systems represent, and we don’t have a way to extract and understand the cognitive architecture that runs on those numbers.
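To make the opacity concrete, here is a minimal toy sketch (my own construction in plain numpy; the task, seed, and layer sizes are arbitrary and nothing in it comes from the post): even for a four-example problem, the trained parameters are just grids of floats from which the network’s behaviour can’t be read off, and frontier models contain billions to trillions of such numbers.

```python
# Toy illustration: train a tiny network on XOR and look at the raw weights.
# The point is only that the learned parameters carry no legible meaning;
# this is my own example, not anything from the post.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(5000):                        # plain full-batch gradient descent
    h = np.tanh(X @ W1 + b1)                 # hidden activations
    p = sigmoid(h @ W2 + b2)                 # predicted probabilities
    dlogits = p - y                          # gradient of cross-entropy wrt output logits
    dW2, db2 = h.T @ dlogits, dlogits.sum(0)
    dh = (dlogits @ W2.T) * (1 - h ** 2)     # backprop through tanh
    dW1, db1 = X.T @ dh, dh.sum(0)
    for param, grad in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        param -= 0.1 * grad                  # in-place parameter update

print(np.round(p.ravel(), 2))  # behaviour we can test from outside (typically near [0, 1, 1, 0])
print(np.round(W1, 2))         # the learned parameters themselves: just a grid of floats
```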
In my model, the default way we develop ASI is via deep learning, and I expect us not to understand that ASI and to die shortly afterwards (~80%; the other 20% comes primarily from international governance delaying ASI until we solve all the related problems).
I’d certainly hope we’d instead develop the architecture of ASI manually, thoroughly designing and understanding all its inner workings, not doing any gradient descent; unfortunately, at the moment, this doesn’t seem realistic (although I believe it’s possible in principle).
I have to say that I can imagine how it’d be possible, in principle, to make a safe AGI with deep learning. It’d require a research direction like Infra-Bayesianism to produce results and insights that could be used as constraints for a training run; and it’d require many other research results to avoid inner misalignment; but I think it’s not literally impossible to be relatively confident in the safety of an AGI trained with deep learning. This is not something I expect to happen, at all, but it seems theoretically possible.
You raise an important problem of stability under reflection. I don’t think it makes much sense to talk about a subhuman AGI performing CEV (it is a procedure that requires being more capable than humans). But designing a coherent AI system in a way that safely maximises humanity’s CEV, without drifting away from it even as it increases its capabilities, is indeed a complicated problem. I expect, again, that it is solvable in principle, even though I wouldn’t be surprised at all if it takes generations of thousands to millions of scientists to solve.
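To gesture at what “not drifting away” would even require, here is a toy sketch (entirely my own, with hypothetical names; nothing in it comes from the post): the classic requirement is that a system only adopt a more capable successor if, judged by its current values, the successor still pursues them; for an opaque, smarter-than-human system, the prediction step in the middle is exactly the part nobody knows how to do.

```python
# Toy sketch of the stability-under-reflection requirement (hypothetical names,
# not from the post): accept a self-modification only if, judged by the agent's
# *current* values, the successor still produces a valued outcome.

def current_values(outcome: str) -> float:
    """Stand-in for the goal we want preserved (e.g. humanity's CEV)."""
    return 1.0 if outcome == "flourishing" else 0.0

def predict_outcome(successor: str) -> str:
    """Stand-in for predicting what a more capable successor would actually do;
    for an opaque ASI this prediction is the intractable part."""
    return {"stable_v2": "flourishing", "drifted_v2": "something else"}[successor]

def safe_to_adopt(successor: str) -> bool:
    return current_values(predict_outcome(successor)) >= 1.0

print(safe_to_adopt("stable_v2"))   # True
print(safe_to_adopt("drifted_v2"))  # False
```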
> Even if its neural circuits are rewired, it can store multiple copies of its plan in various places to avoid such a goal interruption, and if it is more intelligent than us, it’s safe to assume it has already done this
I am not sure what you mean by that: we don’t program these systems, they can’t really rewrite their own source code, and storing something internally would require something like gradient hacking. In any case, while I generally expect alignment not to be preserved as a system’s capabilities increase, there are systems that we could, theoretically, get to (although I don’t expect us to) that would care about CEV in a way that doesn’t change as they increase their capabilities.
> Humans just aren’t intuitively good at eliminating specific thoughts in the same way we could find, for example, a bad line of code in a traditional program. Therefore, I kind of have to conclude that the only thing that could reliably align an AGI is an equally intelligent AGI—and if this intelligence level is superhuman, the actions of both systems (the misaligned and the aligner) would be an absolute black box to us.
Humanity is capable of producing complicated systems. We usually need to understand the laws that govern them first, but we didn’t need rockets capable of getting to the Moon to design the first rocket capable of getting to the Moon. I think the problem of understanding some parts of the space of possible minds is solvable in principle; and it is possible to understand the laws that govern those parts of the space well enough to come up with a target—with a mind design that would be safe—and then actually design and launch a corresponding mind, even if it is smarter than humans. Not that I expect any of that to happen within the time constraints.
> the burden of proof rests with the person making a positive claim, with skepticism being the default position. It seems more than rational to assume that the existence of a superintelligent artificial system anywhere in our lightcone poses some x-risk to us, which means the assumption of x-risk with ASI is the default position
I disagree with that. No previously existing technology has wiped out humanity; for anthropic reasons, it’s not obvious how much evidence this provides, but most new technologies certainly haven’t wiped us out, and this seems like the default outside view to take on a new technology. The claim that ASI is likely to kill everyone is extraordinary and requires good reasons.
I think we have really good reasons to think that, and I summarised some of them on moratorium.ai, in this post, and in my other posts. I think it is actively harmful to do advocacy that focuses on “ASI = x-risk by default, burden of proof is on those who build it” without technical arguments. Policymakers are going to ask Meta about x-risk, and if they are not already familiar with our technical arguments, they’re going to believe whatever Meta says, because Meta’s representatives will seem to know what they’re talking about and won’t say anything flawed enough for a policymaker unfamiliar with our arguments to notice. To avoid that, we need to explain the technical reasons why ASI is likely to literally kill everyone, in ways that’d, e.g., allow policymakers to call bullshit on what Meta representatives might be saying.
> pro-AGI advocates such as yourself to demonstrate, with hard data, that alignment is not only possible but the most likely outcome
Sorry, but I’m confused about who this is addressed to, as I think alignment is extremely unlikely in the current situation. I think the probability of doom conditional on AGI before 2040 is >98%. I think I have good arguments for why that’s the case. I think there’s an ~80% chance everyone will die, and I really hope I’m wrong. The burden on pro-AGI-ASAP advocates is not just to demonstrate to someone new that alignment is likely, but to refute my arguments and the arguments of those I agree with, and to establish a scientific consensus that agrees with them.
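As a rough consistency check on those figures (the decomposition below is my own reconstruction, not something stated in the post; it assumes the non-doom probability mass comes almost entirely from ASI being delayed, i.e. P(doom | delayed) ≈ 0):

```python
# Rough consistency check on the stated figures; the decomposition itself is
# assumed, not taken from the post.
p_doom_given_agi_soon = 0.98   # ">98%" doom conditional on AGI before 2040
p_doom_overall = 0.80          # "~80%" chance everyone dies

# If essentially all of the non-doom mass comes from ASI being delayed
# (P(doom | delayed) ~ 0), the stated figures jointly imply:
p_agi_soon = p_doom_overall / p_doom_given_agi_soon
print(f"implied P(AGI before ~2040)      ≈ {p_agi_soon:.2f}")      # ≈ 0.82
print(f"implied P(delayed by governance) ≈ {1 - p_agi_soon:.2f}")  # ≈ 0.18, the ~20% branch
```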
(Also,
> I don’t refer to any potentially conscious system as “subhuman”; all sentient beings (human/animal, or artificial) are intrinsically valuable
By default, ASI is not going to be sentient (in the sense of having qualia), and I wouldn’t consider it to be intrinsically valuable. I also hope we won’t make sentient AIs until we’re fully ready to; see this post’s first footnote for some links.
I’m talking about intelligence as I define it here. The quality of being generally subhuman or generally superhuman on this axis seems important.)
I apologize; I must have misunderstood the post you quoted and confused it with your own position. I retract that part of my post.