As I argued in my post, I think we need a moratorium, one that would lead to an indefinite set of strong restrictions on dangerous AIs, and to permanent restrictions and oversight on any type of system that isn’t pretty rigorously provably safe.
To be clear, I’m fine with locking in a set of nice regulations that can prevent dangerous AIs from coming about, if we know how to do that. I think the concept of a “pause” or “moratorium”—as it is traditionally understood, and explicitly outlined in the FLI letter—doesn’t merely mean that we should have legal rules for AI development. The standard meaning of “moratorium” is that we should not build the technology at all until the moratorium ends.
The end goal isn’t a situation where we give up on safety, it’s one where we insist that only safe “human-level” but effectively superhuman systems be built—once we can do that at all, which at present I think essentially everyone agrees we cannot.
Presently, the fact that we can’t build safe superhuman systems is mostly a side effect of the fact that we can’t build superhuman systems at all. By itself, that’s pretty trivial, and it’s not surprising that “essentially everyone” agrees on this point. However, I don’t think essentially everyone agrees that superhuman systems will be unsafe by default unless we give ourselves a lot of extra time right now to do safety research—and that seems closer to the claim that I’m arguing against in the post.
I don’t think anyone in this discussion, with the partial exception of Rob Bensinger, thinks we’re discussing a pause of the type FLI suggested. And I agree that a facile interpretation of the words leads to that misunderstanding, which is why my initial essay—which was supposed to frame the debate—explicitly tried to clarify that it’s not what anyone is actually discussing.
How much time we need is a critical uncertainty. It seems foolhardy to refuse to build a stop button because we might not need more time.
You say in a different comment that you think we need a significant amount of safety research to make future systems safe. I agree, and think that until that occurs, we need regulation of systems which are unsafe—which I think we all agree are possible to create. And in the future, even if we can align systems, it’s unlikely that we can make it impossible to build unaligned ones. So if nothing else, a Bing-like deployment of potentially aligned but currently unsafe systems is incredibly likely, especially if strong systems are open-sourced so that people can remove any safety features.
I think that AI safety research will occur more or less simultaneously with AI capabilities research. I don’t think it’s a simple matter of needing more safety before capabilities. I’d prefer to talk about something like the ratio of spending on capabilities versus safety, or the specific regulatory regime we need, rather than how much safety research we need before moving forward with capabilities.
This is not so much a disagreement with what you said, but rather a comment about how I think we should frame the discussion.
I agree that we should be looking at investment, and carefully considering the offense-defense balance of the new technology. Investments in safety seem important, and we should certainly look at how to balance the two sides—but you were arguing against building a stop button, not saying that the real issue is figuring out how much safety research (and, I hope, actual review of models and assurance of safety in each case) is needed before proceeding. I agree with your claim that this is the key issue—which is why I think we desperately need a stop button for the case where it fails, and think we can’t build such a button later.
I don’t think anyone in this discussion, with the partial exception of Rob Bensinger, thinks we’re discussing a pause of the type FLI suggested. And I agree that a facile interpretation of the words leads to that misunderstanding, which is why my initial essay—which was supposed to frame the debate—explicitly tried to clarify that it’s not what anyone is actually discussing.
I think Holly Elmore is also asking for an FLI-type pause. If I’m responding to two members of this debate, doesn’t that seem sufficient for my argument to be relevant?
I also think your essay was originally supposed to frame the debate, but no longer serves that purpose. There’s no indication in the original pause post from Ben West that we need to reply to your post.
Tens of thousands of people signed the FLI letter, and many people have asked for an “indefinite pause” on social media and in various articles in the last 12 months. I’m writing an essay in that context, and I don’t think it’s unreasonable to interpret people’s words at face value.
I don’t want to speak for her, but I believe Holly is advocating both for a public response to dangerous systems, via advocacy, and for shifting the default burden of proof onto those building powerful systems. Given that, it is critical to stop the most dangerous types of models—those scaled well beyond current capabilities—until companies accept that they need to prove such models are safe before releasing them. That’s certainly not stopping everything for a predefined period of time.
It seems like you’re ignoring other participants’ views by not responding to their actual ideas and claims. (I also think it’s disingenuous to say “there’s no indication in the original pause post,” when that post was written after you and others saw an outline and then a draft of my post, and then started writing things that didn’t respond to it. You didn’t write your post after he wrote his!)
Again, I think you’re pushing a literal interpretation as the only way anyone could support a “Pause,” and the people you’re talking to are actively disagreeing with that reading. If you want to address that idea, I will agree you’ve done so; but continuing to insist that you’re talking to someone else, who is discussing a different proposal that I agree is a bad idea, will be detrimental to the discussion.
I also think it’s disingenuous to say “there’s no indication in the original pause post,” when that post was written after you and others saw an outline and then a draft of my post, and then started writing things that didn’t respond to it. You didn’t write your post after he wrote his!
I did write my post after he wrote his, so your claim is false. Also, Ben explicitly told me that I didn’t need to reply to you before I started writing my draft. I’d appreciate it if you didn’t suggest that I’m being disingenuous on the basis of very weak evidence.