Yeah, wouldn’t want to full-throatedly support technical safety work—in the worst-case scenario, we might figure out how to build AGI safely, which would greatly advance AI capabilities.
The broader context, as I understand it, was that many forms of technical safety work either (a) are capabilities masquerading as safety or (b) are too dependent on the labs for compute and model access and thus vulnerable to industry capture. Without wholly endorsing this frame, I do think it is a reasonable concern to have and it accurately describes at least some work labeled as safety.
I understand that people undermining alignment work will attempt to justify it with the concern that it may advance capabilities without advancing safety. I just think that’s a terribly mistaken drunk Mormon hypothesis.
It’s proceeding from the false premise that alignment and capabilities are two equal concerns that we trade off against each other, and that if you can imagine a way a project ends up advancing capabilities without advancing alignment, you shouldn’t do it. But the tradeoff is nowhere close to equal; the more accurate perspective is that if you can imagine it advancing alignment, you should do it.
You appear to be arguing against a position I do not actually hold. I support many forms of alignment work. I expect most avenues to fail, yet still be worth exploring. I expect many such avenues to advance capabilities very little or not at all. Much of interpretability seems to be like this. And even given that some forms of interpretability may be used to train smarter models, it still (probably) seems worth the price to understand those models better.
A subset of work labeled “alignment” or “safety” seems fundamentally flawed and at best a waste of resources. It might be aimed at the wrong things, like Elon Musk’s plan to make AI that “seeks truth”; or it might be falsely labeled “alignment” while actually doing something else, like attempts to get AI to refuse to talk about [insert politicized issue here]. Here I am deliberately picking obvious examples as an existence proof of this category; one may reasonably disagree as to what other forms of research qualify.
I do think there is a further kind of research which is just advancing capabilities and doing little for alignment, e.g. the attempts to automate AI research itself. In this case the tradeoff seems overwhelmingly lopsided against proceeding. The faster you build it, the sooner you die.
Alignment and capabilities are no more equal than poison and medicine. It is still a bad idea to drink a cup containing mostly poison.
I of course cannot speak for the reasoning of PauseAI leadership, and can only make an informed guess as to their stance on the tradeoffs involved.
You appear to be arguing against a position I do not actually hold
Yes, I was arguing against PauseAI’s position (both as summarized by you in the piece and as expressed by their leadership elsewhere). I’m aware that you hadn’t expressed a strong stance on it either way.
My comments were a little overly combative, which wasn’t meant to be directed at you in particular. My frustration is that a lot of people consider “We cautiously support some technical safety work” to be a moderate position, while I believe it is insane.
I agree that it’s theoretically possible for there to be bad alignment-capabilities tradeoffs, just like it’s theoretically possible for there to be bad tradeoffs of anything vs anything.
But in the actual world we live in today, there is widespread skepticism of the existence of AI risk, including among AI researchers. Many people are happy to work full steam ahead on developing AGI with no thought to potential existential risks, and declare this publicly. The issue has low salience among the general public and even less so abroad. And the difficulty of building AGI falls automatically year by year due to the advance of Moore’s law.
With all of these factors in play, I think AGI is happening. More to the point, if it doesn’t happen, it won’t be because a bunch of individual people who were worried about AI risk refrained from doing things that might advance capabilities; it will be because we somehow got an international agreement. Individual people withdrawing from the field doesn’t amount to an AI pause; all it can do is slow things down. And there is a hard limit on how much individual withdrawal can slow things down, because there is a large core of people who believe AI poses no risk, want to advance it as much as possible, and will not be dissuaded—and the more you slow down, the more they are aided by Moore’s law.
In this environment, it is really important to try to figure out alignment. That is the thing to focus on. The problem with all the examples of bad alignment research you gave above is not that they advance capabilities, it is that they are bad alignment research.