You appear to be arguing against a position I do not actually hold. I support many forms of alignment work. I expect most avenues to fail, yet still be worth exploring. I expect many such avenues to advance capabilities very little or not at all. Much of interpretability seems to be like this. And even given that some forms of interpretability may be used to train smarter models, it still (probably) seems worth the price to understand those models better.
A subset of work labeled “alignment” or “safety” seems fundamentally flawed and at best a waste of resources. It might be aimed at the wrong things, like Elon Musk’s plan to make AI that “seeks truth”; or it might be falsely labeled “alignment” while actually doing something else, like attempts to get AI to refuse to talk about [insert politicized issue here]. Here I am deliberately picking obvious examples as an existence proof of this category; one may reasonably disagree as to what other forms of research qualify.
I do think there is a further kind of research which is just advancing capabilities and doing little for alignment, e.g. the attempts to automate AI research itself. In this case the tradeoff seems overwhelmingly lopsided against proceeding. The faster you build it, the sooner you die.
Alignment and capabilities are no more equal than poison and medicine. It is still a bad idea to drink a cup containing mostly poison.
I of course cannot speak for the reasoning of PauseAI leadership, and can only make an informed guess as to their stance on the tradeoffs involved.
“You appear to be arguing against a position I do not actually hold”
Yes, I was arguing against PauseAI’s position (both as summarized by you in the piece and as expressed by their leadership elsewhere); I’m aware that you hadn’t expressed a strong stance on it either way.
My comments were a little overly combative, and that wasn’t directed at you in particular. My frustration is that I think a lot of people consider “We cautiously support some technical safety work” to be a moderate position, while I believe it is insane.
I agree that it’s theoretically possible for there to be bad alignment-capabilities tradeoffs, just like it’s theoretically possible for there to be bad tradeoffs of anything vs anything.
But in the actual world we live in today, there is widespread skepticism about the existence of AI risk, including among AI researchers. Many people are happy to work full steam ahead on developing AGI with no thought to potential existential risks, and say so publicly. The issue has low salience among the general public, and even less abroad. And the difficulty of building AGI falls automatically year by year with the advance of Moore’s law.
With all of these factors in play, I think AGI is happening. More to the point, if it doesn’t happen, it won’t be because a bunch of individual people worried about AI risk refrained from doing things that might advance capabilities; it will be because we somehow got an international agreement. Individuals withdrawing from the field has nothing to do with an AI pause; all it can do is slow things down. And there is a hard limit on how much individual withdrawal can slow things down, because there is a large core of people who believe AI poses no risk, want to advance it as much as possible, and will not be dissuaded; and the more you slow down, the more they are aided by Moore’s law.
In this environment, it is really important to try to figure out alignment. That is the thing to focus on. The problem with all the examples of bad alignment research you gave above is not that they advance capabilities; it is that they are bad alignment research.