Simple and genuine question from a non-AI guy
I understand the arguments towards encouraging gradual development vs. fast takeoff, but I don’t understand this argument I’ve heard multiple times (not just on this post) that “we need capabilities to increase so that we can stay up to date with alignment research”.
First, I thought there’s still a lot of work we could do with current capabilities—technical alignment is surely limited by time, money and manpower, not just by computing power. I’m also guessing less powerful AI could be made during a “pause” specifically for alignment research.
Second, in a theoretical situation where capabilities research globally stopped overnight, isn’t this just free extra time for the human race where we aren’t moving towards doom? That feels pretty valuable and high-EV in and of itself.
It seems to me the argument would have to be that the advantage to safety work from improving capabilities would outstrip the increasing risk of dangerous AGI, which I find hard to get my head around, but I might be missing something important.
Thanks.
Not responding to your main question:
“Second, in a theoretical situation where capabilities research globally stopped overnight, isn’t this just free extra time for the human race where we aren’t moving towards doom? That feels pretty valuable and high-EV in and of itself.”
I’m interpreting this as saying that buying humanity more time, in and of itself, is good.
I don’t think extra time pre-transformative-AI is particularly valuable except its impact on existential risk. A few reasons why I think this:
Astronomical waste argument. Time post-transformative-AI is way more valuable than time now, assuming some (though not necessarily a strong) version of aggregating/total utilitarianism. If I were trading clock-time seconds now for seconds a thousand years from now, assuming no difference in existential risk, I would probably be willing to trade every historical second of humans living good lives for something like a minute a thousand years from now, because it seems like we could have a ton of (morally relevant) people in the future, and the moral value derived from their experience could be significantly greater than that of current humans. (A rough numerical sketch of this trade appears after this list.)
The moral value of the current world seems plausibly negative due to large amounts of suffering. Factory farming, wild animal suffering, humans experiencing suffering, and more, seem like they make the total sign unclear. Under moral views that weigh suffering more highly than happiness, there’s an even stronger case for the current world being net-negative. This is one of those arguments that I think is pretty weird and almost never affects my actions, but it is relevant to the question of whether extra time for the human race is positive EV.
A third argument is that AI arriving sooner could help reduce other existential risks: e.g., a mundane example is AI speeding up vaccine research, and a weirder example is AI enabling space colonization, since being on many planets makes x-risk lower. I don’t personally put very much weight on this argument, but it’s worth mentioning.
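To make the “every historical second for a minute a thousand years from now” trade concrete, here is a minimal back-of-envelope sketch in Python. Every number in it (how many humans have ever lived, average lifespan, the size of a future population) is an assumption chosen purely for illustration, not something claimed in the comment above.

```python
# Back-of-envelope comparison for the "seconds now vs. a minute in a thousand years" trade.
# Every number here is an illustrative assumption, not a figure from the comment.

SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Historical side: roughly 100 billion humans ever born, ~40-year average lifespan (assumed).
humans_ever = 1.1e11
avg_lifespan_years = 40
historical_person_seconds = humans_ever * avg_lifespan_years * SECONDS_PER_YEAR

# Future side: one minute, a thousand years from now, with an assumed far larger population
# of morally relevant beings (a space-faring or digital civilisation, for illustration).
future_population = 1e20  # pure assumption; astronomical-waste-style estimates go far higher
future_person_seconds = future_population * 60

print(f"all of history: {historical_person_seconds:.1e} person-seconds")
print(f"one future minute: {future_person_seconds:.1e} person-seconds")
# With these assumptions the single future minute contains more person-seconds than all of
# history so far, which is the shape of the trade the comment is describing.
```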
Thanks Aaron, appreciate the effort.
I failed to point out my central assumption here: that transformative AI, in our current state of poor preparedness, is net negative due to the existential risk it entails.
It’s a good point about time pre-transformative-AI not being so valuable in the grand scheme of the future, but that EV would increase substantially assuming transformative AI is the end.
Still looking for the fleshing out of this argument that I don’t understand—if anyone can be bothered!
”It seems to me the argument would have to be that the advantage to safety work from improving capabilities would outstrip the increasing risk of dangerous AGI, which I find hard to get my head around, but I might be missing something important.”
What is your p(doom|AGI)? (Assuming AGI is developed in the next decade.)
Note that Bostrom himself says in Astronomical Waste (my emphasis in bold):
“However, the true lesson is a different one. If what we are concerned with is (something like) maximizing the expected number of worthwhile lives that we will create, then in addition to the opportunity cost of delayed colonization, we have to take into account the risk of failure to colonize at all. We might fall victim to an existential risk, one where an adverse outcome would either annihilate Earth-originating intelligent life or permanently and drastically curtail its potential. Because the lifespan of galaxies is measured in billions of years, whereas the time-scale of any delays that we could realistically affect would rather be measured in years or decades, the consideration of risk trumps the consideration of opportunity cost. For example, a single percentage point of reduction of existential risks would be worth (from a utilitarian expected utility point-of-view) a delay of over 10 million years.”
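For intuition about the last sentence of that quote, here is a minimal sanity-check sketch; the assumed resource lifetime and the uniform-value assumption are mine for illustration, not figures taken from the paper.

```python
# Rough sanity check of the last sentence of the Bostrom quote above.
# All figures are stylised assumptions, not numbers taken from the paper.

usable_lifetime_years = 1e10  # assume accessible resources last on the order of 10 billion years
delay_years = 1e7             # the "over 10 million years" delay mentioned in the quote
risk_reduction = 0.01         # a single percentage point of existential risk

# If value accrues roughly uniformly until the resources run out, a delay of D years
# forfeits about D / lifetime of the total achievable value.
fraction_lost_to_delay = delay_years / usable_lifetime_years  # = 0.001
fraction_gained_from_risk_reduction = risk_reduction          # = 0.01

print(fraction_gained_from_risk_reduction > fraction_lost_to_delay)  # True
# Under these assumptions the one-percentage-point risk reduction is worth roughly ten times
# the cost of the delay, consistent with the quote's "over 10 million years" claim.
```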
I don’t think you read my comment:
“I don’t think extra time pre-transformative-AI is particularly valuable except its impact on existential risk.”
I also think it’s bad how you (and a bunch of other people on the internet) ask this p(doom) question in a way that (in my read of things) is trying to force somebody into a corner of agreeing with you. It doesn’t feel like good faith so much as bullying people into agreeing with you. But that’s just my read of things without much thought. At a gut level I expect we die, my from-the-arguments / inside view is something like 60%, and my “all things considered” view is more like 40% doom.
Wow that escalated quickly :(
“trying to force somebody into a corner of agreeing with you.”
It’s really not. I’m trying to understand where people are coming from. If someone has low p(doom|AGI), then it makes sense that they don’t see pausing AI development as urgent. Or their p(doom) relative to their actions can give some idea of how risk taking they are (but I still don’t understand how OpenAI and their supporters think it’s ok to gamble 100s of millions of lives in expectation for a shot at utopia without any democratic mandate).
“I don’t think extra time pre-transformative-AI is particularly valuable except its impact on existential risk”
and
“‘all things considered’ view is more like 40% doom”
Surely this means that extra time now (pausing) is extremely valuable? I.e. because of its impact on existential risk.
Or do you think that the chance we’re in a net negative world now means that the astronomical future we could save would also most likely be net negative? I don’t think this follows. Or that continuing to allow AI to speed up now will actually prevent extinction threats in the next 10 years that we would otherwise be wiped out by (this seems very unlikely to me).
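For a sense of scale on the “100s of millions of lives in expectation” phrase above, a minimal illustrative calculation; the probabilities below are placeholders, not anyone’s stated p(doom).

```python
# Illustrative expected-fatalities arithmetic behind "100s of millions of lives in expectation".
# The p(doom) values are placeholders, not anyone's stated estimate.

world_population = 8e9

for p_doom in (0.01, 0.05, 0.40):
    expected_deaths = p_doom * world_population
    print(f"p(doom) = {p_doom:.0%}: ~{expected_deaths:.0e} lives in expectation")
# Even a 5% risk corresponds to roughly 400 million expected deaths, which is the scale
# the "100s of millions of lives in expectation" phrase is pointing at.
```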
Sorry, I agree my previous comment was a bit intense. I think I wouldn’t get triggered if you instead asked “I wonder if a crux is that we disagree on the likelihood of existential catastrophe from AGI. I think it’s very likely (>50%), what do you think?”
P(doom) is not why I disagree with you. It feels a little like if I’m arguing with an environmentalist about recycling and they go “wow do you even care about the environment?” Sure, that could be a crux, but in this case it isn’t and the question is asked in a way that is trying to force me to agree with them. I think asking about AGI beliefs is much less bad, but it feels similar.
I think it’s pretty unclear if extra time now positively impacts existential risk. I wrote about a little bit of this here, and many others have discussed similar things. I expect this is the source of our disagreement, but I’m not sure.
I think one of the better write-ups about this perspective is Anthropic’s Core Views on AI Safety.
From its main text, under the heading “The Role of Frontier Models in Empirical Safety”, a few relevant arguments are:
Many safety concerns arise with powerful systems, so we need to have powerful systems to experiment with
Many safety methods require large/powerful models
Need to understand how both problems and our fixes change with model scale (if the model gets bigger, does the safety technique still look like it’s working? See the sketch after this list.)
To get evidence of powerful models being dangerous (which is important for many reasons), you need the powerful models.
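A minimal sketch of what the third point (checking how a fix holds up with scale) could look like in practice; everything here (the eval, the model-size ladder, the numbers) is a hypothetical placeholder rather than Anthropic’s actual method.

```python
# Hypothetical sketch of checking whether a safety mitigation keeps working as models scale.
# The eval, the model-size ladder, and all numbers are invented placeholders, not Anthropic's
# methodology or real measurements.

def failure_rate(model_size_params: float, mitigated: bool) -> float:
    """Placeholder standing in for a real red-teaming / jailbreak eval on an actual model."""
    base = 0.30 + 0.05 * (model_size_params / 1e9) ** 0.5  # invented trend: worse with scale
    return base * (0.4 if mitigated else 1.0)              # invented effect: mitigation cuts 60%

for n_params in (1e8, 1e9, 1e10, 1e11):  # hypothetical ladder of model sizes
    gap = failure_rate(n_params, mitigated=False) - failure_rate(n_params, mitigated=True)
    print(f"{n_params:.0e} params: mitigation removes {gap:.2f} of the failure rate")
# The argument above is that you can only extend this kind of curve to frontier scale
# if safety researchers actually have frontier-scale models to evaluate.
```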
Thanks Aaron, that’s a good article, appreciate it. It still wasn’t clear to me that they were making an argument that increasing capabilities could be net positive, more that safety people should be working with whatever is currently the most powerful model.
”But we also cannot let excessive caution make it so that the most safety-conscious research efforts only ever engage with systems that are far behind the frontier.”
This makes sense to me: the best safety researchers should have full access to the most advanced current models, preferably (in my eyes) before they have been fully trained.
But then I don’t understand their next sentence: “Navigating these tradeoffs responsibly is a balancing act, and these concerns are central to how we make strategic decisions as an organization.”
I’m clearly missing something: what’s the tradeoff? Is working on safety with the most advanced current model, while generally slowing everything down, not the best approach? This doesn’t seem like a tradeoff to me.
How is there any net safety advantage in increasing AI capacity?
Anthropic[1] have a massive conflict of interest (making money), so their statements are in some sense safetywashing. There is at least a few years’ worth of safety work that can be done on current models if we had the time (i.e. via a pause): interpretability is still stuck on trying to decipher GPT-2-sized models and smaller. And jailbreaks are still very far from being solved. Plenty to be getting on with without pushing the frontier of capabilities yet further.
[1] And the other big AI companies that supposedly care about x-safety (OpenAI, Google DeepMind).
The assumption is that more powerful models won’t just be weaker models made more accurate: they will show emergent abilities. Many things that GPT-4 can solve, GPT-3 cannot, and those models share a similar lineage.
Safety issues show up when you have a model powerful enough to even exhibit them, and they may not be anything you predicted from theory. The Waluigi effect and hallucinations were both not predicted by any theory from AI safety research groups, and they seem to be the majority of the issues with models at the current level of capabilities.
Free extra time is good. The reasonable version of the argument is that you should avoid buying total time in ways that cost time with more powerful systems; maybe AI progress will look like the purple line.
Nice post. Why didn’t you post it here for AI pause debate week haha.
Yes, I somewhat understand this potential “overhang” danger as an argument in and of itself against a pause. I just don’t see how it relates to technical alignment research specifically.