The model of AI risk I’ve mostly believed for years is that of fast takeoff, and therefore a unipolar world.[1] That model gave me some concrete ideas about what the EA community should do to make AI go better. Now I am at least half-persuaded of slow-takeoff, multipolar worlds (1, 2), but I have much less idea what to do in such a world. So, what should the top priorities be for EA longtermists who want to make AI go well?
Fast-takeoff, unipolar priorities, as seen by me writing quickly:
Get the top AI labs concerned about safety
This means that if they feel like they’re close to AGI, they will hopefully be receptive to whatever the state of the art in alignment research is
Try to solve the alignment problem in the most rigorous way possible.
After all, we only get one shot
[Less obvious to me] Try to get governments concerned about safety in case they nationalize AI labs. But also don’t increase the likelihood of them doing that by shouting about how AI is going to be this incredibly powerful thing.
Multipolar, slow-takeoff priorities:
Getting top AI labs concerned about safety seems much harder in the long term, as they become increasingly economically incentivized to ignore it.
Trying to solve the alignment problem in the most rigorous way possible seems less necessary. Also, maybe alignment is easier, and therefore less likely to be the thing that fails.
Governments might be captured by increasingly powerful private interests / there might be AI-powered propaganda that does … something to their ability to function.
Broadly in this world I’m much more worried about race-to-the-bottom dynamics. In Meditations on Moloch terms, instead of AI being the solution to Moloch, Moloch becomes the largest contributor to AI x-risk.
I’m interested in all sorts of comments, including:
What should the top priorities be to make AI go well in a slow-takeoff world?
Challenging the hypothetical
Is there anything wrong with this analysis?
[1] See Nick Bostrom’s Superintelligence.
I think that “AI alignment research right now” is a top priority in unipolar fast-takeoff worlds, and it’s also a top priority in multipolar slow-takeoff worlds. (It’s certainly not the only thing to do—e.g. there’s multipolar-specific work to do, like the links in Jonas’s answer on this page, or here etc.)
(COI note: I myself am doing “AI alignment research right now” :-P )
First of all, in the big picture, right now humanity is simultaneously pursuing many quite different research programs towards AGI (I listed a dozen or so here (see Appendix)). If more than one of them is viable (and I think that’s likely), then in a perfect world we would figure out which of them has the best hope of leading to Safe And Beneficial AGI, and differentially accelerate that one (and/or differentially decelerate the others). This isn’t happening today—that’s not how most researchers are deciding what AI capabilities research to do, and it’s not how most funding sources are deciding what AI capabilities research to fund. Could it happen in the future? Yes, I think so! But only if...
AI alignment researchers figure out which of these AGI-relevant research programs is more or less promising for safety,
…and broadly communicate that information to experts, using legible arguments…
…and do it way in advance of any of those research programs getting anywhere close to AGI.
The last one is especially important. If some AI research program has already gotten to the point of super-powerful proto-AGI source code published on GitHub, there’s no way you’re going to stop people from using and improving it. Whereas if the research program is still very early-stage and theoretical, and needs many decades of intense work and dozens more revolutionary insights to really start getting powerful, then we have a shot at this kind of differential technological development strategy being viable.
(By the same token, maybe it will turn out that there’s no way to develop safe AGI, and we want to globally ban AGI development. I think if a ban were possible at all, it would only be possible if we got started when we’re still very far from being able to build AGI.)
So for example, if it’s possible to build a “prosaic” AGI using deep neural networks, nobody knows whether it would be possible to control and use it safely. There are some kinda-illegible intuitive arguments on both sides. Nobody really knows. People are working on clarifying this question, and I think they’re making some progress, and I’m saying that it would be really good if they could figure it out one way or the other ASAP.
Second of all, slow takeoff doesn’t necessarily mean that we can just wait and solve the alignment problem later. Sometimes you can have software right in front of you, and it’s not doing what you want it to do, but you still don’t know how to fix it. The alignment problem could be like that.
One way to think about it is: How slow is slow takeoff, versus how long does it take to solve the alignment problem? We don’t know.
Also, how much longer would it take, once somebody develops best practices to solve the alignment problem, for all relevant actors to reach a consensus that following those best practices is a good idea and in their self-interest? That step could add on years, or even decades—as they say, “science progresses one funeral at a time”, and standards committees work at a glacial pace, to say nothing of government regulation, to say nothing of global treaties.
Anyway, if “slow takeoff” is 100 years, OK fine, that’s slow enough. If “slow takeoff” is ten years, maybe that’s slow enough if the alignment problem happens to have a straightforward, costless, highly legible and intuitive, scalable solution that somebody immediately discovers. Much more likely, I think we would need to be thinking about the alignment problem in advance.
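One crude way to frame that comparison (my notation here is just illustrative shorthand, nothing precise): if T_solve is how long it takes to work out alignment best practices, T_adopt is how long it takes the relevant actors to actually adopt them, and T_takeoff is how long the takeoff lasts, then the hopeful case is roughly

$$T_{\text{solve}} + T_{\text{adopt}} \lesssim T_{\text{takeoff}}$$

and right now we don’t have good estimates for any of the three terms.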
For more detailed discussion, I have my own slow-takeoff AGI doom scenario here. :-P
Thanks for your answer. (Just to check, I think you are a different Steve Byrnes than the one I met at Stanford EA in 2016 or so?)
What I do want to emphasize is that I don’t doubt that technical AI safety work is one of the top priorities. It does seem like, within technical AI safety research, the best work shifts away from Agent Foundations-type work and towards neural-net-specific work. It also seems like the technical problem does get easier in expectation if you have more than one shot. By contrast, I claim, many of the Moloch-style problems get harder.
No I don’t think we’ve met! In 2016 I was a professional physicist living in Boston. I’m not sure if I would have even known what “EA” stood for in 2016. :-)
I agree. But maybe I would have said “less hard” rather than “easier” to better convey a certain mood :-P
I’m not sure what your model is here.
Maybe a useful framing is “alignment tax”: if it’s possible to make an AI that can do some task X unsafely with a certain amount of time/money/testing/research/compute/whatever, then how much extra time/money/etc. would it take to make an AI that can do task X safely? That’s the alignment tax.
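One rough way to write that down (the symbols here are just illustrative shorthand): if C_unsafe is the total cost, in time/money/compute/whatever, of building an AI that does task X unsafely, and C_safe is the cost of doing task X safely, then

$$\text{alignment tax} = \frac{C_{\text{safe}} - C_{\text{unsafe}}}{C_{\text{unsafe}}}$$

So a safe version needing 1% more training time is roughly a 1% tax; a safe version needing 100× the compute is a 9,900% tax.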
The goal is for the alignment tax to be as close as possible to 0%. (It’s never going to be exactly 0%.)
In the fast-takeoff unipolar case, we want a low alignment tax because some organizations will be paying the alignment tax and others won’t, and we want one of the former to win the race, not one of the latter.
In the slow-takeoff multipolar case, we want a low alignment tax because we’re asking organizations to make tradeoffs for safety, and if that’s a very big ask, we’re less likely to succeed. If the alignment tax is 1%, we might actually succeed. Remember that there are many reasons organizations are incentivized to make safe AIs, not least because they want the AIs to stay under their control and do the things they want them to do, not to mention legal risks, reputation risks, employees who care about their children, etc. etc. So if all we’re asking is for them to spend 1% more training time, maybe they all will. If instead we’re asking them all to spend 100× more compute plus an extra 3 years of pre-deployment test protocols, well, that’s much less promising.
So either way, we want a low alignment tax.
OK, now let’s get back to what you wrote.
I think maybe your model is:
“If Agent Foundations research pans out at all, it would pan out by discovering a high-alignment-tax method of making AGI”
(You can correct me if I’m misunderstanding.)
If we accept that premise, then I can see where you’re coming from. This would be almost definitely useless in a multipolar slow-takeoff world, and merely “probably useless” in a unipolar fast-takeoff world. (In the latter case, there’s at least a prayer of a chance that the safe actors will be so far ahead of the unsafe actors that the former can pay the tax and win the race anyway.)
But I’m not sure that I believe the premise. Or at least I’m pretty unsure. I am not myself an Agent Foundations researcher, but I don’t imagine that Agent Foundations researchers would agree with the premise that high-alignment-tax AGI is the best that they’re hoping for in their research.
Oh, hmmm, the other possibility is that you’re mentally lumping together “multipolar slow-takeoff AGI” with “prosaic AGI” and with “short timelines”. These are indeed often lumped together, even if they’re different things. Anyway, I would certainly agree that both “prosaic AGI” and “short timelines” would make Agent Foundations research less promising compared to neural-net-specific work.
Some work that seems relevant:
https://arxiv.org/abs/2006.04948
https://futureoflife.org/2020/09/15/andrew-critch-on-ai-research-considerations-for-human-existential-safety/
https://www.alignmentforum.org/posts/EzoCZjTdWTMgacKGS/clr-s-recent-work-on-multi-agent-systems
The Andrew Critch interview is so far exactly what I’m looking for.