Two reasons we might be closer to solving alignment than it seems
I was at an AI safety retreat recently and there seemed to be two categories of researchers:
Those who thought most AI safety research was useless
Those who thought all AI safety research was useless
This is a darkly humorous anecdote illustrating a larger pattern of intense pessimism I’ve noticed among a certain contingent of AI safety researchers.
I don’t disagree with the more moderate version of this position. If things continue as they are, anywhere up to a 95% chance of doom seems defensible.
What I disagree with is the degree of confidence. While we certainly shouldn’t be confident that everything will turn out fine, we also shouldn’t feel confident that it won’t. This post might have easily been titled the same as Rob Bensinger’s similar post: we shouldn’t be maximally pessimistic about AI alignment.
The main two reasons for not being overly confident of doom are:
All of the arguments saying that it’s hard to be confident that transformative AI (TAI) isn’t just around the corner also apply to safety research progress.
It’s still early days and we’ve had about as much progress as you’d predict given that up until recently we’ve only had double-digit numbers of people working on the problem.
The arguments that apply to TAI potentially being closer than we think also apply to alignment
It’s really hard to predict research progress. In ‘There’s no fire alarm for artificial general intelligence’, Eliezer Yudkowsky points out that historically, ‘it is very often the case that key technological developments still seem decades away, five years before they show up’ - even to scientists who are working directly on the problem.
Wilbur Wright thought that heavier-than-air flight was fifty years away; two years later, he helped build the first heavier-than-air flyer. This is because it often feels the same when the technology is decades away and when the technology is a year away: in either case, you don’t yet know how to solve the problem.
These arguments apply not only to TAI, but also to TAI alignment. Heavier-than-air flight felt like it was years away when it was actually round the corner. Similarly, researchers’ sense that alignment is decades away—or even that it is impossible—is consistent with the possibility that we’ll solve alignment next year.
AI safety researchers are more likely to be pessimistic about alignment than the general public because they are deeply embroiled in the weeds of the problem. They are viscerally aware, from firsthand experience, of the difficulty. They are the ones who have to feel the day-to-day confusion, frustration, and despair of bashing their heads against a problem and making inconsistent progress. But this is how it always feels to be on the cutting edge of research. If it felt easy and smooth, it wouldn’t be the edge of our knowledge.
AI progress thus far has been highly discontinuous; there have been times of fast advancement interspersed with ‘AI winters’ where enthusiasm waned, and then several important advances in the last few months. This could also be true for AI safety—even if we’re in a slump now, massive progress could be around the corner.
It’s not surprising to see this little progress when we have so few people working on it
I understand why some people are in despair about the problem. Some have been working on alignment for decades and have still not figured it out. I can empathize. I’ve dedicated my life to trying to do good for the last twelve years and I’m still deeply uncertain whether I’ve even been net positive. It’s hard to stay optimistic and motivated in that scenario.
But let’s take a step back: this is an extremely complex question, and we haven’t attacked the problem with all our strength yet. Some of the earliest pioneers of the field are no doubt some of the most brilliant humans out there. Yet, they are still only a small number of people. There are currently only about one hundred and fifty people working full-time on technical AI safety, and even that is recent—ten years ago, it was more like five. We probably need more like tens of thousands of people researching this for several decades.
I’m reminded of the great bit in Harry Potter and the Methods of Rationality where Harry explains to Fred and George how to think about something. For context, Harry just asked the twins to creatively solve a problem for him:
’Fred and George exchanged worried glances.
“I can’t think of anything,” said George.
“Neither can I,” said Fred. “Sorry.”
Harry stared at them.
And then Harry began to explain how you went about thinking of things.
It had been known to take longer than two seconds, said Harry.
You never called any question impossible, said Harry, until you had taken an actual clock and thought about it for five minutes, by the motion of the minute hand. Not five minutes metaphorically, five minutes by a physical clock….
So Harry was going to leave this problem to Fred and George, and they would discuss all the aspects of it and brainstorm anything they thought might be remotely relevant. And they shouldn’t try to come up with an actual solution until they’d finished doing that, unless of course they did happen to randomly think of something awesome, in which case they could write it down for afterward and then go back to thinking. And he didn’t want to hear back from them about any so-called failures to think of anything for at least a week. Some people spent decades trying to think of things.’
We’ve definitely set a timer and thought about this for five minutes. But this is the sort of problem that won’t just be solved by a small number of geniuses. We need way more “quality-adjusted researcher-years” if we’re going to get through this.
This is one of, if not the, most difficult intellectual challenges of our time. Even understanding the problem is difficult, and solving it will probably require a mix of math, philosophy, programming, and a healthy dose of political acumen.
Think about how many scientists it took before we made progress on practically any important scientific discovery. Except for the lucky ones at the beginning of the Enlightenment, when there were few scientists and lots of low-hanging fruit, there are usually thousands to tens of thousands of scientists banging their heads against walls for decades for every one who makes a significant breakthrough. And we’ve got around one hundred and fifty in a field barely over a decade old!
When you look at it this way, it’s no wonder we haven’t made a lot of progress yet. In fact, it would be quite surprising if we had. We are a small field that’s just getting started.
We’re currently Fred and George, feeling discouraged after having pondered the world’s most important and challenging question for a few metaphorical seconds. Let’s be inspired by Harry to think about it not just for five minutes, but for decades, with a massive community of other people trying to do the same. Let’s field-build and get thousands of people banging their heads against this wicked problem.
Who knows—one of the new researchers might be just a year away from making the crucial insight that ushers in the AI alignment summer.
This post was written collaboratively by Kat Woods and Amber Dawn Ace as part of Nonlinear’s experimental Writing Internship program. The ideas are Kat’s; Kat explained them to Amber, and Amber wrote them up. We would like to offer this service to other EAs who want to share their as-yet unwritten ideas or expertise.
If you would be interested in working with Amber to write up your ideas, fill out this form.
I want to push back on the conception of the progress of a research field being well-correlated with “the number of people working in that field”.
I think the heuristic of “a difficult problem is best solved by having a very large number of people working on it” is not a particularly successful heuristic when predicting past successes in science, nor when forecasting business success. When a company is trying to solve a difficult technical or scientific problem, they don’t usually send 10,000 people to work on it (and doing so would almost never work). They send their very best people to work on it, and devote substantial resources to supporting them.
Right now, we don’t have traction on AI Alignment, and indeed, many if not most of the people who I think have the best chance of finding traction on AI Alignment are instead busy dealing with all the newcomers to the field. When I do interviews with top researchers, they often complain that the quality of their research environment has gotten worse over time as more and more people with less context fill their social environment, and that they have found the community worse over time for making intellectual progress in (this is not universally reported, but it’s a pretty common thing I’ve heard).
I don’t think the right goal should be to have 10,000 people work on our current confused models of AI Alignment. I think we don’t currently really know how to have 10,000 people work on AI Alignment, and if we tried, I expect that group of people would end up optimizing for proxy variables that have little to do with research progress, like how cognitive psychology ended up optimizing extremely hard for p-values and, as a field, produces less useful insight than (as far as I can tell) Daniel Kahneman himself did while he was actively publishing.
I think it’s good to get more smart people thinking about the problem, but it’s easy to find examples of extremely large efforts with thousands of people working on a problem, being vastly less effective than a group of 20 people. Indeed, I think that’s the default for most difficult problems in the world (I think FTX would be less successful if it had 10,000 employees, as would most startups, and, I argue, also most research fields).
Also, I’m surprised at the claim that more people don’t lead to more progress. I’ve heard that one major cause of progress so far has just been that there’s a much larger population of people to try things (of course, progress also causes there to be more people, so the causal chain goes both ways). Similarly, the reason why cities tend to have more innovation than small towns is that there’s a denser concentration of people around each other.
You can also think of it from the perspective of adding more exploration. Right now there are surprisingly few research agendas. Having more people would lead to more of them, and it increases the odds that one of them is correct.
Of note, I do share your concerns about making sure the field doesn’t just end up maximizing proxy metrics. I think that will be tricky and will require a lot of work (as it does right now even!).
I think more people in the worldwide population generally leads to more innovation, but primarily in domains where there are large returns to scale, and where there are strong incentives for people to make progress. If you want to get people to explore a specific problem, I think more people rarely helps (because the difficulty lies in aiming people at the problem, not in the firepower you have).
I think adding more people also rarely causes more exploration to happen. Large companies are usually much less innovative than small companies. Coordinating large groups of people usually requires conformity, and because of asymmetries in how easy it is to cause harm to a system vs. to produce value, it requires widespread conservatism in order to function. I think similar things are happening in EA: the larger EA gets, the more people are concerned about someone “destroying the reputation of the community”, and the more people have to push on the brakes in order to prevent anyone from taking risky action.
I think there exist potential configurations of a research field that can scale substantially better, but I don’t think we are currently configured that way, and I expect exploration to go down by default as scale goes up (in general, the number of promising new research agendas and directions seems to me to have gone down a lot during the last 5 years as EA has grown, and this is a sentiment I’ve heard mirrored from most people who have been engaged for that long).
At least in technical AI Alignment, the opposite seems to have happened in the last couple of years. It looks like we’re in the midst of a Cambrian explosion of research groups and agendas. Or would you argue that most of these aren’t promising?
There is a Cambrian explosion of research groups, but basically no new agendas as far as I can tell? Of the agendas listed on that post, I think basically all are 5+ years old (some have morphed, like ELK is a different take on scalable oversight than Paul had 5 years ago, but I would classify it as the same agenda).
There is a giant pile of people working on the stuff, though the vast majority of new work can be characterized as “let’s just try to solve some near-term alignment problems and hope that it somehow informs our models of long-term alignment problems”, plus a large pile of different types of transparency research. I think there are good cases for that work, though I am not very optimistic about it helping with existential risk.
That’s really interesting and unexpected! Seems worth figuring out why. What are your top hypotheses for why that’s happening?
My first guess would be epistemic humility norms.
My second would be that the first people in a field are often disproportionately talented compared to people coming in later. (Although you could also tell a story about how at the beginning it’s too socially weird so it can’t attract a lot of top talent).
My third is that since alignment is so hard, it’s easier for people to latch onto existing research agendas instead of creating new ones. At the beginning there were practically no agendas to latch onto, so people had to make new ones, but now there are a few, so most people just sort themselves into those.
Are there any promising directions for AGI x-risk reduction that you are aware of that aren’t being (significantly) explored?
I think this is still in the framework of thinking that large groups of people having to coordinate leads to stagnation. To change my mind, you’d have to make the case that having a larger number of startups leads to less innovation, which seems like a hard case to make.
I think this is a separate issue that might be caused by the size of the movement, but a different hypothesis is that it’s simply an idea that has traction in the movement, one which has been around for a long time, even while we were a lot smaller. Considerations like spending your “weirdness points” have been around since the very beginning.
(On a side note, I think we’re overly concerned about this, but that’s a whole other post. Suffice to say here that a lot of the probability mass is on this not being caused by the size of the movement, but rather a particularly sticky idea)
🎯 I 100% agree. I’m thinking of spending some more time thinking on and writing up ways that we could make it so the movement could usefully take on more researchers. I also encourage others to think on this, because it could unlock a lot of potential.
I think this is where we disagree. It’d be very surprising if ~150 researchers is the optimal amount, or that having less would lead to more innovation and more/better research agendas.
An alternative hypothesis is that the people you’ve been talking to have been becoming more pessimistic about having hope at all (if you hang out with MIRI folk a lot, I’d expect this to be more acute). It might not be that there are more people having bad ideas, or that having more people in the movement leads to a decline in quality, but rather that a certain contingent thinks alignment is impossible or deeply improbable, so that all ideas seem bad. In this paradigm/POV, the default is that all new research agendas seem bad. It’s not that the agendas got worse. It’s that people think the problem is even harder than they originally thought.
Another hypothesis is that the idea of epistemic humility has been spreading, combined with the idea that you need intensive mentorship. This leads to new people coming in being less likely to actually come up with new research agendas, but rather to defer to authority. (A whole other post there!)
Anyways, just some alternatives to consider :) It’s hard to convey tone over text, but I’m enjoying this discussion a lot and you should read all my writing assuming a lot of warmth and engagement. :)
I think de-facto right now people have to coordinate in order to do work on AI Alignment, because most people need structure and mentorship and guidance to do any work, and want to be part of a coherent community.
Separately, I also think many startup communities are indeed failing to be innovative because of their size and culture. Silicon Valley is a pretty unique phenomenon, and I’ve observed “startup communities” in Germany that felt to me like they harmed innovation more than they benefitted it. The same is true for almost any “startup incubator” that large universities are trying to start up. When I visit them, I feel like the culture there primarily encourages conformity and chasing the same proxy metrics as everyone else.
I think actually creating a startup ecosystem is hard, and I think it’s still easier than creating a similar ecosystem for something as ill-defined as AI Alignment. The benefit that startups have is that you can very roughly measure success by money, at least in the long run, and this makes it pretty easy to point many people at the problem (and like, creates strong incentives for people to point themselves at the problem).
I think we have no similar short pointer for AI Alignment, and most people who start working in the field seem to me to be quite confused about what the actual problem to be solved is, and then often just end up doing AI capabilities research while slapping an “AI Alignment” label on it, and I think scaling that up mostly just harms the world.
I think we should generally have a prior that social dynamics of large groups of people end up pushing heavily towards conformity, and that those pressures towards conformity can cancel out many orders of magnitude of growth of the number of people who could theoretically explore different directions.
As a concrete case study, I like this Robin Hanson post “The World Forager Elite”:
The number of nations, as well as the number of communities and researchers that were capable of doing innovative things in response to COVID was vastly greater in 2020 than for any previous pandemic. But what we saw was much less global variance and innovation in pandemic responses. I think there was scientific innovation, and that innovation was likely greater than for previous pandemics, but overall, despite the vastly greater number of nations and people in the international community of 2020, this only produced more risk-aversion in stepping out of line with elite consensus.
I think by default we should expect similar effects in fields like AI Alignment. I think maintaining a field that is open to new ideas and approaches is actively difficult. If you grow the field without trying to preserve the concrete and specific mechanisms that are in place to allow innovation to grow, more people will not result in more innovation; it will result in less, even from the people who have previously been part of the same community.
In the case of COVID, the global research community spent a substantial fraction of its effort on actively preventing people from performing experiments like variolation or challenge trials, and we see the same in fields like Psychology research where a substantial fraction of energy is spent on ever-increasing ethical review requirements.
We see the same in the construction industry (a recent strong interest of mine), which despite its quickly growing size, is performing substantially fewer experiments than it was 40 years ago, and is spending most of its effort actively regulating what other people in the industry can do, and limiting the type of allowable construction materials and approaches to smaller and smaller sets.
By default, I expect fast growth of the AI Alignment community to reduce innovation for the same reasons. I expect a larger community will increase pressures towards forming an elite consensus, and that consensus will be enforced via various legible and illegible means. Most of the world is really not great at innovation, and the default outcome of large groups of people, even when pointed towards a shared goal, is not innovation but conformity; if we grow recklessly, I think we will default towards the same common outcome.
Re conformity, I wonder if related arguments could help shift the Future Fund’s worldview?
This is a good point, and a wake-up call for us to do better with AI Alignment. Given that the majority of funding for AGI x-safety is coming from within EA right now, and that as a community we are acutely aware of the failings with Covid, we should be striving to do better.
Is there any legible evidence for this?
There was some deviation (e.g. no lockdowns in Sweden), but the most telling thing was no human challenge trials anywhere in the world. That alone was a tragedy that prolonged the pandemic by months (by delaying roll-out of vaccines) and caused millions of deaths.
I agree that 10k people working in the same org would be unwieldy. I’m thinking more of having 10k people working across hundreds of orgs, and sometimes independently. Each of these people would be in their own little microcosm, dealing with a normal number of interactions. That should address the worsening social environment; it might even improve things, because people could more easily find their “tribe”.
And I agree right now we wouldn’t be able to absorb that number usefully. That’s currently an unsolved problem that would be good to make progress on.
Interesting, can you give some examples of this that are analogous to solving the alignment problem?
As a separate point, it might still be worth getting many more people working on alignment ASAP to shift the Overton window. Some extremely talented mathematicians have worked together on cryptography projects for the US government, and I imagine something similar could happen for alignment in the future.
Strong upvote for giving some outside perspective on the field; this is an important point about why AI Alignment is likely to be tractable at all. It also means that getting many more researchers, and more money, fast is important for AI Safety.