The role of academia in AI Safety.
AI Safety started as a realization some rationalists had about the possibility of building powerful, generally intelligent systems, and the question of how we might control them. Since then, however, the community has grown. My favorite version of the bottom line is something like:
We want to build intelligent systems because we believe they could bring about many wonderful things, and because AI will become one of the most important technologies humankind has ever invented. We further want to make sure that powerful AI systems are as beneficial to everyone as possible. To ensure that is indeed the case, we need to make sure AI systems can understand our desires and hopes, both as individuals and as a society, even when we do not have a clear picture of them ourselves or they conflict. AI Safety is about developing the techniques that allow AI systems to learn what we want, and about how we can bring that future about.
On the other hand, there is a lot of focus in the community on existential risk. The risk is real, but invoking it is definitely not necessary to make the point about the importance of AI Safety. Maybe the focus on quantities will appeal to very rationally oriented people, but it carries a stark risk of making people feel they are being Pascal-mugged (I certainly did the first time). For this reason, I think we have to change the discourse.
Similarly, I think much of the culture of the early days persists today. It was focused on independent research centers like MIRI and on independent researchers, because the early community thought academia was not ready for the task of creating a new research field centered on understanding intelligence. Alternatively, perhaps because of the influence of Silicon Valley, startups also came to be seen as an attractive place to work on AI Safety.
However, I think the time has come to make AI Safety a respected academic field. I know we don't like many things about academia: it is bureaucratic, credentialist, slow to move, too centered on ego, and it promotes publication races to the bottom and poor-quality peer review. But academia is also how the world perceives science and how it tells knowledge from quackery. It is a powerful machine that forces researchers to make concrete advances in their field of expertise, and when they fail to, most of the time it is because science is very hard, not because the incentives do not push them brutally to do so.
I also know people believe AI Safety is preparadigmatic, but I argue we have to get over it, or risk building castles in the air. There does not need to be a single paradigm, but we certainly need at least some paradigm to push our understanding forward. It's ok if we have many. This would allow us to solve subproblems in concrete ways, instead of staying at the level of high-level ideas about how we would like things to turn out. Let's have more academic (peer-reviewed) papers, not because blog posts are bad, but because we need to show good, concrete solutions to concrete problems, and publishing papers forces us to do so. Publishing papers provides the tight feedback we seek if we want to solve AI Safety, and academia provides the mentoring environment we need to face this problem. In fact, the lack of concreteness of the AI Safety problems seems to be one key factor holding back extraordinary researchers from joining efforts on a problem that they also believe to be important.
Instead of academia, our community sometimes relies on independent researchers. It is comforting that some of us can use the money we have to carry out this important research, and I celebrate it. But I wish this were more the exception than the rule. AI Safety might be a new field, but science nowadays is hardly a place where one can make an important contribution without deep expertise and a lot of effort. I believe there are many tools from machine learning, deep learning, and reinforcement learning that can be used to solve this problem, and we need experts in them, not volunteers. This is a bit sad for some of us, because it means we might not be the right people to solve the problem. But what matters is not who solves it, but that it actually gets solved, and I think that without academia it will not get done.
For this reason, I am happy the Future of Life Institute is trying to promote an academic community of researchers in AI Safety. I know the main bottleneck might be the lack of experienced researchers with mentoring capability. That's reasonable, but one key way to address this might be to focus our efforts on better defining the subproblems. Mathematicians know definitions are very important, and definitions of these (toy?) problems may be the key both to making it easier for senior researchers with mentoring capacity to try their hand at AI Safety, and to making concrete, measurable progress that will allow us to sell AI Safety to the world as a scientific research area.
PS: I did not write this for the red teaming contest, but I think it is a good candidate for it.
For concrete research directions in safety and several dozen project ideas, please see our paper Unsolved Problems in ML Safety: https://arxiv.org/abs/2109.13916
Note that some directions are less concrete than others. For example, it is easier to do work on Honest AI and Proxy Gaming than it is to do work on, say, Value Clarification.
Since this paper is dense for newcomers, I'm finishing up a course that will expand on these safety problems.
Thanks Dan!
There are a few possible claims mixed up here:
Possible claim 1: "We want people in academia to be doing lots of good AGI-x-risk-mitigating research." Yes, I don't think this is controversial.
Possible claim 2: "We should stop giving independent researchers and nonprofits money to do AGI-x-risk-mitigating research, because academia is better." You didn't exactly say this, but sorta imply it. I disagree. Academia has strengths and weaknesses, and certain types of projects and people might or might not be suited to academia, and I think we shouldn't make a priori blanket declarations about academia being appropriate for everything versus nothing. My wish-list of AGI safety research projects (blog post is forthcoming UPDATE: here it is) has a bunch of items that are clearly well-suited to academia and a bunch of others that are equally clearly a terrible fit for academia. Likewise, some people who might work on AGI safety are in a great position to do so within academia (e.g. because they're already faculty) and some are in a terrible position to do so within academia (e.g. because they lack relevant credentials). Let's just have everyone do what makes sense for them!
Possible claim 3: "We should do field-building to make good AGI-x-risk-mitigating research more common, and better, within academia." The goal seems uncontroversially good. Whether any specific plan will accomplish that goal is a different question. For example, a proposal to fund a particular project led by such-and-such professor (say, Jacob Steinhardt or Stuart Russell) is very different from a proposal to endow a university professorship in AGI safety. In the latter case, I would suspect that universities will happily take the money and spend it on whatever their professors would have been doing anyway, and they'll just shoehorn the words "AGI safety" into the relevant press releases and CVs. Whereas in the former case, it's just another project, and we can evaluate it on its merits, including comparing it to possible projects done outside academia.
Possible claim 4: "We should turn AGI safety into a paradigmatic field with well-defined, widely-accepted research problems and approaches which contribute meaningfully towards x-risk reduction, and also would be legible to journals, NSF grant applications, etc." Yes, that would be great (and is already true to a certain extent), but you can't just wish that into existence! Nobody wants the field to be preparadigmatic! It's preparadigmatic not by choice, but rather to the extent that we are still searching for the right paradigms.
Here is my personal interpretation of the post:
> the EA/LW community has a comparative advantage at stating the right problem to solve and at grantmaking; the academic community has a comparative advantage at solving sufficiently defined problems
I think this is fairly uncontroversial, and roughly right; I will probably be thinking in these terms more often in the future.
The implication is that the most important outputs the community can hope to produce are research agendas, benchmarks, idealized solutions, and problem statements, leaving ML research, practical solutions, and the actual task of building an aligned AGI to academics.
(the picture gets more complicated because experiments are a good way of bashing our heads against the problem and gaining intuitions useful for making the problems sharper)
In slogan form: the role of the community should be to create the ImageNet Challenge of AI Alignment, and to leave the task of building the AlexNet of AI Alignment to academics.
It's not obvious to me that "the academic community has a comparative advantage at solving sufficiently defined problems". For example, mechanistic interpretability has been a well-defined problem for at least the past two years, but it seems that a disproportionate amount of progress on it has been made outside of academia, by Chris Olah & collaborators at OpenAI & Anthropic. There are various concrete problems here, but it seems that more progress is being made by independent researchers (e.g. Vanessa Kosoy, John Wentworth) and researchers at nonprofits (MIRI) than by anyone in academia. In other domains, I tend to think of big challenging technical projects as being done more often by the private or public sector: for example, academic groups are not building rocket ships or ultra-precise telescope mirrors; companies and governments are. Yet another example: in the domain of AI capabilities research, DeepMind and OpenAI and FAIR and Microsoft Research etc. give academic labs a run for their money in solving concrete problems. Also, quasi-independent-researcher Jeremy Howard beat a bunch of ML benchmarks while arguably kicking off the pre-trained-language-model revolution here.
My perspective is: academia has a bunch of (1) talent and (2) resources. I think it's worth trying to coax that talent and those resources towards solving important problems like AI alignment, instead of the various less-important and less-time-sensitive things that they do.
However, I think it's MUCH less clear that any particular Person X would be more productive as a grad student than as a nonprofit employee, or more productive as a professor than as a nonprofit technical co-founder. In fact, I strongly expect the reverse.
And in that case, we should really be framing it as "There are tons of talented people in academia, and we should be trying to convince them that AGI x-risk is a problem they should work on. And likewise, there are tons of resources in academia, and we should be trying to direct those resources towards AGI x-risk research." Note the difference: in this framing, we're not presupposing that pushing people and projects from outside academia to inside academia is a good thing. It might or might not be, depending on the details.
Fair points. In particular, I think my response should have focused more on the role of academia + industry.
Not entirely fair: if you open the field up just a bit to "interpretability" in general, you will see that most important advances in the field (e.g. SHAP and LIME) were made inside academia.
I would also not be too surprised to find people within academia who are doing great mechanistic interpretability, simply because of the sheer number of people researching interpretability.
Granted, but part of the problem is convincing academics to engage. I think the math community would be 100 times better at solving these problems if they ever become popular enough in academia.
Matches my intuition as well (though I might be biased here). I'd add that I expect grad students will get better mentorship on average in academia than in nonprofits or doing independent research (but will mostly work on irrelevant problems while in academia).
One important intuition I have is that academia + industry scales to "crappy, but in the end advancing at the object level" despite having lots of mediocre people involved, while all the cool things happening in EA+LW are due to some exceptionally talented people, and if we tried to scale them up we would end up with "horrible crackpottery".
But I'd be delighted to be proven wrong!
I must say I strongly agree with Steven.
If you are saying academia has a good track record, then I must say (1) that's wrong for fields like ML, where in recent years much (arguably most) relevant progress has been made outside of academia, and (2) it may have a good track record over the long history of science, and when you say it's good at solving problems, sure, I think it might solve alignment in 100 years, but we need it in 10, and academia is slow. (E.g. read Yudkowsky's sequence on science if you don't think academia is slow.)
Do you have some reason why you think a person can make more progress in academia than elsewhere? I agree that academia has people, and it's good to get those people, but academia has badly shaped incentives, like (from my other comment): "Academia doesn't have good incentives to make that kind of important progress: you are supposed to publish papers, so you (1) focus on what you can do with current ML systems, instead of focusing on more uncertain longer-term work, and (2) goodhart on some subproblems that don't take that long to solve, instead of actually focusing on understanding the core difficulties and how one might address them." So I expect a person can make more progress outside of academia. Much more, in fact.
Some important parts of the AI safety problem seem to me like they don't fit well into academic work. There are of course exceptions, people in academia who can make useful progress here, but they are rare. I am not that confident in this, as my understanding of AI safety isn't that deep, but I'm not just making this up. (EDIT: This mostly overlaps with the first two points I made, that academia is slow and that there are bad incentives, plus maybe some other minor considerations about why excellent people (e.g. John Wentworth) may choose not to work in academia. What I'm saying is that I think AI safety is a problem where those obstacles are big obstacles, whereas there might be other fields where they aren't that bad.)
Hi Steven,
I don't agree with possible claim 2. I just say that we should promote academic careers more than independent research, not that we should stop giving them money. I don't think money is the issue.
Thanks
OK, thanks for clarifying. So my proposal would be: if a person wants to do / found / fund an AGI-x-risk-mitigating research project, they should consider their background, their situation, the specific nature of the research project, etc., and decide on a case-by-case basis whether the best home for that research project is academia (e.g. CHAI) versus industry (e.g. DeepMind, Anthropic) versus nonprofits (e.g. MIRI) versus independent research. And a priori, it could be any of those. Do you agree with that?
Yes, I do indeed :)
You can frame it if you want as: founders should aim to expand the range of academic opportunities, and engage more with academics.
We won't solve AI safety by just throwing a bunch of (ML) researchers at it.
AGI will (likely) be quite different from current ML systems. Also, work on aligning current ML systems won't be that useful, and generally what we need is not small advancements but breakthroughs. (This is a great post for getting started on understanding why this is the case.)
We need a few Paul Christiano-level researchers who build a very deep understanding of the alignment problem and can then make huge advances, much more than we need many still-great-but-not-that-extraordinary researchers.
Academia doesn't have good incentives to make that kind of important progress: you are supposed to publish papers, so you (1) focus on what you can do with current ML systems, instead of focusing on more uncertain longer-term work, and (2) goodhart on some subproblems that don't take that long to solve, instead of actually focusing on understanding the core difficulties and how one might address them.
I think paradigms are partially useful and we should probably create some for some specific approaches to AI safety, but I think the default paradigms that would develop in academia are probably pretty bad, so that the research isn't that useful.
Promoting AI safety in academia is probably still good, but for actually preventing existential risk, we need some other way of creating incentives to usefully contribute to AI safety. I don't know yet how to best do it, but I think there are better options.
Getting people into AI safety without arguing about x-risk seems nice, but mostly because I think this strategy is useful for convincing people of x-risk later, so they can then work on important stuff.
Hey Simon, thanks for answering!
Perhaps we don't need to buy ML researchers (although I think we should at least try), but I think it is more likely that we won't solve AI Safety if we don't get more concrete problems in the first place.
I'm afraid I disagree with this. For example, if this were true, interpretability work from Chris Olah or the Anthropic team would be automatically doomed; Value Learning from CHAI would also be useless, and the forecasting work we use to convince people of the importance of AI Safety equally so. Of course, this does not prove anything; but I think there is a case to be made that Deep Learning currently seems to be the only viable path we have found that might get us to AGI. And while I think the agnostic approach of MIRI is very valuable, I think it would be foolish to bet all our work on the truth of this statement. It could still be the case if we were much more bottlenecked on people than on research lines, but I don't think that's the case; I think we are more bottlenecked on concrete ideas for how to push our understanding forward. Needless to say, I believe Value Learning and interpretability are very suitable for academia.
Breakthroughs only happen when one understands the problem in detail, not when people float around vague ideas.
Agreed. But I think there are great researchers in academia, and perhaps we could profit from that. I don't think we have any method to spot good researchers in our community anyway. Academia can sometimes help with that.
I think this is a bit exaggerated. What academia does is ask for well-defined problems and concrete solutions. And that's what we want if we want to make progress. It is true that some goodharting will happen, but I think we would be closer to the optimum if we were goodharting a bit than we are right now, unable to measure much progress. Notice also that Shannon and many other people who came up with breakthroughs did so in academic ways.
If the concrete problems are too watered down compared to the real thing, you also won't solve AI alignment by misleading people into thinking it's easier.
But we probably agree that insofar as some original-thinking genius reasoners can produce useful shovel-ready research questions for not-so-original-thinking academics (who may or may not be geniuses at other skills) to unbottleneck all the talent there, they should do it. The question seems to be "is it possible?"
I think the best judges are the people who are already doing work that the alignment community deems valuable. If all of EA is currently thinking about AI alignment in a way that's so confused that the experts from within can't even recognize talent, then we're in trouble anyway. If EAs who have specialized in this for years are so vastly confused about it, academia will be even more confused.
Independently of the above argument that we're in trouble if we can't even recognize talent, I also feel pretty convinced on first-order grounds that we can. It seems pretty obvious to me that work tests or interviews conducted by community experts do an okay job at recognizing talent. They probably don't do a perfect job, but it's still good enough. I think the biggest problem is that few people in EA have the expertise to do it well (and those people tend to be very busy), so grantmakers or career advice teams with talent scouts (such as 80,000 Hours) are bottlenecked by the expert time that would go into evaluations and assessments.
Hey Lukas!
Note that even MIRI sometimes does this.
It would be fair to say that this comes from an exposition of the importance of AI Safety, rather than from a proposal itself. But in any case, humans always solve complicated problems by breaking them up, because otherwise it is terribly hard. Of course, there is a risk that we oversimplify the problem, but researchers generally know where to stop.
Perhaps you were focusing more on vaguely related things such as fairness, etc., but I'm arguing more for making the real AI Safety problems concrete enough that academics will tackle them. And that's the challenge, to know where to stop simplifying. :)
Don't discount the originality of academics; they can also be quite cool :)
I agree!
Yeah, I think this is right. That's why I wanted to pose this as concrete subproblems, so that they do not feel the confusion we still have around it :)
Yeah, I agree. But also notice that Holden Karnofsky believes that academic research has a lot of aptitude overlap with AI Safety research skills, and that an academic research track record is the best-fidelity signal for whether you'll do well in AI Safety research. So perhaps we should not discount it entirely.
Thanks!
It sounds like our views are close!
I agree that this would be immensely valuable if it works. Therefore, I think it's important to try it. I suspect it likely won't succeed because it's hard to usefully simplify problems in a pre-paradigmatic field. I feel like if you can do that, maybe you've already solved the hardest part of the problem.
(I think most of my intuitions about the difficulty of usefully simplifying AI alignment relate to it being a pre-paradigmatic field. However, maybe the necessity of "security mindset" for alignment also plays into it.)
In my view, progress in pre-paradigmatic fields often comes from a single individual or a tight-knit group with high-bandwidth internal communication. It doesn't come from lots of people working on a list of simplified problems.
(But maybe the picture I'm painting is too black-and-white. I agree that there's some use to getting inputs from a broader set of people, and occasionally people who aren't usually very creative can have a great insight, etc.)
That's true. What I said sounded like a blanket dismissal of original thinking in academia, but that's not how I meant it. Basically, my picture of the situation is as follows:
Few people are capable of making major breakthroughs in pre-paradigmatic fields because that requires a rare kind of creativity and originality (and probably also being a genius). There are people like that in academia, but they have their quirks, and they'd mostly already be working on AI alignment if they had the relevant background. The sort of people I'm thinking of are drawn to problems like AI risk or AI alignment. They likely wouldn't need things to be simplified. If they look at a simplified problem, their mind immediately jumps to all the implications of the general principle and they think through the more advanced version of the problem, because that's way more interesting and way more relevant.
In any case, there are a bunch of people like that in long-termist EA because EA heavily selects for this sort of thinking. People from academia who excel at this sort of thinking often end up at EA aligned organizations.
So, who is left in academia who isn't usefully contributing to alignment but could maybe contribute to it if we knew what we wanted from them? Those are the people who don't invent entire fields on their own.
Wow, the "quite" wasn't meant that strongly, though I agree I should have expressed myself a bit more clearly/differently. And the work of Chris Olah etc. isn't useless anyway; AGI won't run on transformers and a lot of what we have found won't be that useful, but we still gain experience in how to figure out the principles, and some principles will likely transfer. And AGI forecasting is hard, but certainly not useless/impossible, though you do have high uncertainties.
Breakthroughs happen when one understands the problem deeply. I think I agree with the "not when people float around vague ideas" part, though I'm not sure what you mean by that. If you mean "academic philosophy has a problem", then I agree. If you mean "there is no way Einstein could have derived special or general relativity mostly from thought experiments", then I disagree, though you do indeed need to be skilled to use thought experiments. I don't see any bad kind of "floating around with vague ideas" in the AI safety community, but I'm happy to hear concrete examples from you where you think academic methodology is better!
(And I do, by the way, think that we need that Einstein-like reasoning, which is hard, but otherwise we basically have no chance of solving the problem in time.)
I still don't see why academia should be better at finding solutions. It can find solutions to easy problems; that's why so many people in academia are goodharting all the time. Finding easy subproblems whose solutions allow us to solve AI safety is (very likely) much harder than solving those subproblems.
Yes, in history there were some Einsteins in academia who could even solve hard problems, but those are very rare, and getting those brilliant, non-goodharting people to work on AI safety is uncontroversially good, I would say. But there might be better/easier/faster options than building the academic field of AI safety to find those people and get them to work on AI safety.
Still, I'm not saying it's a bad idea to promote AI safety in academia. I'm just saying it won't nearly suffice to solve alignment, not by a long shot.
(I think the bottom of your comment isn't as you intended it to be.)
I guess my worry is that if we drop the focus on existential risk then academia may not be helping us solve the problem we need to solve. (As a reference, when I first started EA movement-building I was worried about talking too much about AI Safety because it seemed weird, but in retrospect I consider this to have been a mistake since the winds of change were in the air).
Perhaps we should be thinking about this from the opposite perspective. How can we extend the range of what can be published in academia? We can already identify things like Superintelligence, Stuart Russell's book, and Concrete Problems in AI Safety that have helped build credibility.
I think it is easier to convince someone to work on topic X by arguing it would be very positive than by warning them that everyone could literally die if they don't. If someone comes at me with that kind of argument, I will go defensive really quickly, and they'll have to spend a lot of effort to convince me there is even a slight chance they're right. And even if I have the time to listen to them all the way through and give them the benefit of the doubt, I will come away with awkward feelings, not exactly the ones that make me want to put effort into their topic.
I don't think this is a good idea: there are a couple of reasons why academic publishing is so stringent, namely to avoid producing blatantly useless articles and to measure progress. I argue we want to play by the rules here, both because we would risk being seen as crazy people and because we want to publish sound work.
Well, if you have a low risk preference it is possible to incrementally push things out.