We won’t solve AI safety by just throwing a bunch of (ML) researchers on it.
AGI will (likely) be quite different from current ML systems. Also, work on aligning current ML systems won’t be that useful; generally, what we need is not small advancements but rather breakthroughs. (This is a great post for getting started on understanding why this is the case.)
We need a few Paul Christiano-level researchers who build a very deep understanding of the alignment problem and can then make huge advances, much more than we need many still-great-but-not-that-extraordinary researchers.
Academia doesn’t have good incentives to make that kind of important progress: You are supposed to publish papers, so you (1) focus on what you can do with current ML systems, instead of focusing on more uncertain longer-term work, and (2) goodhart on some subproblems that don’t take that long to solve, instead of actually focusing on understanding the core difficulties and how one might address them.
I think paradigms are partially useful, and we should probably create some for specific approaches to AI safety, but the default paradigms that would develop in academia are probably pretty bad, so the resulting research wouldn’t be that useful.
Promoting AI safety in academia is probably still good, but for actually preventing existential risk, we need some other way of creating incentives to usefully contribute to AI safety. I don’t know yet how to best do it, but I think there are better options.
Getting people into AI safety without arguing about x-risk seems nice, but mostly because I think this strategy is useful for convincing people of x-risk later, so that they can then work on important stuff.
Hey Simon, thanks for answering!

We won’t solve AI safety by just throwing a bunch of (ML) researchers on it.
Perhaps we don’t need to buy ML researchers (although I think we should at least try), but I think it is more likely that we won’t solve AI Safety if we don’t get more concrete problems in the first place.
AGI will (likely) be quite different from current ML systems.
I’m afraid I disagree with this. For example, if this were true, interpretability work from Chris Olah and the Anthropic team would be automatically doomed; Value Learning from CHAI would be useless; and the forecasting predictions we use to convince people of the importance of AI Safety equally so. Of course, this does not prove anything; but I think there is a case to be made that Deep Learning currently looks like the only viable path we have found that might get us to AGI. And while I think the agnostic approach of MIRI is very valuable, I think it would be foolish to bet all our work on the truth of this statement. That bet could still make sense if we were much more bottlenecked on people than on research lines, but I don’t think that’s the case; I think we are more bottlenecked on concrete ideas for how to push our understanding forward. Needless to say, I believe Value Learning and interpretability are very well suited to academia.
we rather need breakthroughs
Breakthroughs only happen when one understands the problem in detail, not when people float around vague ideas.
We need a few Paul Christiano-level researchers who build a very deep understanding of the alignment problem and can then make huge advances, much more than we need many still-great-but-not-that-extraordinary researchers.
Agreed. But I think there are great researchers in academia, and perhaps we could profit from that. I don’t think we have any method to spot good researchers in our community anyway. Academia can sometimes help with that.
(1) focus on what you can do with current ML systems, instead of focusing on more uncertain longer-term work, and (2) goodhart on some subproblems that don’t take that long to solve.
I think this is a bit exaggerated. What academia does is ask for well-defined problems and concrete solutions. And that’s what we want if we want to make progress. It is true that some goodharting will happen, but I think we would be closer to the optimum if we were goodharting a bit than we are right now, unable to measure much progress. Notice also that Shannon and many others who came up with breakthroughs did so within academia.
we need some other way of creating incentives to usefully contribute to AI safety
I think arguing about the importance of AI Safety is enough, as long as they don’t feel they have nothing to contribute because things are too vague or too far from their expertise.
Perhaps we don’t need to buy ML researchers (although I think we should at least try), but I think it is more likely that we won’t solve AI Safety if we don’t get more concrete problems in the first place.
If the concrete problems are too watered down compared to the real thing, you also won’t solve AI alignment by misleading people into thinking it’s easier.
But we probably agree that insofar as some original-thinking genius reasoners can produce useful shovel-ready research questions for not-so-original-thinking academics (who may or may not be geniuses at other skills) to unbottleneck all the talent there, they should do it. The question seems to be “is it possible?”
I don’t think we have any method to spot good researchers in our community anyway. Academia can sometimes help with that.
I think the best judges are the people who are already doing work that the alignment community deems valuable. If all of EA is currently thinking about AI alignment in a way that’s so confused that the experts from within can’t even recognize talent, then we’re in trouble anyway. If EAs who have specialized on this for years are so vastly confused about it, academia will be even more confused.
Independently of the above argument that we’re in trouble if we can’t even recognize talent, I also feel pretty convinced that we can on first-order grounds. It seems pretty obvious to me that work tests or interviews conducted by community experts do an okay job at recognizing talent. They probably don’t do a perfect job, but it’s still good enough. I think the biggest problem is that few people in EA have the expertise to do it well (and those people tend to be very busy), so grantmakers or career advice teams with talent scouts (such as 80,000 Hours) are bottlenecked by the expert time that would go into evaluations and assessments.
Hey Lukas!

If the concrete problems are too watered down compared to the real thing, you also won’t solve AI alignment by misleading people into thinking it’s easier.

Note that even MIRI sometimes does this:

We could not yet create a beneficial AI system even via brute force.
Imagine you have a Jupiter-sized computer and a very simple goal: Make the universe contain as much diamond as possible. The computer has access to the internet and a number of robotic factories and laboratories, and by “diamond” we mean carbon atoms covalently bound to four other carbon atoms. (Pretend we don’t care how it makes the diamond, or what it has to take apart in order to get the carbon; the goal is to study a simplified problem.) Let’s say that the Jupiter-sized computer is running Python. How would you program it to produce lots and lots of diamond?
As it stands, we do not yet know how to program a computer to achieve a goal such as that one.
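To make the quoted thought experiment concrete, here is a minimal Python sketch of what such a program would have to look like. Every name in it is hypothetical, invented purely for illustration: the point is that the outer optimization loop is trivial to write, while the evaluation function it depends on is the part nobody knows how to fill in.

```python
# Illustrative sketch only; all names here are hypothetical.

def amount_of_diamond(world_state):
    # We would need a function that inspects a model of the physical world
    # and counts carbon atoms covalently bound to four other carbon atoms.
    # Nobody knows how to write this robustly: a powerful optimizer will
    # exploit any gap between the world model and reality.
    raise NotImplementedError("no robust physical-world evaluator exists")

def choose_action(world_model, possible_actions):
    # Naive brute-force search: pick the action whose predicted outcome
    # contains the most diamond. Even with a Jupiter-sized computer, this
    # fails, because the search optimizes for errors in amount_of_diamond
    # and in the world model, not for actual diamond.
    return max(possible_actions,
               key=lambda a: amount_of_diamond(world_model.predict(a)))
```

The brute-force loop is the easy part; the hard, unsolved part is specifying the goal predicate itself, which is why the quote says we cannot do this even with unlimited compute.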
It would be fair to say that this comes from an exposition of the importance of AI Safety rather than from a research proposal itself. But in any case, humans always solve complicated problems by breaking them up, because otherwise it is terribly hard. Of course, there is a risk that we oversimplify the problem, but researchers generally know where to stop.
Perhaps you were focusing more on vaguely related things such as fairness, etc., but I’m arguing more for making the real AI Safety problems concrete enough that they will tackle it. And that’s the challenge, to know where to stop simplifying. :)
some original-thinking genius reasoners can produce useful shovel-ready research questions for not-so-original-thinking academics
Don’t discount the originality of academics, they can also be quite cool :)
I think the best judges are the people who are already doing work that the alignment community deems valuable.
I agree!
If EAs who have specialized on this for years are so vastly confused about it, academia will be even more confused.
Yeah, I think this is right. That’s why I wanted to pose this as concrete subproblems, so that academics do not run into the confusion we still have around it :)
Independently of the above argument that we’re in trouble if we can’t even recognize talent, I also feel pretty convinced that we can on first-order grounds. It seems pretty obvious to me that work tests or interviews conducted by community experts do an okay job at recognizing talent.
Yeah, I agree. But also notice that Holden Karnofsky believes that academic research has a lot of aptitude overlap with AI Safety research skills, and that an academic research track record is the best-fidelity signal for whether you’ll do well in AI Safety research. So perhaps we should not discount it entirely.
Thanks! It sounds like our views are close!

but I’m arguing more for making the real AI Safety problems concrete enough that they will tackle it.
I agree that this would be immensely valuable if it works. Therefore, I think it’s important to try it. I suspect it likely won’t succeed because it’s hard to usefully simplify problems in a pre-paradigmatic field. I feel like if you can do that, maybe you’ve already solved the hardest part of the problem.
(I think most of my intuitions about the difficulty of usefully simplifying AI alignment relate to it being a pre-paradigmatic field. However, maybe the necessity of “security mindset” for alignment also plays into it.)
In my view, progress in pre-paradigmatic fields often comes from a single individual or a tight-knit group with high-bandwidth internal communication. It doesn’t come from lots of people working on a list of simplified problems.
(But maybe the picture I’m painting is too black-and-white. I agree that there’s some use in getting input from a broader set of people, and occasionally people who aren’t usually very creative can have a great insight, etc.)
Don’t discount the originality of academics, they can also be quite cool :)
That’s true. What I said sounded like a blanket dismissal of original thinking in academia, but that’s not how I meant it. Basically, my picture of the situation is as follows:
Few people are capable of making major breakthroughs in pre-paradigmatic fields, because that requires a rare kind of creativity and originality (and probably also being a genius). There are people like that in academia, but they have their quirks, and they’d mostly already be working on AI alignment if they had the relevant background. The sort of people I’m thinking of are drawn to problems like AI risk or AI alignment. They likely wouldn’t need things to be simplified: if they look at a simplified problem, their mind immediately jumps to all the implications of the general principle, and they think through the more advanced version of the problem because that’s way more interesting and way more relevant.
In any case, there are a bunch of people like that in long-termist EA because EA heavily selects for this sort of thinking. People from academia who excel at this sort of thinking often end up at EA-aligned organizations.
So, who is left in academia and isn’t usefully contributing to alignment but could maybe contribute to it if we knew what we wanted from them? Those are the people who don’t invent entire fields on their own.
AGI will (likely) be quite different from current ML systems.
I’m afraid I disagree with this. For example, if this were true, interpretability work from Chris Olah and the Anthropic team would be automatically doomed; Value Learning from CHAI would be useless; and the forecasting predictions we use to convince people of the importance of AI Safety equally so.
Wow, the “quite” wasn’t meant that strongly, though I agree that I should have expressed myself a bit more clearly/differently. And the work of Chris Olah etc. isn’t useless anyway; yeah, AGI won’t run on transformers and a lot of what we found won’t be that useful, but we still gain experience in how to figure out the principles, and some principles will likely transfer. And AGI forecasting is hard, but certainly not useless/impossible, though you do have high uncertainties.
Breakthroughs only happen when one understands the problem in detail, not when people float around vague ideas.
Breakthroughs happen when one understands the problem deeply. I think I agree with the “not when people float around vague ideas” part, though I’m not sure what you mean by that. If you mean “academic philosophy has a problem”, then I agree. If you mean “there is no way Einstein could derive special or general relativity mostly from thought experiments”, then I disagree, though you do indeed need to be skilled to use thought experiments. I don’t see any bad kind of “floating around with vague ideas” in the AI safety community, but I’m happy to hear concrete examples from you of where you think academia’s methodology is better! (And I do, by the way, think that we need that Einstein-like reasoning, which is hard, but otherwise we basically have no chance of solving the problem in time.)
What academia does is to ask for well defined problems and concrete solutions. And that’s what we want if we want to progress.
I still don’t see why academia should be better at finding solutions. It can find solutions to easy problems; that’s why so many people in academia are goodharting all the time. Finding easy subproblems whose solutions allow us to solve AI safety is (very likely) much harder than solving those subproblems.
Notice also that Shannon and many others who came up with breakthroughs did so within academia.
Yes, in history there were some Einsteins in academia who could even solve hard problems, but those are very rare, and getting those brilliant, not-goodharting people to work on AI safety is uncontroversially good, I would say. But there might be better/easier/faster options for finding those people and getting them to work on AI safety than building up the academic field of AI safety.
Still, I’m not saying it’s a bad idea to promote AI safety in academia. I’m just saying it won’t nearly suffice to solve alignment, not by a long shot.
(I think the bottom of your comment isn’t as you intended it to be.)