Can you explain why you think doing safety work at these places is bad?
I think there was a time when it seemed like a good idea, back when the companies were small and there was more of a chance of setting their standards and culture. Back in 2016 I thought on balance we should try to put Safety people in OpenAI, for instance. OpenAI was supposed to be explicitly Safety-oriented, but any company’s safety division seemed like it might pay off to stock with Safety people.
I think everything had clearly changed around the ChatGPT moment. The companies had a successful paradigm for making the models, the product was extremely valuable, and the race was very clearly on. At this time, EAs still believed that OpenAI and Anthropic were on their side because they had Safety teams (including many EAs) and talked a lot about Safety, in fact claiming to be developing AGI for the sake of Safety. Actual influence from EA employees to do things that were safe but weren’t good for the mission of those companies was already lost at this point, imo.
It was proven in the ensuing two years that the Safety teams at OpenAI were expendable. Sam Altman has used up and thrown away EA, and he no longer feels any need to pretend OpenAI cares about Safety, despite having very fluently talked the talk for years before. He was happy to use the EA board members and the entire movement as scapegoats.
Anthropic is showing signs of going the same way. They do Safety research, but nothing stops them from developing further, not even their former promises not to advance the frontier. The main thing they do is develop bigger and bigger models. They want to be attractive to natsec, and whether the actual decisionmakers at the top ultimately believe their agenda is for the sake of Safety or not, it’s clearly not up to the marginal Safety hire or hinging on their research results. Other AI companies don’t even claim to care about Safety particularly.
So, I do not think it is effective to work at these places. But the real harm is that working for AI labs keeps EAs from speaking out about AI danger, whether because they are under NDA, or because they want to be hireable by a lab, or they want to cooperate with people working at labs, or because they defer to their friends and general social environment and so they think the labs are good (at least Anthropic). imo this price is unacceptably high, and EAs would have a lot more of the impact they hoped to get from being “in the room” at labs by speaking out and contributing to real external pressure and regulation.
I agree that there could be an effect that keeps people from speaking out about AI danger. But:
I think that such political incentives can occur whenever anyone is dealing with external power-structures, and in practice my impression is that these are a bigger deal for people who want jobs in AI policy compared to people engaged with frontier AI companies
This argument has most force in arguing that some EAs should keep professional and social distance from frontier AI companies, not that everyone should
Working at a frontier AI company (or having worked at one) can give people a better platform to talk about these issues!
Both because it gives people deeper expertise (so they are actually more informed on key questions), and because it makes that expertise legible to the outside world
For instance, I feel better about GDM publishing their recent content on safety and security than not, and I think the paper would have had much less impact on public discourse if it had come from an unaffiliated group
Probably our crux is that I think the way society sees AI development morally is what matters here to navigate the straits, and the science is not going to be able to do the job in time. I care about developing a field of technical AI Safety but not if it comes at the expense of moral clarity that continuing to train bigger and bigger models is not okay before we know it will be safe. I would much rather rally the public to that message than try to get in the weak safety paper discourse game (which tbc I consider toothless and assume is not guiding Google’s strategy).
I agree there are some possible attitudes that society could have towards AI development which could put us in a much safer position.
I think that the degree of consensus you’d need for the position that you’re outlining here is practically infeasible, absent some big shift in the basic dynamics. I think that the possible shifts which might get you there are roughly:
1. Scientific ~consensus: people look to scientists for thought leadership on this stuff. Plausibly you could have a scientist-driven moratorium (this still feels like a stretch, but less than just switching the way society sees AI without having the scientists leading that).
2. Freak-out about everyday implications of AI: sufficiently advanced AI would not just pose unprecedented risks, but also represent a fundamental change in the human condition. This could drive a tide of strong sentiment that doesn’t rely on abstract arguments about danger.
3. Much better epistemics and/or coordination: out of reach now, but potentially obtainable with stronger tech.
I think there’s potentially something to each of these. But I think the GDM paper is (in expectation) actively helpful for 1 and probably 3, and doesn’t move the needle much either way on 2.
(My own view is that 3 is the most likely route to succeed. There’s some discussion of the pragmatics of this route in AI Tools for Existential Security or AI for AI Safety (both of which also discuss automation of safety research, which is another potential success route), and relevant background views on the big-picture strategic situation in the Choice Transition. But I also feel positive about people exploring routes 1 and 2.)
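“Much better epistemics and/or coordination: out of reach now, but potentially obtainable with stronger tech.”
Why are these the same category and why are you writing coordination off as impossible? It’s not. We have literally done global nonproliferation treaties before.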
This bizarre notion got embedded early in EA that technological feats are possible and solving coordination problems is impossible. It’s actually the opposite—alignment is not tractable and coordination is.
These are in the same category because:
I’m talking about game-changing improvements to our capabilities (mostly via more cognitive labour; not requiring superintelligence)
These are the capacities that we need to help everyone to recognize the situation we’re in and come together to do something about it (and they are partial substitutes: the better everyone’s epistemics are, the less need for a big lift on coordination which has to cover people seeing the world very differently)
I’m not actually making a claim about alignment difficulty—beyond that I do think systems in the vein of those today and the near-successors of those look pretty safe.
I think that getting people to pause AI research would be a bigger lift than any nonproliferation treaties we’ve had in the past (not that such treaties have always been effective!). This isn’t just a military tech, it’s a massively valuable economic tech. Given the incentives, and the importance of having treaties actually followed, I do think this would be a more difficult challenge than any past nonproliferation work. I don’t think that means it’s impossible, but I do think it’s way more likely if something shifts—hence my 1-3.
(Or if you were asking why I say “out of reach now” in the quoted sentence it’s because I’m literally talking about “much better coordination” as a capability; not what could or couldn’t be achieved with a certain level of coordination.)