I agree with you that people seem to somewhat overrate getting jobs in AI companies.
However, I do think there’s good work to do inside AI companies. Currently, a lot of the quality-adjusted safety research happens inside AI companies. And see here for my rough argument that it’s valuable to have safety-minded people inside AI companies at the point where they develop catastrophically dangerous AI.
What you write there makes sense, but it’s not free to have people in those positions, as I said. I did a lot of thinking about this when I was working on wild animal welfare. It seems superficially like you could get the right kind of WAW-sympathetic person into agencies like FWS and the EPA, and they would be there to, say, nudge the agency to help animals when the time came, in a way no one else cared about. I did some interviews, looked into some historical cases, and concluded this is not a good idea.
The risk of being captured by the values and motivations of the org where they spend most of their daily lives, before they have the chance to provide that marginal difference, is high. Then that person is lost to the Safety cause or converted into a further problem. I predict that you’ll get one successful Safety sleeper agent in, generously, 10 researchers who go to work at a lab. In that case your strategy is just feeding the labs talent and poisoning the ability of their circles to oppose them.
Even if it’s harmless, planting an ideological sleeper agent in a firm is generally not the best counterfactual use of the person, because their influence in a large org is low. Even relatively high-ranking people frequently have almost no discretion over what happens in the end. AI labs probably have more flexibility than US agencies, but I doubt the principle is that different.
Therefore I think trying to influence the values and safety practices of labs by working there is a bad idea that is unlikely to be pulled off.
My sense is that of the many EAs who have taken EtG jobs quite a few have remained fairly value-aligned? I don’t have any data on this and am just going on vibes, but I would guess significantly more than 10%. Which is some reason to think the same would be the case for AI companies. Though plausibly the finance company’s values are only orthogonal to EA, while the AI company’s values (or at least plans) might be more directly opposed.
“In that case your strategy is just feeding the labs talent and poisoning the ability of their circles to oppose them.”
It seems like your model only has such influence going one way. The lab worker will influence their friends, but not the other way around. I think two-way influence is a more accurate model.
Another option is to ask your friends to monitor you so you don’t get ideologically captured, and hold an intervention if it seems appropriate.
I think you, and this community, have no idea how difficult it is to resist value/mission drift in these situations. This is not a friend-to-friend exchange; it’s a small community of nonprofits and individuals up against the most valuable companies in the world. They aren’t just gonna pick up the values of a few researchers by osmosis.
From your other comment it seems like you have already been affected by the labs’ influence via the technical research community. The emphasis on technical solutions only benefits them, and it just so happens that to work on the big models you have to work with them. This is not an open exchange where they have been just as influenced by us. Sam and Dario sure want you and the US government to think theirs is the right safety approach, though.
“The emphasis on technical solutions only benefits them”
This is blatantly question-begging, right? In that it is only true if looking for technical solutions doesn’t lead to safe models, which is one of the main points in dispute between you and people who have a higher opinion of doing safety work on the inside. Of course, it is true that if you don’t already have your own opinion, you shouldn’t trust people who work at leading labs (or want to) on the question of whether technical safety work will help, for the reasons you give. But “people have an incentive to say X” isn’t actually evidence that X is false; it’s just evidence that you shouldn’t trust them. If everyone outside labs thought technical safety work was useless, that would be one thing. But I don’t think that is actually true: people with relevant expertise seem divided even outside the labs. Now of course, there are subtler ways in which even people outside the labs might be incentivized to play down the risks. (Though they might also have other reasons to play them up.) But even that won’t get you to “therefore technical safety is definitely useless”; it’s all meta, not object-level.
There’s also a subtler point that even if “do technical safety work on the inside” is unlikely to work, it might still be the better strategy if confrontational lobbying from the outside is unlikely to work too (something that I think is more true now Trump is in power, although Musk is a bit of a wildcard in that respect.)
I didn’t mean “there is no benefit to technical safety work”; I meant more like “there is only benefit to labs to emphasizing technical safety work to the exclusion of other things”, as in it benefits them and doesn’t cost them to do this.