The question of whether to enhance capabilities indirectly — for example, by taking a front-end web development job at Capabilities AI Company, or building tooling for their researchers — seems a lot simpler than the question of whether to work on capabilities-flavoured research yourself, because you lose most of the benefits of capabilities research (which come from having expertise in the research itself) but keep all the downsides of reducing the time available for alignment research.
Right?
In which case there’s a coherent viewpoint whereby we should generally try to dissuade people from taking most jobs at Capabilities AI Company, while actively steering trustworthy people into their research positions.
I highlight this because I don't want anyone to make the mistake of lumping the two together. For example, if 80,000 Hours ends up deciding on a ‘house policy’ along the lines of “continue to list capabilities roles on our jobs board”, they could do a lot better by excluding the indirect kind of role. (I remember seeing such a job on there before: a software role at an AI capabilities org that was not direct research.)
(What am I missing?)
Great post. This is possibly the best explanation of the relationship between capabilities and safety I’ve seen so far.
“Agree” with this comment if you think the answer is “No, probably not”
Note that Anthropic and Redwood don’t require experience in [roles that advance capabilities], as of when I last interviewed with them; see more here. (I’ll update this if I hear otherwise.)
Before taking a role that you believe will cause harm:
If your goal is to build skills to get accepted to some org, first ask that org whether this is actually a good way to prepare for a role with them. It would be kind of, eh, unfortunate if the answer turned out to be “no”.
On the other hand, optimizing for getting into one specific org might not be the best career plan.
I’m ok with asking 10 orgs.
I’m against asking 0 orgs, especially when that goes together with taking a role you think is causing harm.
There seem to be two obvious models:
1) An intractability model, where AGI = doom and the only safe move is not to build it.
2) A race / differential-progress model, where safety needs to be ahead of capabilities by some margin before capabilities reaches point X.
As far as I can tell, alignment is advancing much more slowly per researcher than capabilities. So even if you contribute 1 year to capabilities and 10 to alignment, your effect under the differential-progress model is bad, and your effect under the intractability model is even worse.
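A rough way to quantify that race-model point (notation and numbers are mine, purely illustrative, and assuming contributions to the gap just add linearly): let $r_a$ and $r_c$ be alignment and capabilities progress per researcher-year, and $t_a$, $t_c$ the years you spend on each. Your net effect on safety's lead is then roughly

$$\Delta \approx r_a t_a - r_c t_c.$$

If capabilities moves ten times faster per researcher-year ($r_c = 10\,r_a$), then $t_a = 10$, $t_c = 1$ gives $\Delta \approx 0$: the one capabilities year cancels the whole decade of alignment work, and any ratio steeper than 10:1 makes $\Delta$ negative.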
I’m curious how much the “having aligned people in the room is good” theory can already be assessed. I’m personally not a big buyer of it. For example, the phenomenon doesn’t seem visible in the Manhattan Project or in the nuclear policy that followed.