It’s not clear to me that the variance of “being a technical researcher” is actually lower than “being a social coordinator”. Historically, quite a lot of capabilities advancements have come out of efforts that were initially intended to be alignment-focused.
Edited to add: I do think it’s probably harder to have a justified inside-view model of whether one’s efforts are directionally positive or negative when attempting to “buy time”, as opposed to “doing technical research”, if one actually makes a real effort in both cases.
Would you be able to give tangible examples where alignment research has advanced capabilities? I’ve no doubt it’s happened due to alignment-focused researchers being chatty about their capabilities-related findings, but I don’t know of concrete examples.
There’s obviously substantial disagreement here, but there is at least one recent salient example (and arguably the entire surrounding context of OpenAI as an institution counts as well).
Not sure what Rob is referring to, but there are a fair few examples of orgs’ or people’s purposes slipping from alignment to capabilities, e.g. OpenAI.
I myself find it surprisingly difficult to focus on ideas that are robustly beneficial to alignment but not to capabilities.
(E.g. I have a bunch of interpretability ideas, but interpretability can only either have no impact on timelines or accelerate them.)
Do you know if any of the alignment orgs have some kind of alignment research NDA, with a panel that allows alignment-only ideas to be made public while keeping the maybe-capabilities ideas private?
Do you mean you find it hard to avoid thinking about capabilities research or hard to avoid sharing it?
It seems reasonable to me that you’d actually want to try to advance the capabilities frontier privately, to yourself, so that you’re better able to understand the system you’re trying to align, and so that you can better predict what’s likely to be dangerous.
That’s a reasonable point. The way this would reflect in the above graph is wider uncertainty around technical alignment at the high end of researcher ability.