Thanks for this exercise, it’s great to do this kind of thinking explicitly and get other eyes on it.
One issue that jumps out at me to adjust: the calculation of researcher impact doesn’t seem to be marginal impact. You give a 10% chance of the alignment research community averting disaster conditional on misalignment by default in the scenarios where safety work is plausibly important, then divide that by the expected number of people in the field to get a per-researcher impact. But in expectation you should expect marginal impact to be less than average impact: the chance the alignment community averts disaster with 500 people seems like a lot more than half the chance it would do so with 1000 people.
I would distribute my credence in alignment research making the difference over a number of doublings of the cumulative quality-adjusted efforts, e.g. say that you get an x% reduction of risk per doubling over some range.
Although in that framework if you would likely have doom with zero effort, that means we have more probability of making the difference to distribute across the effort levels above zero. The results could be pretty similar but a bit smaller than yours above if we thought that the marginal doubling of cumulative effort was worth a 5-10% relative risk reduction.
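To make the gap between average and marginal impact concrete, here is a minimal sketch. It assumes, purely for illustration, that the chance of averting disaster grows linearly in log2 of the field size (one possible reading of "x% per doubling"), pinned to the 10% figure at a field of 1,000 people. The curve shape, the function names, and the 1,000-person anchor are assumptions for this sketch, not anything taken from the spreadsheet.

```python
import math

P_REF = 0.10  # chance the field averts disaster at the reference size (10% figure from the post)
N_REF = 1000  # reference field size (illustrative assumption)


def p_avert(n: float) -> float:
    """Assumed success curve: linear in log2(1 + n), equal to P_REF at N_REF."""
    return P_REF * math.log2(1 + n) / math.log2(1 + N_REF)


def average_impact(n: float) -> float:
    """Total chance of averting disaster divided evenly across n researchers."""
    return p_avert(n) / n


def marginal_impact(n: float) -> float:
    """Extra chance of averting disaster from adding one researcher to a field of n."""
    return p_avert(n + 1) - p_avert(n)


print("average impact at 1000: ", average_impact(1000))
print("marginal impact at 1000:", marginal_impact(1000))
print("p(500) / p(1000):       ", p_avert(500) / p_avert(1000))
```

Under this assumed curve the marginal researcher is worth noticeably less than the average one, and a field of 500 retains well over half the success chance of a field of 1,000. A different curve shape would give different numbers.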
This is a good point I hadn’t considered. I’ve added a few rows calculating a marginal correction factor to the Google Sheet, and I’ll update the table if you think they’re sensible.
The new correction factor is based on integrating an exponentially decaying function from N_researchers to N_researchers+1, with the decay rate set by a question about the effect of halving the size of the AI alignment community. Make sure to expand the hidden rows in the doc if you want to see the calculations.
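For readers who don't want to open the sheet, here is a rough sketch of one way such a correction factor could be computed. It is a guess at the general approach, not a copy of the spreadsheet's formulas. It assumes per-researcher impact decays as exp(-k * n), and that the halving question is answered as "what fraction of the community's total impact would remain with half as many researchers?" (called frac_kept_if_halved here, a hypothetical name). That answer pins down k in closed form.

```python
import math


def decay_rate(n_researchers: float, frac_kept_if_halved: float) -> float:
    """Solve F(N/2) / F(N) = h for k, where F(n) = (1 - exp(-k * n)) / k.

    With u = exp(-k * N / 2), the ratio simplifies to 1 / (1 + u), so u = 1/h - 1.
    A positive decay rate requires 0.5 < h < 1.
    """
    if not 0.5 < frac_kept_if_halved < 1.0:
        raise ValueError("frac_kept_if_halved must be strictly between 0.5 and 1")
    u = 1.0 / frac_kept_if_halved - 1.0
    return -2.0 * math.log(u) / n_researchers


def cumulative_impact(n: float, k: float) -> float:
    """Integral of exp(-k * x) from 0 to n (arbitrary units)."""
    return (1.0 - math.exp(-k * n)) / k


def marginal_correction_factor(n_researchers: float, frac_kept_if_halved: float) -> float:
    """Ratio of the (N+1)-th researcher's impact to the average impact per researcher."""
    k = decay_rate(n_researchers, frac_kept_if_halved)
    # Integrate the decaying function from N to N + 1.
    marginal = cumulative_impact(n_researchers + 1, k) - cumulative_impact(n_researchers, k)
    average = cumulative_impact(n_researchers, k) / n_researchers
    return marginal / average


print(marginal_correction_factor(1000, 0.9))
```

Multiplying the average per-researcher impact by this factor gives the marginal estimate. The factor is always below 1 under these assumptions and shrinks as the answer to the halving question approaches 1.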
Caveats: No one likes me. I don’t know anything about AI safety, and I have trouble reading spreadsheets. I use paperclips sometimes to make sculptures.
One issue that jumps out at me to adjust: the calculation of researcher impact doesn’t seem to be marginal impact. You give a 10% chance of the alignment research community averting disaster conditional on misalignment by default in the scenarios where safety work is plausibly important, then divide that by the expected number of people in the field to get a per-researcher impact. But in expectation you should expect marginal impact to be less than average impact: the chance the alignment community averts disaster with 500 people seems like a lot more than half the chance it would do so with 1000 people.
Ok, this statement about marginal effects is internally consistent... but this seems more than a little nitpicky?
I don’t see any explicit mention of marginal effects in the post[1]:
The only implied marginal effect might be the choice being influenced by the post, which is the OP or someone joining today. There aren’t 500 or 1,000 safety researchers today.
(Following this perspective further:) with a smaller community, this omission would bias the numbers that appear in the post downward.
From the perspective of an author writing this post on the forum, it seems unlikely that introducing this consideration and raising the numbers would be helpful instrumentally/rhetorically, since the magnitudes are already pretty compelling and not a weakness of the argument.
For similar reasons, it doesn’t seem that probable that someone making a career choice spreadsheet would explicitly model marginal production (as opposed to rounding it off implicitly somewhere).
More substantively, while some sort of “log marginal productivity” is probably true “on average” and useful in the abstract, it’s extraordinarily hard to pin down the shape of the “production function from talent”. E.g. we can easily think of weird bends and increasing marginal returns in that function[2][3].
The same difficulty applies with outliers or extraordinary talent—it doesn’t seem reasonable for the OP to account for this.
This is an aesthetic/ideological sort of thing, but IMO it seems unlikely that you would be able to write anything like a concrete production function. This is because of all the unknown considerations that can only come from object-level work.
Like, I’m borderline unsure if it’s practical to express these considerations in English.
It would be great for my comment here to be wrong and be stomped all over!
Also, if there is a more substantial reason this post can be expanded, that seems useful.
Please don’t ban me.
I didn’t read it actually.
Like, Chris Olah might be brilliant and 100x better than every other AI safety person/approach. At the same time, we could easily imagine that, no matter what, he’s not going to get AI safety by himself, but an entire org like Anthropic might, right?
As an example, one activist doesn’t seem to think any current AI safety intervention is effective at all.
In that person’s worldview/opinion, applying a log production function doesn’t seem right. It’s unlikely that, say, 7 doublings (roughly 100x more quality-adjusted people) would do it in this rigid function, since the base probability is so low.
In reality, I think that in that person’s worldview, certain configurations of 100x more talent would be effective.
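As a quick check on the arithmetic in this footnote, here is a small sketch of what a rigid per-doubling model implies. It assumes a constant relative risk reduction per doubling. The 5% and 10% rates are borrowed from the parent comment as illustrative values, not as an estimate of what the activist in question believes, and the function names are made up for this example.

```python
import math


def doublings_for_multiplier(multiplier: float) -> float:
    """Number of doublings needed to scale effort by the given multiplier."""
    return math.log2(multiplier)


def remaining_risk_fraction(reduction_per_doubling: float, doublings: float) -> float:
    """Fraction of the original risk left after repeated relative reductions."""
    return (1.0 - reduction_per_doubling) ** doublings


print("doublings for 100x:", doublings_for_multiplier(100))
for rate in (0.05, 0.10):
    print(f"risk remaining after 7 doublings at {rate:.0%} each:",
          remaining_risk_fraction(rate, 7))
```

Seven doublings is a 128x increase, so "100x" is a fair rounding. Even at the parent comment's rates, roughly half or more of the risk remains after seven doublings, and someone with a much lower per-doubling rate would see almost no change, which is the sense in which the rigid function fits that worldview poorly.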