Caveats: No one likes me. I don’t know anything about AI safety, and I have trouble reading spreadsheets. I use paperclips sometimes to make sculptures.
One issue that jumps out at me to adjust: the calculation of researcher impact doesn’t seem to be marginal impact. You give a 10% chance of the alignment research community averting disaster conditional on misalignment by default in the scenarios where safety work is plausibly important, then divide that by the expected number of people in the field to get a per-researcher impact. But in expectation you should expect marginal impact to be less than average impact: the chance the alignment community averts disaster with 500 people seems like a lot more than half the chance it would do so with 1000 people.
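(To make the quoted marginal-vs-average point concrete, here’s a toy illustration. The curve is entirely made up — it is not the post’s model and the quoted comment doesn’t specify one; it just assumes diminishing returns and is scaled so that 1,000 researchers give the 10% chance.)

```python
import math

def p_avert(n, p_at_1000=0.10):
    """Hypothetical concave curve: chance the field averts disaster with n researchers,
    scaled so p_avert(1000) equals the post's 10% figure. Illustrative only."""
    return p_at_1000 * math.log(1 + n) / math.log(1 + 1000)

n = 1000
average = p_avert(n) / n                # what dividing 10% by headcount gives
marginal = p_avert(n) - p_avert(n - 1)  # extra chance added by the 1000th researcher

print(f"average per-researcher impact at n=1000:  {average:.6f}")   # ~0.000100
print(f"marginal impact of the 1000th researcher: {marginal:.6f}")  # ~0.000014
print(f"P(500) = {p_avert(500):.3f} vs half of P(1000) = {p_avert(1000) / 2:.3f}")
```

Under any concave curve like this, the marginal researcher adds less than the average one, and the chance with 500 people is well over half the chance with 1,000, which is all the quoted claim needs.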
Ok, this point about marginal effects is internally consistent... but it seems more than a little nitpicky?
I don’t see any explicit mention of marginal effects in the post[1].
The only marginal effect implied is the choice being influenced by the post, i.e. the OP or someone joining the field today. There aren’t 500 or 1,000 safety researchers today.
(Running with this perspective) with a smaller current community, this omission would, if anything, bias the numbers that appear in the post downward.
From the perspective of an author writing this post on the forum, introducing this consideration and raising the numbers seems unlikely to be helpful instrumentally/rhetorically, since the magnitudes are already pretty compelling and this isn’t a weakness of the argument.
For similar reasons, it doesn’t seem that likely that someone making a career-choice spreadsheet would explicitly model marginal productivity (as opposed to rounding it off implicitly somewhere).
More substantively, while some sort of “log marginal productivity” is probably true “on average” and useful in the abstract, it’s extraordinarily hard to pin down the shape of the “production function from talent”. E.g., we can easily think of weird bends and increasing marginal returns in that function[2][3].
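For instance (with made-up numbers, not anything from the post): a threshold-ish curve, where little happens until some critical mass of talent, has a stretch of increasing marginal returns that a log-shaped curve never has:

```python
import math

def p_log(n):
    # Hypothetical log-shaped curve: diminishing returns everywhere.
    return 0.10 * math.log(1 + n) / math.log(1 + 1000)

def p_threshold(n, midpoint=600, steepness=0.01):
    # Hypothetical threshold-ish curve: little progress until a critical mass, then rapid gains.
    return 0.10 / (1 + math.exp(-steepness * (n - midpoint)))

for n in (100, 300, 500, 700, 900):
    m_log = p_log(n) - p_log(n - 1)          # shrinks as n grows
    m_thr = p_threshold(n) - p_threshold(n - 1)  # grows until the midpoint, then shrinks
    print(f"n={n:4d}  marginal (log) = {m_log:.6f}   marginal (threshold) = {m_thr:.6f}")
```

The point isn’t that either curve is right; it’s that the size (and even the direction) of any marginal-impact correction depends on a shape nobody knows.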
The same difficulty applies to outliers or extraordinary talent; it doesn’t seem reasonable to expect the OP to account for this.
This is an aesthetic/ideological sort of thing, but IMO it seems unlikely that you would be able to write down anything like a concrete production function, because of all the unknown considerations that can only come from object-level work.
Like, I’m borderline unsure whether it’s even practical to express these considerations in the English language.
It would be great for my comment here to be wrong and get stomped all over!
Also, if there is a more substantive way this post could be expanded, that seems useful.
[1] Please don’t ban me. I didn’t read it, actually.

[2] Like, Chris Olah might be brilliant and 100x better than every other AI safety person/approach. At the same time, we could easily imagine that, no matter what, he’s not going to solve AI safety by himself, but an entire org like Anthropic might, right?

[3] As an example, one activist doesn’t seem to think any current AI safety intervention is effective at all. In that person’s worldview, applying a log production function doesn’t seem right: it’s unlikely that, say, 7 doublings (100x more quality-adjusted people) would do it in such a rigid function, since the base probability is so low. In reality, I think that in that person’s worldview, certain configurations of 100x more talent would be effective.
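To spell out the arithmetic in [3] (the parameters here are my own assumptions; the activist’s view obviously doesn’t come with numbers): in a rigid log model where each doubling of quality-adjusted people adds a fixed increment to a low base probability, 100x more talent is only about 7 doublings, so the total stays low.

```python
# Toy numbers only; both parameters are assumptions for illustration.
base_probability = 0.01    # assumed low base chance of success
gain_per_doubling = 0.01   # assumed fixed increment per doubling (the "rigid" log model)
doublings = 7              # 2**7 = 128, i.e. roughly 100x more quality-adjusted people

print(f"{2 ** doublings}x")                                        # "128x"
print(f"{base_probability + doublings * gain_per_doubling:.0%}")   # "8%" -- still low
```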