I’m confused about how to square this with specific counterexamples. Take theoretical alignment work: P(important safety progress) probably scales with time invested, but doubling your work hours doesn’t multiply it by 100x. Any explanations here?
I don’t know if this is because uncertainty muddles the log picture. E.g. ex ante we really don’t know where the hits are, so lots of options look like ‘decent shots’, even if the realized outcomes end up wildly spread. Maybe once we know the outcomes, the outlier good things would score quite badly on the personal-liking front. But that doesn’t sound exactly right either.
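Here’s a toy simulation of that “uncertainty muddles the log picture” intuition. All the numbers are my own assumptions (lognormal ex-post impact, a noisy quality signal), not anything from the original claim; the point is just that ex-ante expected values can be far more compressed than ex-post outcomes, so many options look like “decent shots” even when realized impact is hits-based.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model (assumed numbers): ex-post impact of a project is lognormal,
# i.e. a few huge hits and mostly small stuff.
n_projects = 10_000
ex_post_impact = rng.lognormal(mean=0.0, sigma=3.0, size=n_projects)

# Ex post, the best project dwarfs the median one.
print("ex-post best / median:", ex_post_impact.max() / np.median(ex_post_impact))

# Ex ante, suppose all we observe is a noisy signal of each project's quality,
# so our expected value for any given project averages over the heavy tail.
signal = np.log(ex_post_impact) + rng.normal(0.0, 3.0, size=n_projects)

# Bucket projects into deciles of the signal and compare average realized impact.
cuts = np.quantile(signal, np.linspace(0, 1, 11))[1:-1]
bucket = np.digitize(signal, cuts)
ev_by_decile = [ex_post_impact[bucket == b].mean() for b in range(10)]
print("ex-ante EV, best decile / median decile:", ev_by_decile[9] / ev_by_decile[4])
```

On a typical seed the ex-post best/median ratio comes out in the tens of thousands while the ex-ante ratio between deciles is only in the tens, which is one way “many things look like decent shots” can coexist with a heavily log-distributed outcome picture.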