I propose an adjustment to this model: your contribution has to be greater than the rest of the world’s total contributions over time under the action-relevant probability measure. By action-relevant measure I mean the probability distribution in which worlds are weighted according to your expected impact in them, not just their probability.
So if you think there’s a decent chance that we’re only barely going to solve alignment, and that in those worlds there will be a pivot towards a much higher safety focus, you should be more cautious about contributing to capabilities: those are exactly the worlds where your marginal contribution matters most, so they get extra weight under this measure.
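To make the “action-relevant measure” concrete, here is a minimal sketch, with notation chosen purely for illustration: write $P(w)$ for your credence in world $w$ and $\Delta(w)$ for the marginal impact your actions would have in $w$. The action-relevant measure reweights each world by that impact:

$$\tilde{P}(w) \;=\; \frac{P(w)\,\Delta(w)}{\sum_{w'} P(w')\,\Delta(w')}.$$

Under $\tilde{P}$, the “barely solve alignment” worlds count for more than their raw probability suggests, because $\Delta(w)$ is largest precisely where the outcome is on a knife’s edge.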