A small point of confusion: taking U(C) = C (+ a constant) by appropriate parametrization of C is an interesting move. I’m not totally sure what to think of it; I can see that it helps here, but it seems to make it quite hard to develop good intuitions about the shape of P. The one clear intuition I do have about the shape of P is that there should be some C > 0 at which P is still 1, regardless of S, because there are clearly some useful applications of AI which pose no threat of existential catastrophe. But your baseline functional form for P excludes this possibility. I’m not sure how much this matters, because, as you say, the conclusions extend to a much broader class of possible functions (not all of which exclude this kind of shape), but the tension makes me want to check I’m not missing something?
Thanks for noting this. If in some case there is a positive level of capabilities for which P is 1, then we can just say that the level of capabilities denoted by C = 0 is the maximum level at which P is still 1. What does change is that the constraint becomes not C ≥ 0 but C ≥ (something negative), but that doesn’t really matter, since here you’ll never want to set C < 0 anyway. I’ve added a note to clarify this.
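To make the relabeling concrete (a minimal sketch; the symbol C₀ is my notation, not from the post, and I take it to be independent of S per the intuition above): suppose P(C, S) = 1 for all C ≤ C₀ and P only starts falling above C₀. Then shifting capabilities by C₀ recovers the baseline setup:

\[
\tilde{C} = C - C_0,
\qquad
\tilde{P}(\tilde{C}, S) = P(\tilde{C} + C_0,\, S),
\]

so that \( \tilde{P}(0, S) = 1 \), and the original constraint \( C \ge 0 \) becomes \( \tilde{C} \ge -C_0 \) (something negative), which never binds, since you would never want to set \( \tilde{C} < 0 \) anyway.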
Maybe a thought here is that, since there is some stretch of capabilities along which P = 1, we should think that P(·) is horizontal around C = 0 (the point at which P can start falling from 1) for any given S, and that this might produce very different results from the e^(−C/S) example, in which there would be a kink at C = 0. But no: the key point is whether increases to S change the curve in a way that widens as C moves to the right, and so “act as price decreases to C”, not the slope of the curve around C = 0. E.g. if P = 1 − C²/S (for C ∈ [0, √S], and 0 above), then in the k = 0 case, where the lab is trying to maximize (1 − C²/S)·C, they set C = √(S/3), and so P is again fixed (here, at 2/3) regardless of S.
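Spelling out the arithmetic behind that last claim (just the first-order condition; nothing here is new to the argument):

\[
\max_{C \in [0, \sqrt{S}]} \Big(1 - \tfrac{C^2}{S}\Big) C
  \;=\; \max_{C} \Big(C - \tfrac{C^3}{S}\Big),
\qquad
1 - \frac{3C^2}{S} = 0
\;\Longrightarrow\;
C^* = \sqrt{S/3},
\quad
P(C^*) = 1 - \frac{S/3}{S} = \frac{2}{3},
\]

so the chosen survival probability is 2/3 no matter how large S is. If I am reading the baseline case the same way (the lab maximizing C·e^(−C/S) when k = 0), the parallel there is C = S and P = e^(−1), again independent of S.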