A small point of confusion: taking U(C) = C (+ a constant) by appropriate parametrization of C is an interesting move. I’m not totally sure what to think of it; I can see that it helps here, but it seems to make it quite hard to develop good intuitions about the shape of P. The one clear intuition I do have about the shape of P is that there should be some C > 0 at which P is still 1, regardless of S, because there are clearly some useful applications of AI which pose no threat of existential catastrophe. But your baseline functional form for P excludes this possibility. I’m not sure how much this matters, because, as you say, the conclusions extend to a much broader class of possible functions (not all of which exclude this kind of shape), but the tension makes me want to check I’m not missing something?
Thanks for noting this. If in some case there is a positive level of capabilities for which P is 1, then we can just say that the level of capabilities denoted by C = 0 is the maximum level at which P is still 1. What does change is that the constraint becomes not C ≥ 0 but C ≥ (something negative), but that doesn’t really matter, since here you’ll never want to set C < 0 anyway. I’ve added a note to clarify this.
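To make the relabeling concrete (a minimal sketch; the symbol C₀ is my notation, not from the post, and I take it to be independent of S per the intuition above): suppose P(C, S) = 1 for all C ≤ C₀ and P only starts falling above C₀. Then shifting capabilities by C₀ recovers the baseline setup:

\[
\tilde{C} = C - C_0,
\qquad
\tilde{P}(\tilde{C}, S) = P(\tilde{C} + C_0,\, S),
\]

so that \( \tilde{P}(0, S) = 1 \), and the original constraint \( C \ge 0 \) becomes \( \tilde{C} \ge -C_0 \) (something negative), which never binds, since you would never want to set \( \tilde{C} < 0 \) anyway.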
Maybe a thought here is that, since there is some stretch of capabilities along which P = 1, we should think that P(·) is horizontal around C = 0 (the point at which P can start falling from 1) for any given S, and that this might produce very different results from the e^(−C/S) example, in which there would be a kink at C = 0. But no: the key point is whether increases to S change the curve in a way that widens as C moves to the right, and so “act as price decreases to C”, not the slope of the curve around C = 0. E.g. if P = 1 − C²/S (for C ∈ [0, √S], and 0 above), then in the k = 0 case, where the lab is trying to maximize (1 − C²/S)·C, they set C = √(S/3), and so P is again fixed (here, at 2/3) regardless of S.
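Spelling out the arithmetic behind that last claim (just the first-order condition; nothing here is new to the argument):

\[
\max_{C \in [0, \sqrt{S}]} \Big(1 - \tfrac{C^2}{S}\Big) C
  \;=\; \max_{C} \Big(C - \tfrac{C^3}{S}\Big),
\qquad
1 - \frac{3C^2}{S} = 0
\;\Longrightarrow\;
C^* = \sqrt{S/3},
\quad
P(C^*) = 1 - \frac{S/3}{S} = \frac{2}{3},
\]

so the chosen survival probability is 2/3 no matter how large S is. If I am reading the baseline case the same way (the lab maximizing C·e^(−C/S) when k = 0), the parallel there is C = S and P = e^(−1), again independent of S.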