Okay great, good to know. Again, my hope here is to present the logic of risk compensation in a way that makes it easy to make up your mind about how you think it applies in some domain, not to argue that it does apply in any domain. (And certainly not to argue that a model stripped down to the point that the only effect going on is a risk compensation effect is a realistic model of any domain!)
As for the role of preference-differences in the AI risk case—if what you’re saying is that there’s no difference at all between capabilities researchers’ and safety researchers’ preferences (rather than just that the distributions overlap), that’s not my own intuition at all. I would think that if I learn
that two people have similar transhumanist-y preferences except that one discounts the distant future (or future generations), and so cares primarily about achieving amazing outcomes in the next few decades for people alive today, whereas the other cares primarily about the “expected value of the lightcone”; and
that one works on AI capabilities and the other works on AI safety,
my guess about who was who would be a fair bit better than random.
But I absolutely agree that epistemic disagreement is another reason, and could well be a bigger reason, why different people put different values on safety work relative to capabilities work. I say a few words about how this does and doesn’t change the basic logic of risk compensation in the section on “misperceptions”: nothing much seems to change if the parties just disagree, proportionally, about the magnitude of the risk at any given levels of C and S. That sort of disagreement can change who prioritizes which kind of work, but it doesn’t change how the risk compensation interaction plays out. What really changes things is if the parties disagree about the effectiveness of marginal increases to S, or more precisely, about how much increases to S blunt the extent to which increases to C lower P.
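To make the distinction concrete, here is a minimal numerical sketch. The functional forms are purely my own illustration, not the model from the post: an agent picks C to maximize an expected benefit C * P(C, S), with P(C, S) = exp(-h(S) * C) and h(S) the perceived hazard per unit of capability.

```python
# Toy sketch of the "proportional vs. cross-effect disagreement" point above.
# All functional forms and numbers are illustrative assumptions, not the post's model.
import numpy as np

C_grid = np.linspace(0.01, 50, 5000)

def chosen_C(hazard):
    """The C that maximizes expected benefit C * exp(-hazard * C)."""
    utility = C_grid * np.exp(-hazard * C_grid)
    return C_grid[np.argmax(utility)]

h0 = 1.0
for S in [0.0, 1.0, 2.0]:
    h_base = h0 / (1 + S)    # baseline belief: S blunts the hazard from C
    h_scaled = 2 * h_base    # "proportional" disagreement: risk twice as large at every (C, S)
    h_flat = h0              # cross-effect disagreement: S doesn't blunt the hazard from C at all
    print(f"S={S}: C*={chosen_C(h_base):.2f}, "
          f"C* (doubled risk)={chosen_C(h_scaled):.2f}, "
          f"C* (S useless)={chosen_C(h_flat):.2f}")
```

Running it, the baseline agent’s chosen C rises with S (the risk compensation effect); the “risk is twice as big” agent’s chosen C is scaled down but rises with S in just the same way; and the “S doesn’t blunt the hazard” agent’s chosen C doesn’t respond to S at all. That’s the sense in which only the second kind of disagreement changes how the interaction plays out.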
In any event though, if what you’re saying is that a framing more applicable to the AI risk context would have made the epistemic disagreement bit central and the preference disagreement secondary (or swept it under the rug entirely), fair enough! I look forward to seeing that presentation of it all if someone writes it up.
To be clear: if the preferences are written in words like “expected value of the lightcone”, I agree it would be relatively easy to tell which was which, mainly by identifying community shibboleths. My claim is that if you just have the input/output mapping of (safety level of AI, capabilities level of AI) --> utility, then it would be challenging. Even longtermists should be willing to accept some risk, just because AI can help with other existential risks (and of course many safety researchers, probably the majority at this point, are not longtermists).
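A back-of-the-envelope version of that last point, with placeholder numbers rather than estimates: even an agent whose utility is entirely about the long-run future will accept some probability of AI catastrophe if the alternative is waiting out other existential risks that aligned AI is assumed to neutralize.

```python
# Purely illustrative numbers; the only point is that the tolerated AI risk is nonzero
# even for a utility function that cares only about the long-run future.
import math

p_ai = 0.05       # assumed catastrophe probability from building transformative AI now
r_other = 0.002   # assumed annual hazard from other existential risks in the meantime
T = 30            # assumed years of delay before a build treated here, for simplicity, as risk-free

p_wait = 1 - math.exp(-r_other * T)  # chance some other catastrophe arrives while waiting
print(f"risk if building now:        {p_ai:.3f}")
print(f"risk accumulated by waiting: {p_wait:.3f}")
print("building now is the lower-risk option" if p_ai < p_wait else "waiting is the lower-risk option")
```

On these made-up numbers the pure longtermist tolerates building at a 5% risk, since waiting costs slightly more; the qualitative point (tolerated risk greater than zero) survives for any positive hazard from the other risks.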