I've just edited the intro to say: it's not obvious to me one way or the other whether it's a big deal in the AI risk case. I don't know enough about the AI risk case (or any other case) to have much of an opinion, and I certainly don't think anything here is specific enough to come to a conclusion in any case. My hope is just that something here makes it easier for people who do know about particular cases to get started thinking through the problem.
If I had to guess about the AI risk case, I'd emphasize my conjecture near the end, just before the "takeaways" section, namely that (as you suggest) there currently isn't a ton of restraint, so (b) mostly fails, but that this has a good chance of changing in the future:
Today, while even the most advanced AI systems are neither very capable nor very dangerous, safety concerns are not constraining C much below C̄. If technological advances unlock the ability to develop systems which offer utopia if their deployment is successful, but which pose large risks, then the developer's choice of C at any given S is more likely to be far below C̄, and the risk compensation induced by increasing S is therefore more likely to be strong.
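To make that conjecture concrete, here's a toy numerical sketch. The functional forms and numbers below (including the quadratic hazard (C/C̄)²/S) are my own illustrative assumptions, not anything from the post: a developer picks C in [0, C̄] to maximize expected benefit minus expected catastrophe cost, where the catastrophe probability rises with C and falls with S.

```python
import numpy as np

C_BAR = 10.0  # assumed hard cap on the capability level the developer could deploy

def chosen_C(S, damage, benefit=1.0, grid=10_001):
    """Grid-search the payoff-maximizing capability level C at safety level S."""
    C = np.linspace(0.0, C_BAR, grid)
    p_catastrophe = np.minimum(1.0, (C / C_BAR) ** 2 / S)   # risk falls as S rises
    expected_payoff = benefit * C - damage * p_catastrophe
    return C[np.argmax(expected_payoff)]

for label, damage in [("low stakes (today?)", 5.0), ("high stakes (future?)", 500.0)]:
    picks = [round(chosen_C(S, damage), 2) for S in (1.0, 4.0, 16.0)]
    print(f"{label}: chosen C at S = 1, 4, 16 -> {picks}")

# Low stakes: the chosen C sits at C_BAR for every S, so raising S barely changes
# behaviour (weak risk compensation). High stakes: C is far below C_BAR and climbs
# with S (strong risk compensation): the regime shift the conjecture describes.
```

The quadratic hazard is there purely so the problem has an interior optimum; any specification in which the marginal risk of capability grows with C and shrinks with S would give the same qualitative picture.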
If lots/most of AI safety work (beyond evals) is currently acting more “like evals” than like pure “increases to S”, great to hear—concern about risk compensation can just be an argument for making sure it stays that way!
Good to hear, thanks!