It's a great question. I see Safety Cases more as a meta-framework in which you can use different kinds of evidence. Other risk management techniques can be used as evidence within a Safety Case (e.g. this paper uses a Delphi method).
Also I think Safety Cases are attractive to people in AI Safety because:
1) They offer flexibility in the kinds of evidence and reasoning that are allowed. From skimming, it seems to me that many of the other risk management practices you linked are stricter about the kinds of arguments or evidence that can be brought.
2) They strive to comprehensively prove that overall risk is low. I think most of the other techniques don't let you make claims such as "overall risk from a system is <x%" (which AI Safety people want).
3) (I might be wrong here), but it seems to me that many other risk management techniques require you to understand the system and its environment decently well, whereas this is very difficult for AI Safety.
Overall, you might well be right that other risk management techniques have been overlooked and we shouldn't just focus on Safety Cases.
Thanks for the input!
On Scheming: I actually don't think scheming risk is the most important factor. Even removing it completely doesn't change my final conclusion. I agree that a bimodal distribution with scheming/non-scheming would be appropriate for a more sophisticated model. I just ended up lowering the weight I assign to the scheming factor (by half) to take into account that I am not sure whether scheming will/won't be an issue.
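To make the halving concrete, here is a rough sketch of how a bimodal scheming/non-scheming mixture plays out (all numbers are purely illustrative, not from my actual analysis):

```python
# Illustrative only: a two-mode mixture over whether scheming is an issue.
# The specific numbers below are made up for the example.
p_scheming = 0.5          # credence that scheming turns out to be an issue
risk_if_scheming = 0.30   # risk estimate in the "scheming" mode
risk_if_not = 0.05        # risk estimate in the "non-scheming" mode

# Expected risk under the mixture: halving the credence on scheming
# roughly halves the contribution of the scheming mode.
expected_risk = p_scheming * risk_if_scheming + (1 - p_scheming) * risk_if_not
print(expected_risk)  # 0.175
```

The point is just that discounting the scheming mode's weight, rather than dropping it, keeps some of its contribution in the overall estimate while reflecting the uncertainty.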
In my analysis, the ability to get good feedback signals/success criteria is the factor that moves me the most toward thinking that capabilities get sped up before safety.
On Task length: You have more visibility into this, so I'm happy to defer. But I'd love to hear more about why you think capabilities research involves longer task lengths. Is it because you have to run large evals or do pre-training runs? Do you think this argument applies to all areas of capabilities research?