Can someone provide a more realistic example of partial alignment causing s-risk than SignFlip or MisconfiguredMinds? I don’t see either of these as something you’d be reasonably likely to get by, say, only doing 95% of the necessary alignment research rather than 110%.