Just gonna boost this excellent piece by Tomasik. I think partial alignment/near-misses causing s-risk is potentially an enormous concern. This is more true the shorter timelines are, and thus the more likely people are to try risky "hail mary" alignment techniques. It's also more true for alignment directions that are less principled (less Agent Foundations-style).
Can someone provide a more realistic example of partial alignment causing s-risk than SignFlip or MisconfiguredMinds? I don't see either of these as something you'd be reasonably likely to get by, say, doing only 95% of the alignment research necessary rather than 110%.
https://reducing-suffering.org/near-miss/