Sharmake comments on AI Safety Endgame Stories

Sharmake 28 Sep 2022 22:08 UTC
3 points
0 ∶ 0

Counterfactual Impact and Power-Seeking

It worries me that many of the most promising theories of impact for alignment end up with the structure “acquire power, then use it for good”.

This seems to be a result of the counterfactual impact framing and a bias towards simple plans. You are a tiny agent in an unfathomably large world, trying to intervene on what may be the biggest event in human history. If you try to generate stories where you have a clear, simple counterfactual impact, most of them will involve power-seeking for the usual instrumental convergence reasons. Power-seeking might be necessary sometimes, but it seems extremely dangerous as a general attitude; ironically human power-seeking is one of the key drivers of AI x-risk to begin with. Benjamin Ross Hoffman writes beautifully about this problem in Against responsibility.

I don’t have any good solutions, other than a general bias away from power-seeking strategies and towards strategies involving cooperation, dealism, and reducing transaction costs. I think the pivotal act framing is particularly dangerous, and aiming to delay existential catastrophe rather than preventing it completely is a better policy for most actors.

This is why AI risk is so high, in a nutshell.

Yet unlike this post (or Benjamin Ross Hoffman’s post), I think this was a sad, but crucially necessary decision. I think the option you propose is at least partially a fabricated option. I think a lot of the reason is people dearly want to there be a better option, even if it’s not there.

Link to fabricated options:

https://www.lesswrong.com/posts/gNodQGNoPDjztasbh/lies-damn-lies-and-fabricated-options