The “impossible to correlate perfectly” piece parallels AI alignment, where one could likewise argue that perfectly aligning a reward function with the “true” utility function is impossible.
Indeed, one might even argue that the joint cognition implemented by the EA/rationality/x-risk community as a whole is a form of “artificial” intelligence (call it “EI”), so we face an “EI alignment” problem. As EA becomes more powerful in the world, we get “ESI” (effective altruism superhuman intelligence) and the attendant risks from misaligned ESI.
The obvious solution, in my opinion, is the same for AI and EI: don’t maximize, since the metric you aim to maximize is most likely imperfectly aligned with true utility. Instead, satisfice: be ambitious, but not infinitely so. After reaching an ambitious goal, check whether your reward function still makes sense before setting the next, more ambitious one. And have some human users constantly verify your reward function :-)
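To make the maximize-vs-satisfice point concrete, here is a minimal toy simulation (my own illustrative sketch, not anything from the alignment literature): each “policy” gets a true utility and a noisy proxy score that correlates with it imperfectly. Picking the single highest-proxy policy selects heavily on the noise (the optimizer’s curse / Goodhart), so its proxy score overstates its true utility, whereas a satisficer that stops at an ambitious-but-finite threshold doesn’t chase the misleading tail.

```python
import random

random.seed(0)

# Toy model: proxy = true utility + independent noise,
# i.e. an imperfectly aligned reward function.
policies = []
for _ in range(10_000):
    true_u = random.gauss(0, 1)
    proxy = true_u + random.gauss(0, 1)
    policies.append((proxy, true_u))

# Maximizer: take the single policy with the highest proxy score.
maximizer = max(policies, key=lambda p: p[0])

# Satisficer: accept any policy clearing an ambitious but finite
# threshold (here the 95th percentile of proxy scores), then stop.
threshold = sorted(p[0] for p in policies)[int(0.95 * len(policies))]
satisficers = [p for p in policies if p[0] >= threshold]
avg_true_satisficer = sum(p[1] for p in satisficers) / len(satisficers)

# At the extreme of the proxy, much of the score is noise rather than
# genuine utility, so the maximizer's proxy overstates its true value.
print(f"maximizer: proxy={maximizer[0]:.2f}, true={maximizer[1]:.2f}")
print(f"satisficers: avg true utility={avg_true_satisficer:.2f}")
```

The gap between the maximizer’s proxy score and its true utility is the quantitative version of “the metric you maximize is imperfectly aligned with true utility”: the harder you optimize the proxy, the more of what you capture is noise.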