I was not a huge fan of the instrumental convergence paper, although I didn’t have time to review it thoroughly. In short, it was too slow in making its reasoning and conclusion clear, and once (I think?) I understood what it was saying, it felt quite nitpicky (or a borderline motte-and-bailey). In particular, I’m still unclear whether or how it responds to the real-world applications of the reasoning (e.g., explaining why a system with a seemingly simple goal like calculating digits of pi would want to cause the extinction of humanity).
The summary in this forum post seems to help, but I really feel like the caveat identified in this post (“this paper simply argues that this would not be true of agents with randomly-initialized goals”) is not made clear in the abstract.[1]
The abstract does mention “I find that, even if intrinsic desires are randomly selected [...]”, but this does not read like a caveat at all, especially given the use of “even if” (rather than just “if”).