I just wanna say, if that’s the best you can do for “EA is deceptive” then it seems like EA is significantly less deceptive than the average social movement, corporation, university, government, etc.
As for misaligned, yes definitely, in the way you mention. This is true of pretty much all human groups though, e.g. social movements, corporations, universities, governments. The officially stated goals and metrics that are officially optimized for would cause existential catastrophe if optimized at superintelligent levels.
I think this is probably true because it is true of all organizations. The problem seems to be EA is reluctant to define its membership and power structure, and as a result these are left up to various default incentive structures, which seem likely to produce misalignment.
I am strong up-voting and commenting to bring this to people’s attention.
A snappier title might really help broaden its appeal as well.
I think it’s too easy for someone to skim this entire post and still completely miss the headline “this is strong empirical evidence that mesa-optimizers are real in practice”.
I’ll construct 2 scenarios of where EA mesa-objectives would likely conflict with reality, and conditional on this, I expect the EA community to learn deceptive alignment with >50% probability:
Moral realism is correct, but the correct theory of ethics is non-utilitarian. Specifically, moral realism is the claim that there are mind independent facts of morality, similar to how reality is today mind independent. There is a fact of the matter on morality.
Bluntly, EA is a numbers movement. And only utilitarianism endorses using numbers. So if dentology or virtue ethics were right, I do not expect EA to be aligned to it, and instead become deceptively aligned.
Moral anti-realism is correct, that is there is no fact of the matter of which morality is correct, and everything is subjective. That is, if people are disagreeing on values, both sides are right in their own view, and that’s it. There is no moral reality here.
Again, I expect failure to transmit that fact to the public, admittedly this time EA doesn’t need to justify it’s values, nor does anybody else, but I do expect EA to put a front of being objective truth even if it isn’t.
Do I view every problem in my life and my community as analogous or bearing surprising similarity to the alignment problem
This made me laugh.
But also, as I said at the top of the post, I actually do think the alignment problem does bear surprising similarities to other things, but this is mainly because of general ideas about complex systems pertain to both.
I just wanna say, if that’s the best you can do for “EA is deceptive” then it seems like EA is significantly less deceptive than the average social movement, corporation, university, government, etc.
As for misaligned, yes definitely, in the way you mention. This is true of pretty much all human groups though, e.g. social movements, corporations, universities, governments. The officially stated goals and metrics that are officially optimized for would cause existential catastrophe if optimized at superintelligent levels.
I think this is probably true because it is true of all organizations. The problem seems to be EA is reluctant to define its membership and power structure, and as a result these are left up to various default incentive structures, which seem likely to produce misalignment.
I am strong up-voting and commenting to bring this to people’s attention.
A snappier title might really help broaden its appeal as well.
Hope this helps!
I think the answer is yes, primarily because I do think this is an effective strategy to do much of anything in the real world.
Here’s a link:
https://www.lesswrong.com/posts/firtXAWGdvzXYAh9B/paper-transformers-learn-in-context-by-gradient-descent
And here’s a comment from the post.
I’ll construct 2 scenarios of where EA mesa-objectives would likely conflict with reality, and conditional on this, I expect the EA community to learn deceptive alignment with >50% probability:
Moral realism is correct, but the correct theory of ethics is non-utilitarian. Specifically, moral realism is the claim that there are mind independent facts of morality, similar to how reality is today mind independent. There is a fact of the matter on morality.
Bluntly, EA is a numbers movement. And only utilitarianism endorses using numbers. So if dentology or virtue ethics were right, I do not expect EA to be aligned to it, and instead become deceptively aligned.
Moral anti-realism is correct, that is there is no fact of the matter of which morality is correct, and everything is subjective. That is, if people are disagreeing on values, both sides are right in their own view, and that’s it. There is no moral reality here.
Again, I expect failure to transmit that fact to the public, admittedly this time EA doesn’t need to justify it’s values, nor does anybody else, but I do expect EA to put a front of being objective truth even if it isn’t.
[deleted]
This made me laugh.
But also, as I said at the top of the post, I actually do think the alignment problem does bear surprising similarities to other things, but this is mainly because of general ideas about complex systems pertain to both.