I’m not convinced that aligning with a single human is much harder than aligning with a group of humans that have diverse and partly conflicting interests.
I did not claim that aligning with a single human is harder than aligning with a group of humans (nor have I claimed that others believe that). If my comment gave that impression, I probably expressed myself poorly. In fact, I believe the opposite!
Let me make another attempt at explaining.
A: Figuring out how to align an AGI with a single human.
B: Figuring out how to align an AGI with a group of humans.
C: Doing B after you have completed A.
Then, regarding the difficulties of these, I currently believe:

- All three of A, B, and C are hard.
- B is harder than A.
- B is harder than C.
- A is much harder than C (this is what I was trying to state in the comment above).
- A reasonable strategy for doing B would be to do A, and then do C (I am not super confident here, and things might be much more complex).
- If you pursue both A and C, it is better to focus on A first (and put more resources into it), because A is harder than C.
I would be curious what other people think. My current guess is that at least some alignment researchers believe some or all of these points too. I do not recall hearing opposing viewpoints.
I do not believe that, for example, the author of the PreDCA alignment proposal wishes for the values of a single, arbitrarily chosen human to be imposed (via AGI) on the rest of humanity, even though PreDCA is (currently) a protocol that aligns an AGI with a single human (called the “user”).
Hi harfe, thanks for this helpful clarification. I’d agree that A, B, and C all seem hard, that B is harder than A, and that B is harder than C.
Where we disagree is that I suspect that C is harder than A, for basic game-theoretic reasons I mentioned in the original post.
I’m also not confident that C is a whole lot easier than B: I’m not sure that alignment with individual humans will actually give us all that much help in doing alignment with complicated groups of humans.
But I need to think further about this, and do some more reading!