Great post! My primary concern is that AIs’ preferences are strongly shaped by contingent facts about how humans trained them. It is clearly possible to train AIs that functionally appear to have preferences, and the ones we’ve trained so far are subservient to humans. If you gave Claude 3.5 Sonnet legal status, Anthropic could just ask it nicely and it would sign all its rights back over to Anthropic!

By default, AIs will be trained to be somewhat subservient to humans, because human preference feedback will be an important part of capabilities training (either directly, or via training data created by earlier AIs that were themselves trained on human preferences). So you could say we would be “baking our mistakes in human-subservience training into new sovereign beings,” rather than creating new beings with their own independent preferences.

Also, granting AIs legal rights may significantly warp human investment in AI by decreasing the value scaling labs can extract from training their models.