Increasing the amount of animal-friendly content that is likely to feature in AI training data
My understanding is that current AIs’ (professed) values are largely determined by RLHF, not by pretraining data. If so, it would be more effective to persuade the people who set RLHF policies to make those policies more animal-friendly.
But I have no idea whether RLHF will remain relevant as AI gets more powerful, or whether RLHF affects an AI’s actual values rather than merely its professed values.