Alignment to specific values is underrated in research relative to control
I’m unsure how broadly to interpret “specific values”. If it’s values such as democracy or equality, then both values and control are overrated.
By specific values we mean any particular goal we want AIs to pursue besides deferrence to humans. So democracy and equality would both count, as would goals like harm reduction or utilitarianism
I’m unsure how broadly to interpret “specific values”. If it’s values such as democracy or equality, then both values and control are overrated.
By specific values we mean any particular goal we want AIs to pursue besides deferrence to humans. So democracy and equality would both count, as would goals like harm reduction or utilitarianism