Thanks Anthony!
Regarding 2: I’m by no means an expert, but it seems to me that there are other ways of influencing the preferences/dispositions of AI, e.g., i) penalizing malevolent or fanatical reasoning/behavior/attitudes (for instance, by telling RLHF raters to specifically look out for such properties and penalize them), or ii) similarly amending the principles and rules of constitutional AI.