Thanks for this post!
Curious what you think of the following objection: AI character work has to grapple with the question of whether we want AI systems to be "obedient" or "ethical". Yet it seems non-obvious which is better for the long-term future, because training them to be ethical might make alignment risk worse, and training them to be obedient makes misuse and other risks worse. So if you're uncertain about which is better (which I think you probably should be), then in expectation, the impact of affecting AI character is reduced in proportion to how uncertain you are. I expect there are some ways you can affect AI character work that don't have this "obedience vs ethical" structure, but I also suspect that the argument made in this report about the importance of such work applies less to these interventions.
This is a real trade-off when thinking about what AI character should look like. I think that more work on that trade-off, on where we should be along it, and on ways to get the best of both (the biggest benefits of being ethical and of being obedient) is hugely valuable.
It may seem non-obvious at first which of these two structures is better, but that is because we are considering AI character as a set of very broad recommendations about very broad situations. Given substantially more information about these scenarios, a non-trivial number of them would likely have a clearer resolution to support.
I would guess that AI constitutions, much like actual national constitutions, would likely be more fine-grained and specific in their moral directives for AIs.
Naturally, there's a level of specificity that would solve fewer cases through clarity than it would hurt by imposing inapplicable rigid standards. But this sounds analogous to actual legal systems, which strive to resolve this exact issue iteratively. Legal tests, fictions, and precedent might find their analogues in the ethical "training" of future AIs.
Still, whether trained, prescribed/directed, or highly fine-grained, AI character would have a non-trivial effect on the decisions made in these cases.
Edit: If I understood her correctly, it seems like Amanda Askell says this is quite similar to how Claude was actually trained to follow its constitution.