Four layers come to mind for me:
Have strong theoretical reasons to think your method of creating the system cannot result in something motivated to take dangerous actions
Inspect the system thoroughly after creation, before deployment, to make sure it looks as expected and appears incapable of making dangerous decisions
Deploy the system in an environment where it is physically incapable of doing anything dangerous
Monitor the internals of the system closely during deployment to ensure operation is as expected, and that no dangerous actions are attempted
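
As a toy illustration of why stacking these layers helps, here is a minimal sketch in Python. Every name in it (`Layer`, `run_layers`, and the dictionary keys) is a hypothetical placeholder invented for this example, and each check is a stand-in boolean rather than a real verification procedure. The only point is structural: each layer is evaluated independently, so a failure in one is reported even when the others pass.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Layer:
    """One safety layer; all names here are hypothetical placeholders."""
    name: str
    check: Callable[[dict], bool]  # returns True when the layer sees no problem


def run_layers(system: dict, layers: List[Layer]) -> Tuple[bool, List[str]]:
    """Evaluate every layer independently and report which ones failed."""
    failed = [layer.name for layer in layers if not layer.check(system)]
    return (len(failed) == 0, failed)


# Stand-in checks for the four layers; missing keys are treated as unsafe.
LAYERS = [
    Layer("theory", lambda s: s.get("training_method_vetted", False)),
    Layer("inspection", lambda s: s.get("passed_predeployment_inspection", False)),
    Layer("containment", lambda s: not s.get("has_dangerous_actuators", True)),
    Layer("monitoring", lambda s: s.get("internals_monitored", False)),
]

example = {
    "training_method_vetted": True,
    "passed_predeployment_inspection": True,
    "has_dangerous_actuators": False,
    "internals_monitored": False,  # the fourth layer is absent in this example
}

ok, failed = run_layers(example, LAYERS)
print(ok, failed)
```

In this sketch a system is cleared only if all four layers pass, and the list of failed layers shows which safeguard is missing. Treating a missing key as unsafe is a design choice of the example, not something the list above specifies.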