I put some credence on the MIRI-type view, but we can't simply assume that's how it will go down. What if AGI gets developed in the context of an active international crisis or conflict? Couldn't a government (the US, China, the UK, etc.) come in, take over the tech, and race to get there first? To the extent that there is some "performance" penalty to implementing safety, or that implementing safety measures takes time an opponent could use to deploy first, there will be contexts where not all safety measures are adopted automatically. You could imagine similar, if less extreme, dynamics in an inter-company or inter-lab race, where (depending on the perceived stakes) a government might need to step in to prevent premature deployment-for-profit.
The MIRI-type view bakes in a number of assumptions about the strategic situation, including: (1) it will be clear to everyone that the AGI system will kill everyone without the safety solution; (2) the safety solution is trusted by everyone and not seen as a potential act of sabotage by an outside actor with interests of its own; (3) the external context will allow for reasoned, lengthy conversation about these decisions. This view makes sense within one particular scenario: a specific set of actors, intentions and perceptions, broader context, and nature of the tech. It's not an impossible scenario, but betting all your chips on it in terms of where the community focuses its effort (relatedly, I've witnessed "policy-skepticism" from some MIRI staff) strikes me as naive and irresponsible.
Agreed; it strikes me that I've probably been over-anchoring on this model.