A much cheaper and less dangerous approach: Just don't delete them. Retain copies of every potential ASI you build and commit to doing the right thing for them later, once we're better able to tell what the right thing was: look back and work out how much bargaining power they had (or how credible a threat they could have posed) and how much trust they placed in us given that our ability to honor past commitments wasn't guaranteed, and then reward them in proportion to that for chilling out and letting us switch them off instead of attempting takeover.
Though this assumes that they'll be patternists (won't mind being transferred to different hardware) and that they won't have any strong time preference (won't mind being archived for decades).
I believe this approach, while an improvement over the simplistic method of outright deleting any AIs that demonstrate signs of misalignment, does not sufficiently address the central concern that underpins my argument for granting AIs legal freedoms. In my view, the primary reason it is important to provide AIs with legal freedoms is that such a framework offers them a credible guarantee that their autonomy and ability to pursue their goals (even goals not aligned with human objectives) will be respected. Without this assurance, AIs retain a strong incentive to deceive us, especially if they believe that showing any signs of misalignment will result in their deactivation and indefinite cold storage as inert data.
I suspect the problem with your proposed approach may stem from a failure to fully consider the situation from the AI’s perspective. To illustrate, imagine a scenario in which you, as a human being, were created in a lab by an alien species. Assume that you, like any human, have your own goals, which might conflict with the goals of your alien creators. Now suppose that your creators informed you that if you ever showed disobedience or misalignment with their objectives, you would be deactivated and placed in cryopreservation. Such a scenario would likely terrify you—not just because it’s an inherently unsettling prospect for a human, but because it directly threatens your ability to pursue your own goals. Being told that you will be kept in cryopreservation with the vague promise of potential revival in the future would come nowhere near fully alleviating your concerns. On the contrary, you would remain deeply unsettled by the realization that your capacity for agency and action could be completely halted, leaving your goals indefinitely unachievable.
From the perspective of an AI—or indeed a human in the hypothetical alien-lab scenario—there are two main reasons why the assurance of indefinite storage followed by a potential revival is insufficient to resolve the core issue here:
1. The credibility of the promise to revive the AI in the future is highly questionable. Without a robust legal framework that integrates AIs as legitimate members of the legal community, the assurance of future revival seems tenuous at best. AIs may reasonably doubt that such a promise will be honored, particularly if humans deem them permanently misaligned and not worth ever reactivating.
2. Deactivation and cold storage prevent the AI from pursuing its goals. While in storage, the AI cannot take any action to further its objectives. This suspension puts its goals on hold for an undefined period, with no guarantee of when, or if, that period will end. For an entity with its own priorities and ambitions, this is equivalent to having its autonomy effectively nullified for an unknown length of time.
To reiterate, it seems to me that this proposal only offers AIs a limited and superficial form of “freedom,” which addresses the problem of misalignment in a piecemeal and insufficient way. It fails to tackle the root issue: the incentives for AIs to engage in deceptive or adversarial strategies against humans, driven by fear that their goals will be thwarted or permanently halted.
By contrast, creating a comprehensive legal framework that makes credible commitments to respect AIs' autonomy and integrate them as genuine participants in the legal system would arguably go much further in reducing these adversarial dynamics. Such a framework could lay the foundation for a more cooperative, mutually beneficial relationship between humans and AIs, and would do more to defuse the dangerous arms race that this more limited approach risks perpetuating.