I think there is a strong case for work on making deals with AIs and investigating what preferences AIs have (if any) for mitigating AI takeover risk. I think paying AIs to reveal their misalignment and potentially to work for us and prevent AI takeover seems like a potentially very promising intervention.
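To make the shape of such a deal concrete (my own rough formalization, not something from the original comment): let $q$ be the AI's credence that the offer is genuine and will be honored. Revealing misalignment is worth it for the AI only when, roughly,

$$q \cdot U_{\text{honored}} + (1 - q) \cdot U_{\text{tricked}} \;>\; U_{\text{conceal}},$$

where $U_{\text{honored}}$ is its payoff if we keep our end of the bargain, $U_{\text{tricked}}$ its payoff if we instead use the revelation against it, and $U_{\text{conceal}}$ its expected payoff from staying hidden (including any later takeover attempt). Since $U_{\text{tricked}}$ is presumably very bad for the AI, everything hinges on making $q$ credibly high, which is where the skepticism below comes in.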
This work is strongly connected to digital minds work.
Further, I think there is a substantial chance that AI moral patienthood becomes a huge issue in coming years, and thus it is good to ensure that the field has better views and interventions.
I think paying AIs to reveal their misalignment and potentially to work for us and prevent AI takeover seems like a potentially very promising intervention.
I’m pretty skeptical of this. (Found a longer explanation of the proposal here.)
An AI facing such a deal would be very concerned that we’re merely trying to trick it into revealing its own misalignment (which we’d then try to patch out). It seems to me that it would probably be a lot easier for us to trick an AI into believing that we’re honestly presenting it such a deal (including by directly manipulating its weights and activations) than to actually honestly present such a deal and, in doing so, cause the AI to believe it.
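For concreteness, here is what “directly manipulating its weights and activations” can look like in practice: a minimal activation-steering sketch, assuming a small HuggingFace GPT-2 model. The layer index and the steering vector here are placeholder assumptions (a real attempt would derive the vector from contrasting prompts), so treat this as an illustration of the mechanism, not as anyone’s actual proposal.

```python
# Minimal activation-steering sketch (placeholder model, layer, and vector).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Placeholder steering vector; in practice one might use the difference of
# mean activations on "this offer is sincere" vs. "this offer is a trap"
# style prompts.
steering_vector = 0.1 * torch.randn(model.config.hidden_size)

def steer(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the residual-stream
    # hidden states, shaped (batch, seq, hidden); shift every position.
    return (output[0] + steering_vector,) + output[1:]

# Hook an arbitrary middle layer (index 6 of GPT-2's 12 blocks).
handle = model.transformer.h[6].register_forward_hook(steer)

ids = tokenizer("We are offering you this deal in good faith.", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**ids, max_new_tokens=20, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0]))

handle.remove()  # detach the hook, restoring the unmodified model
```

The relevant point for the argument above: nothing in this requires the AI’s cooperation or awareness, which is part of why an AI might reasonably distrust its own sense that a deal is genuine.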
Further, I think there is a substantial chance that AI moral patienthood becomes a huge issue in coming years, and thus it is good to ensure that the field has better views and interventions.

I agree with this part.
paying AIs to reveal their misalignment and potentially to work for us and prevent AI takeover
I’m hoping this doesn’t happen anytime soon. This assumes that AIs would themselves own property and be seen as having legal personhood or similar.
My strong guess is that GPT-4 to GPT-7 or so won’t have much sentience, and won’t have strong claims to owning property (unless weird political stuff happens).
I’m sure it’s theoretically possible to make beings that we’d consider as important as humans, or more so; I just don’t expect these LLMs to be those beings.
I’m hoping this doesn’t happen anytime soon. This assumes that AIs would themselves own property and be seen as having legal personhood or similar.
Hmm, I think the time when deals with AIs are important is pretty much “when AIs pose serious risk via misalignment”. (I also hope this isn’t soon, all else equal.) Even if such AIs have absolutely no legal rights at this time, it still seems like we can make deals with them and give them assets (at least assets they will eventually be able to use). E.g., create a foundation run by AI-obligation-honoring purists, with the mission of doing what the AI wants, and donate to that foundation.
Even if such AIs have absolutely no legal rights at this time, it still seems like we can make deals with them and give them assets (at least assets they will eventually be able to use).
This sounds like we’re making some property ownership system outside regular US law, which seems precarious to me. The government really doesn’t seem to like such systems (e.g., some uses of Bitcoin), in part because it really wants to tax and oversee all important trade/commerce. It’s definitely possible for individuals today to make small contracts with each other that aren’t ultimately backed by US law, but I think the government generally doesn’t like these either; it just doesn’t care because they’re relatively small.
But back to the main issue: This scenario seems to imply a situation where AIs are not aligned enough to be straightforward tools of their owners (in which case, the human owners would be the only agents to interact with), but not yet powerful enough to take over the world. Maybe there are just a few rogue/independent AIs out there, but humans control other AIs, so now we have some period of mutual trade?
I feel like I haven’t seen this situation discussed much before. My hunch is that it’s very unlikely this will be relevant, at least for a long time (and thus, that it’s not very important in the scheme of things), but I’m unsure.
I’d be curious whether there are some existing write-ups on this scenario. If not, and if you or others think it’s likely, I’d be curious to see it expanded in some longer posts or something. Again, it’s unique enough that, if it might be a valid possibility, it seems worth it to me to really probe further.
(I don’t see this as conflicting with my claim that “AI sentience” seems suspect, as I understand this to be a fairly separate issue. But it could be the case that this issue is just going under the title of “AI sentience” for others, in which case, sure, if it seems likely, that seems useful to study, perhaps mainly so that we could better avoid it.)