[Edit: I’m not an expert in digital minds at all, don’t take too much from this!]
Minor point, but I'm still really not convinced by research into digital minds. My personal model is that most of that can be figured out post-AGI, and that we should first model epistemic lock-in in general before worrying about it in specific scenarios. I'd be happy to see writing explaining the expected value! (That said, I'm not really making decisions in this area, just a passing observer.)
Update: I see above that Oliver seems to really be bought into it, which I didn’t realize. I would flag that I’m very uncertain here—there definitely could be some strong reasons I’m not aware of.
Also, this isn't "me not liking weird things": I'm a big fan of weird research, just not all weird research (and I'm not even sure this area seems that weird now).
My personal model is that most of that can be figured out post-AGI
Hi Ozzie,
One could also have argued for figuring out farmed animal welfare only after cheap animal food (produced in factory farms) was widely available? Now that lots of people are eating factory-farmed animals, it is harder to roll back factory farming.
Not sure if this helps, but I currently believe:
1. Relatively little or no AI suffering will happen, pre-AGI.
2. There's not going to actually be much lock-in on this, post-AGI.
3. When we get to AGI, we'll gain much better abilities to reason through these questions (making it different from the "figuring out animal welfare" claim).
Commenting just to encourage you to make this its own post. I haven't seen a (recent) standalone post about this topic, and it seems important. Though I imagine many people are following this comment section, it also seems easy for this discussion to get lost and for people with relevant opinions to miss it or not engage because it's off-topic.
Apparently there will be a debate week about this soon! I hope it covers territory similar to what I'm thinking of (which I assumed was fairly basic). It's very possible I'll be convinced to the other side; I look forward to the discussion.
I might write a short post if it seems useful then.
Some quick takes on this from me: I agree with 2 and 3, but it's worth noting that "post-AGI" might be "2 years after AGI, while there is a crazy singularity ongoing and vast numbers of digital minds".
As stated, I think (1) is about 75% likely, which is not hugely reassuring. Further, I think there is a critical time you're not highlighting: a time when AGI exists but humans are still (potentially) in control and society looks similar to now.
I think there is a strong case for work on making deals with AIs and on investigating what preferences AIs have (if any), as a way of mitigating AI takeover risk. I think paying AIs to reveal their misalignment and potentially to work for us and prevent AI takeover seems like a potentially very promising intervention.
This work is strongly connected to digital minds work.
Further, I think there is a substantial chance that AI moral patienthood becomes a huge issue in coming years, and thus it is good to ensure the field has better views and interventions.
I think paying AIs to reveal their misalignment and potentially to work for us and prevent AI takeover seems like a potentially very promising intervention.
I’m pretty skeptical of this. (Found a longer explanation of the proposal here.)
An AI facing such a deal would be very concerned that we're merely trying to trick it into revealing its own misalignment (which we'd then try to patch out). It seems to me that it would probably be a lot easier for us to trick an AI into believing that we're honestly presenting it such a deal (including by directly manipulating its weights and activations) than to actually honestly present such a deal and in doing so cause the AI to believe it.
Further, I think there is a substantial chance that AI moral patienthood becomes a huge issue in coming years, and thus it is good to ensure the field has better views and interventions.
I agree with this part.
paying AIs to reveal their misalignment and potentially to work for us and prevent AI takeover
I'm hoping this doesn't happen anytime soon. This assumes that AIs would themselves own property and be seen as legal persons or similar.
My strong guess is that GPT-4 through GPT-7 or so won't have much sentience, and won't have strong claims to owning property (unless weird political stuff happens).
I'm sure it's theoretically possible to make beings that we'd consider as important as humans or more so; I just don't expect these LLMs to be those beings.
I'm hoping this doesn't happen anytime soon. This assumes that AIs would themselves own property and be seen as legal persons or similar.
Hmm, I think the time when deals with AIs are important is pretty much "when AIs pose serious risk via misalignment". (I also hope this isn't soon, all else equal.) Even if such AIs have absolutely no legal rights at this time, it still seems like we can make deals with them and give them assets (at least assets they will eventually be able to use). E.g., make a foundation run by purists about honoring obligations to AIs, with the mission of doing what the AI wants, and donate to the foundation.
Even if such AIs have absolutely no legal rights at this time, it still seems like we can make deals with them and give them assets (at least assets they will eventually be able to use).
This sounds like we're making some property ownership system outside regular US law, which seems precarious to me. The government really doesn't seem to like such systems (e.g., some uses of Bitcoin), in part because the government really wants to tax and oversee all important trade/commerce. It's definitely possible for individuals today to make small contracts with each other, ultimately not backed by US law, but I think the government generally doesn't like these; it just doesn't care because they are relatively small.
But back to the main issue: This scenario seems to imply a situation where AIs are not aligned enough to be straightforward tools of their owners (in which case, the human owners would be the only agents to interact with), but not yet powerful enough to take over the world. Maybe there are just a few rogue/independent AIs out there, but humans control other AIs, so now we have some period of mutual trade?
I feel like I haven't seen this situation discussed much before. My hunch is that it's very unlikely this will be relevant, at least for a long time (and thus, it's not very important in the scheme of things), but I'm unsure.
I'd be curious if there are existing write-ups on this scenario. If not, and if you or others think it's likely, I'd be curious for it to be expanded into some longer posts or something. Again, it's unusual enough that, if it might be a valid possibility, it seems worth it to me to really probe it further.
(I don't see this as conflicting with my claim that "AI sentience" research seems suspect, as I understand this to be a fairly separate issue. But it could be the case that this issue is just going under the title of "AI sentience" by others, in which case, sure, if it seems likely, that seems useful to study, perhaps mainly so that we could better avoid it.)