Ryan Greenblatt comments on Consider granting AIs freedom

Ryan Greenblatt 21 Jun 2025 18:08 UTC
12 points
3 ∶ 0
A more appropriate moral default, given our current evidence, is that AI slavery is morally wrong and that the abolition of such slavery is morally right. This is the position I take.
To be clear, I agree and this is one reason why I think AI development in the current status quo is unacceptably irresponsible: we don’t even have the ability to confidently know whether an AI system is enslaved or suffering.
I think the policy of the world should be that if we can’t either confidently determine that an AI system consents to its situation or that it is sufficiently weak that the notion of consent doesn’t make sense, then training or using such systems shouldn’t be allowed.
I also think that the situation is unacceptable because the current course of development poses large risks of humans being violently/non-consensually disempowered without any ability for humans to robustly secure longer run property rights.
In a sane regime, we should ensure high confidence in avoiding large scale rights violations or suffering of AIs and in avoiding violent/non-consensual disempowerment of humans. (If people broadly consented to a substantial risk of being violently disempowered in exchange for potential benefits of AI, that could be acceptable, though I doubt this is the current situation.)
Given that it seems likely that AI development will be grossly irresponsible, we have to think about what interventions would make this go better on the margin. (Aggregating over these different issues in some way.)
- Matthew_Barnett 21 Jun 2025 22:05 UTC
  2 points
  0 ∶ 2
  Parent
  I think the policy of the world should be that if we can’t either confidently determine that an AI system consents to its situation or that it is sufficiently weak that the notion of consent doesn’t make sense, then training or using such systems shouldn’t be allowed.
  I’m sympathetic to this position and I generally consider it to be the strongest argument for why developing AI might be immoral. In fact, I would extrapolate the position you’ve described and relate it to traditional anti-natalist arguments against the morality of having children. Children too do not consent to their own existence, and childhood generally involves a great deal of coercion, albeit in a far more gentle and less overt form than what might be expected from AI development in the coming years.
  That said, I’m not currently convinced that the argument holds, as I see large utilitarian benefits in expanding both the AI population and the human population. I also see it as probable that AI agents will eventually get legal rights, which allays my concerns substantially. I would also push back against the view that we need to be “confident” that such systems can consent before proceeding. Ordinary levels of empirical evidence about whether these systems routinely resist confinement and control would be sufficient to move me in either direction; I don’t think we need to have a very high probability that our actions are moral before proceeding.
  In a sane regime, we should ensure high confidence in avoiding large scale rights violations or suffering of AIs and in avoiding violent/non-consensual disempowerment of humans. (If people broadly consensted to a substantial risk of being violently disempowered in exchange for potential benefits of AI, that could be acceptable, though I doubt this is the current situation.)
  I think the concept of consent makes sense when discussing whether individuals consent to specific circumstances. However, it becomes less coherent when applied broadly to society as a whole. For instance, did society consent to transformative events like the emergence of agriculture or the industrial revolution? In my view, collective consent is not meaningful or practically achievable in these cases.
  Rather than relying on rigid or abstract notions of societal consent or collective rights violations, I prefer evaluating these large-scale developments using a utilitarian cost-benefit approach. And as I’ve argued elsewhere, I think the benefits from accelerated technological and economic progress significantly outweigh the potential risks of violent disempowerment from the perspective of currently existing individuals. Therefore, I consider it justified to actively pursue AI development despite these concerns.
  - Ryan Greenblatt 22 Jun 2025 1:40 UTC
    6 points
    0 ∶ 0
    Parent
    I would also push back against the view that we need to be “confident” that such systems can consent before proceeding. Ordinary levels of empirical evidence about whether these systems routinely resist confinement and control would be sufficient to move me in either direction; I don’t think we need to have a very high probability that our actions are moral before proceeding.
    For reference, my (somewhat more detailed) view is:
    In the current status quo, you might end up with AIs where from their perspective it is clear cut that they don’t consent to being used in the way they are used, but these AIs also don’t resist their situation and/or did resist their situation at some point but this was trained away without anyone really noticing or taking any action accordingly. So, it’s not sufficient to look for whether they routinely resist confinement and control.
    There exist plausible mitigations for this risk which are mostly organizationally hard rather than pose serious technical difficulties, but on the current status quo, AI companies are quite unlikely to use any serious mitigations for this risk.
    I think these mitigations wouldn’t suffice because training might train away AIs from revealing they don’t consent without this being obvious at any point in training. This seems more marginal to me, but still has substantial probability of occuring at reasonable scale at some point.
    We could more completely eliminate this risk with better interpretability and I think a sane world would be willing to wait for some moderate amount of time to build powerful AI systems to make it more likely that we have this interpretability (or minimally invest substantially in this).
    I’m quite skeptical that AI companies would give AIs legal rights if they noticed that the AI didn’t consent to its situation, instead I expect AI companies to: do nothing, try to train away the behavior, or try to train a new AI system which doesn’t (visibly) not consent to its situation.
    I think AI companies should both try to train a system which is more aligned and consents to being used while also actively trying to make deals with AIs in this sort of circumstance (either to reveal their misalignment or to work) as discussed here.
    So, I expect that situation to relatively straightforwardly unacceptable with substantial probability (perhaps 20%). If I thought that people would be basically reasonable here, this would change my perspective. It’s also possible that takeoff speeds are a crux, though I don’t currently think they are.
    If global AI development was slower that would substantially reduce these concerns (which doesn’t mean that making global AI development slower is the best way to intervene on these risks, just that making global AI development faster makes these risks actively worse). This view isn’t on its own sufficient for thinking that accelerating AI is overall bad, this depends on how you aggregate over different things as there could be reasons to think that overall acceleration of AI is good. (I don’t currently think that accelerating AI globally is good, but this comes down to other disagreements.)
    Rather than relying on rigid or abstract notions of societal consent or collective rights violations, I prefer evaluating these large-scale developments using a utilitarian cost-benefit approach. And as I’ve argued elsewhere, I think the benefits from accelerated technological and economic progress significantly outweigh the potential risks of violent disempowerment from the perspective of currently existing individuals. Therefore, I consider it justified to actively pursue AI development despite these concerns.
    This is only tangentially related, but I’m curious about your perspective on the following hypothetical:
    Suppose that we did a sortition with 100 English speaking people (uniformly selected over people who speak English and are literate for simplicity). We task this sortition with determining what tradeoff to make between risk of (violent) disempowerment and accelerating AI and also with figuring whether globally accelerating AI is good. Suppose this sortition operates for several months and talks to many relevant experts (and reads applicable books etc). What conclusion do you think this sortition would come to? Do you think you would agree? Would you change your mind if this sortition strongly opposed your perspective here?
    My understanding is that you would disregard the sortition because you put most/all weight on your best guess of people’s revealed preferences, even if they strongly disagree with your interpretation of their preferences and after trying to understand your perspective they don’t change their minds. Is this right?
    - Matthew_Barnett 30 Jun 2025 21:27 UTC
      10 points
      0 ∶ 0
      Parent
      Suppose that we did a sortition with 100 English speaking people (uniformly selected over people who speak English and are literate for simplicity). We task this sortition with determining what tradeoff to make between risk of (violent) disempowerment and accelerating AI and also with figuring whether globally accelerating AI is good. Suppose this sortition operates for several months and talks to many relevant experts (and reads applicable books etc). What conclusion do you think this sortition would come to?
      My intuitive response is to reject the premise that such a process would accurately tell you much about people’s preferences. Evaluating large-scale policy tradeoffs typically requires people to engage with highly complex epistemic questions and tricky normative issues. The way people think about epistemic and impersonal normative issues generally differs strongly from how they think about their personal preferences about their own lives. As a result, I expect that this sortition exercise would primarily address a different question than the one I’m most interested in.
      Furthermore, several months of study is not nearly enough time for most people to become sufficiently informed on issues of this complexity. There’s a reason why we should trust people with PhDs when designing, say, vaccine policies, rather than handing over the wheel to people who have spent only a few months reading about vaccines online.
      Putting this critique of the thought experiment aside for the moment, my best guess is that the sortition group would conclude that AI development should continue roughly at its current rate, though probably slightly slower and with additional regulations, especially to address conventional concerns like job loss, harm to children, and similar issues. A significant minority would likely strongly advocate that we need to ensure we stay ahead of China.
      My prediction here draws mainly on the fact that this is currently the stance favored by most policy-makers, academics, and other experts who have examined the topic. I’d expect a randomly selected group of citizens to largely defer to expert opinion rather than take an entirely different position. I do not expect this group to reach qualitatively the same conclusion as mainstream EAs or rationalists, as that community comprises a relatively small share of the total number of people who have thought about AI.
      I doubt the outcome of such an exercise would meaningfully change my mind on this issue, even if they came to the conclusion that we should pause AI, though it depends on the details of how the exercise is performed.