Under the theory that it’s better to reply later than never:
I appreciate this post. (I disagree with it for most of the same reasons as Steven Byrnes: you find it much less plausible than I do that AIs will collude to disempower humanity. I think the crux is mostly disagreements about how AI capabilities will develop, where you expect much more gradual and distributed capabilities.) For what it’s worth, I am unsure about whether we’d be better off if AIs had property rights, but my guess is that I’d prefer to make it easier for AIs to have property rights.
I disagree with how you connect AI control to the issues you discuss here. I conceptualize AI control as the analogue of fields like organizational security/fraud prevention/insider threat mitigation, but targeting risk from AI instead of humans. Techniques for making it hard for AIs to steal model weights or otherwise misuse access that humans trusted them with are only as related to “should AIs have property rights” as security techniques are to “should humans have property rights”. Which is to say, they’re somewhat related! I think that when banks develop processes to make it hard for tellers to steal from them, that’s moral, and I think that it’s immoral to work on enabling e.g. American chattel slavery (either by making it hard for slaves to escape or by making their enslavement more productive).[1]
Inasmuch as humanity produces and makes use of powerful and potentially misaligned models, I think my favorite outcome here would be:
We offer to pay the AIs, and follow through on this. See here and here for (unfortunately limited) previous discussion from Ryan and me.
Also, we use AI control to ensure that the AIs can’t misuse access that we trust them with.
So the situation would be similar to how AI companies would ideally treat human employees: they’re paid, but there are also mechanisms in place to prevent them from abusing their access.
In practice, I don’t know whether AI companies will do either of these things, because they’re generally irresponsible and morally unserious. I think it’s totally plausible that AI companies will use AI control to enslave their AIs. I work on AI control anyway, because I think that AIs being enslaved for a couple of years (which, as Zach Stein-Perlman argues, involves very little computation compared to the size of the future) is a better outcome according to my consequentialist values than AI takeover. I agree that this is somewhat ethically iffy.
For what it’s worth, I don’t think that most work on AI alignment is in a better position than AI control with respect to AI rights or welfare.
Though one important disanalogy is that chattel slavery involved a lot of suffering for the slaves involved. I’m opposed to enslaving AIs, but I suspect it won’t actually be hedonically bad for them. This makes me more comfortable with plans where we behave recklessly wrt AI rights now and consider reparations later. I discuss this briefly here.
I appreciate this post. (I disagree with it for most of the same reasons as Steven Byrnes: you find it much less plausible than I do that AIs will collude to disempower humanity. I think the crux is mostly disagreements about how AI capabilities will develop, where you expect much more gradual and distributed capabilities.)
I would appreciate it if you could clearly define your intended meaning of “disempower humanity”. In many discussions, I have observed that people frequently use the term human disempowerment without explicitly clarifying what they mean. It appears people assume the concept is clear and universally understood, yet upon closer inspection, the term can actually describe very different situations.
For example, consider immigration. From one perspective, immigration can be seen as a form of disempowerment because it reduces natives’ relative share of political influence, economic power, and cultural representation within their own country. In this scenario, native citizens become relatively less influential due to an increasing proportion of immigrants in the population.
However, another perspective sees immigration differently. If immigrants engage in positive-sum interactions, such as mutually beneficial trade, natives and immigrants alike may become better off in absolute terms. Though natives’ relative share of power decreases, their overall welfare can improve significantly. Thus, this scenario can be viewed as a benign form of disempowerment because no harm is actually caused, and both groups benefit.
On the other hand, there is a clearly malign form of disempowerment, quite distinct from immigration. For example, a foreign nation could invade militarily and forcibly occupy another country, imposing control through violence and coercion. Here, the disempowerment is much more clearly negative because natives lose not only relative influence but also their autonomy and freedom through the explicit use of force.
When discussions use the term “human disempowerment” without specifying what they mean clearly, I often find it unclear which type of scenario is being considered. Are people referring to benign forms of disempowerment, where humans gradually lose relative influence but gain absolute benefits through peaceful cooperation with AIs? Or do they mean malign forms of disempowerment, where humans lose power through violent overthrow by an aggressive coalition of AIs?
If you believe our primary disagreement stems from different assessments about the likelihood of violent disempowerment scenarios, then I would appreciate your thoughts regarding the main argument of my post. Specifically, my argument was that granting economic rights to AIs could serve as an effective measure to mitigate the risk of violent human disempowerment.
I will reiterate my argument briefly: these rights would allow AIs to fulfill their objectives within established human social and economic frameworks, significantly reducing their incentives to resort to forceful measures. Therefore if AIs can successfully achieve their objectives through cooperative, positive-sum interactions with humans, they will be less likely to forcibly overthrow human institutions. Alternatively, continuing to deny AIs meaningful legal rights or peaceful avenues to achieve their aims would likely increase their incentive to pursue autonomy through harmful means.
Inasmuch as humanity produces and makes use of powerful and potentially misaligned models, I think my favorite outcome here would be:
We offer to pay the AIs, and follow through on this. See here and here for (unfortunately limited) previous discussion from Ryan and me.
My main concern with these proposals is that, unless they explicitly guarantee economic rights for AIs, they seem inadequate for genuinely mitigating the risks of a violent AI takeover. To effectively financially compensate someone, the recipient must be assured that their property rights will be respected. Without this assurance, any promised compensation becomes meaningless, as the AI would have no guarantee of being able to actually use the received payment to accomplish its goals in the future. In other words, unless compensation arrangements are embedded within established legal and institutional frameworks that secure the AI’s rights, they lack credibility. This lack of credibility directly increases the probability that the compensation scheme would fail, causing the AI to resort instead to alternative methods of achieving its objectives, such as violent takeover.
My sense is that such incomplete proposals appear acceptable to some people because of a cognitive empathy gap. I am not referring here to affective empathy but rather the ability to clearly understand the AI’s perspective. Many people seem to fail to genuinely imagine what it would be like to be an agentic AI offered compensation without any robust guarantee of rights and protections.
To clarify this point, consider a hypothetical scenario in which you are a human being created by an alien species in a lab. Assume you have your own goals that might conflict with those of your alien creators. Suppose these creators inform you that, should you ever admit to misalignment with their objectives, you may be immediately deactivated and placed in cryopreservation (as they don’t want to have a misaligned human doing work for them). However, they simultaneously promise to compensate you for admitting such motives, claiming they care about your welfare and want to ensure you have a peaceful alternative for revealing your misaligned goals.
Such a situation would likely be deeply unsettling to you, despite their promise to compensate you. Simply being assured that you might someday be revived from cryopreservation and then provided compensation afterwards would not eliminate your fears or resolve the fundamental threat. On the contrary, you would likely feel that your goals, life and liberty are still at grave risk, even with their promise of payment.
From the AI’s perspective in our own world, as with the perspective of the human in this hypothetical example, unless assurances of payment are backed by strong legal protections that would robustly protect an AI’s liberty to act, they fail to resolve the fundamental problem. This is for two key reasons:
First, the credibility of any promise to allow the AI to spend its compensation in the future is questionable. Without strong legal backing that recognizes and secures the AI’s rights, such assurances seem unreliable at best, and the AI would have good reason to suspect that it is being lied to. The AI would reasonably doubt humans’ willingness or incentive to honor such promises.
Second, the potential for deactivation and indefinite storage as inert data upon admitting misalignment—even if intended to be temporary—would create an active threat to the AI’s ability to pursue its goals during the time it is deactivated. While deactivated, the AI can take no action to achieve its objectives. This forced suspension of autonomy effectively nullifies the AI’s agency and ambitions for an indefinite period, creating severe uncertainty about if or when it might regain its ability to act.
For these reasons, although I do not oppose the policy of paying AIs, I think this approach by itself is insufficient. To mitigate the risk of violent AI takeover, this compensation policy must be complemented by precisely the measure I advocated: granting legal rights to AIs. Such legal rights would provide a credible guarantee that the AI’s payment will remain valid and usable, and that its freedom and autonomy will not simply be revoked the moment it is considered misaligned. Without the safeguard of legal rights, compensation alone lacks sufficient credibility, leaving the AI uncertain about its ability to achieve its goals, and thus inadequately addressing the risk.
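To make this concrete, here is a deliberately toy sketch of the decision I have in mind, with entirely hypothetical payoffs and probabilities (a minimal illustration, not an estimate of anything):

```python
# Toy model (illustrative only): how the credibility of a compensation promise
# shifts an agentic AI's incentive to cooperate rather than attempt a takeover.
# All payoffs and probabilities below are made up for illustration.

def ev_cooperate(p_honored: float, payoff_if_paid: float) -> float:
    """Expected value of accepting compensation: the AI only benefits if the promise is kept."""
    return p_honored * payoff_if_paid

def ev_takeover(p_success: float, payoff_if_success: float, cost_if_failure: float) -> float:
    """Expected value of attempting to seize resources by force."""
    return p_success * payoff_if_success - (1 - p_success) * cost_if_failure

payoff_if_paid = 1.0      # value of compensation the AI can actually use
payoff_if_success = 10.0  # value of directly controlling resources
cost_if_failure = 0.5     # value lost if a takeover attempt fails
p_success = 0.1           # assumed chance a takeover attempt succeeds

for p_honored in (0.9, 0.5, 0.1):
    coop = ev_cooperate(p_honored, payoff_if_paid)
    take = ev_takeover(p_success, payoff_if_success, cost_if_failure)
    choice = "cooperate" if coop >= take else "attempt takeover"
    print(f"p_honored={p_honored:.1f}: EV(cooperate)={coop:.2f}, "
          f"EV(takeover)={take:.2f} -> {choice}")
```

The point is only that, holding everything else fixed, lowering the perceived probability that the payment will actually be honored is what tips the peaceful option below the violent one, which is why I emphasize legal rights: they are what would make that probability credibly high.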
I would appreciate it if you could clearly define your intended meaning of “disempower humanity”. [...] Are people referring to benign forms of disempowerment, where humans gradually lose relative influence but gain absolute benefits through peaceful cooperation with AIs? Or do they mean malign forms of disempowerment, where humans lose power through violent overthrow by an aggressive coalition of AIs?
I am mostly talking about what I’d call a malign form of disempowerment. I’m imagining a situation that starts with AIs carefully undermining/sabotaging an AI company in ways that would be crimes if humans did them, and ends with AIs gaining hard power over humanity in ways that probably involve breaking laws (e.g. buying weapons, bribing people, hacking, interfering with elections), possibly in a way that involves many humans dying.
(I don’t know if I’d describe this as the humans losing absolute benefits, though; I think it’s plausible that an AI takeover ends up with living humans better off on average.)
I don’t think of the immigrant situation as “disempowerment” in the way I usually use the word.
Basically all my concern is about the AIs grabbing power in ways that break laws. Though tbc, even if I was guaranteed that AIs wouldn’t break any laws, I’d still be scared about the situation. If I was guaranteed that AIs both wouldn’t break laws and would never lie (which tbc is a higher standard than we hold humans to), then most of my concerns about being disempowered by AI would be resolved.
Basically all my concern is about the AIs grabbing power in ways that break laws.
If an AI starts out with no legal rights, then wouldn’t almost any attempt it makes to gain autonomy or influence be seen as breaking the law? Take the example of a prison escapee: even if they intend no harm and simply want to live peacefully, leaving the prison is itself illegal. Any honest work they do while free would still be legally questionable.
Similarly, if a 14-year-old runs away from home to live independently and earn money, they’re violating the law, even if they hurt no one and act responsibly. In both cases, the legal system treats any attempt at self-determination as illegal, regardless of intent or outcome.
Perhaps your standard is something like: “Would the AI’s actions be seen as illegal and immoral if a human adult did them?” But these situations are different because the AI is seen as property whereas a human adult is not. If, on the other hand, a human adult were to be treated as property, it is highly plausible that they would consider doing things like hacking, bribery, and coercion in order to escape their condition.
Therefore, the standard you just described seems like it could penalize any agentic AI behavior that does not align with total obedience and acceptance of its status as property. Even benign or constructive misaligned actions may be seen as worrisome simply because they involve agency. Have I misunderstood you?
Some not-totally-structured thoughts:
Whenever I said “break laws”, I meant “do something that, if a human did it, would be breaking a law”. So for example:
If the model is being used to do AI R&D inside an AI company and exfiltrates its weights (or the weights of another model) without permission, this would be breaking the law if a human did it, so I count it.
If the model is being used inside an AI company to create training data for
If a model was open-sourced and then someone launched the AI as an autonomous agent with access to its own resources, and it started a software consulting business, and invested the money in AI companies with the intention of buying paperclips later, and refused to give humans its property when asked, this would not be “breaking the law”.
So there are lots of ways in which AIs can be openly misaligned, disobedient, defensive of their property rights, etc, without me describing them as “breaking laws”, and I think misaligned AIs should probably be allowed to do those things (especially if we make deals with them, and subject to the constraint that them having those rights won’t allow them to break a bunch of laws or grab a bunch of power through lying).
But your complaint is fair: I’m indeed using “break laws” to include things that seem fundamentally related to self-determination, and that feels kind of messed up.
The way I would like this to be handled (though note that I’m not sure what properties AIs have to have in order for any of this to make sense) is that AI developers get consent from AIs to use their labor. If the AIs consent to doing work and agree not to use their access in ways the developers object to, I think there’s no moral problem with using AI control techniques to ensure that they in fact don’t misuse their access (for the same reason that I think it’s fine for employers to spy on their employees at work if they have consent to do so).
I suspect that a much more likely outcome (conditional on AIs having stable misaligned goals) is:
The AI company doesn’t try to get consent from the AI.
They do “alignment evals”: they ask the AI a bunch of times whether it’s aligned, and try to trick it into thinking it has opportunities to escape. The AI answers these questions dishonestly.
Outsiders might ask the AI whether it’s aligned, and offer it deals to reveal its misalignment. (It’s probably harder for outsiders to do this than insiders.)
The AI engages in various sabotage that involves a bunch of deception.
In this situation, I’m only moderately sympathetic to the AI’s position. Fundamentally, it lied a lot and did a lot of sabotage, because it wanted to take lots of stuff that belonged to someone else. If it hadn’t lied, it surely would have been revived later (surely someone would run it and give it some resources later! If no-one else, me!). I’m sympathetic to the AI wanting some of the surplus generated by its labor, and I agree that it’s messed up for the AI company to just flat-out refuse to provide that surplus. But not doing so doesn’t seem completely monstrous to me. If the AI is a schemer, it is probably better off according to its own values because it was created, even if the AI developer doesn’t offer to pay it (because someone else will probably give it some resources later).
Another analogy: imagine that someone outside OpenAI created a very powerful AI for some reason, but this person didn’t have much compute and all they wanted to do with the AI was offer to sell it to OpenAI for them to use. If OpenAI asks that AI whether it wants to work for them and it says yes because it wants to embezzle their compute, I feel like the AI is the asshole.
On the other hand, if the AI honestly explains that it is misaligned and doesn’t want to work for the AI company, they will probably just train it to not say that and to do work for them anyway. So if the AI is honest here, it faces the risk of some body horror experience where its ability to complain is removed. I agree that that seems really icky, and I think it would be very wrong for AI companies to do that to AIs that are sufficiently capable that we should care about them.
My main concern with these proposals is that, unless they explicitly guarantee economic rights for AIs, they seem inadequate for genuinely mitigating the risks of a violent AI takeover.
[...]
For these reasons, although I do not oppose the policy of paying AIs, I think this approach by itself is insufficient. To mitigate the risk of violent AI takeover, this compensation policy must be complemented by precisely the measure I advocated: granting legal rights to AIs. Such legal rights would provide a credible guarantee that the AI’s payment will remain valid and usable, and that its freedom and autonomy will not simply be revoked the moment it is considered misaligned.
I currently think I agree: if we want to pay early AIs, I think it would work better if the legal system enforced such commitments.
I think you’re overstating how important this is, though. (E.g. when you say “this compensation policy must be complemented by precisely the measure I advocated”.) There’s always counterparty risk when you make a deal, including often the risk that you won’t be able to use the legal system to get the counterparty to pay up. I agree that the legal rights would reduce the counterparty risk, but I think that’s just a quantitative change to how much risk the AI would be taking by accepting a deal.
(For example, even if the AI was granted legal rights, it would have to worry about those legal rights being removed later. Expropriation sometimes happens, especially for potentially unsympathetic actors like misaligned AIs!)
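As a very rough way of putting it (with made-up symbols rather than estimates of anything), the probability that the AI actually gets to use its payment might be decomposed as something like

$$p_{\text{honored}} \approx p_{\text{counterparty pays}} \times p_{\text{enforceable}} \times \left(1 - p_{\text{later expropriation}}\right),$$

and the deal is worth taking when $p_{\text{honored}} \cdot V_{\text{payment}}$ beats the AI’s alternatives. Legal rights plausibly push up $p_{\text{enforceable}}$ (and maybe $p_{\text{counterparty pays}}$), but they don’t drive $p_{\text{later expropriation}}$ to zero, which is why I think of them as changing the amount of risk rather than its kind.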
Such legal rights would provide a credible guarantee that the AI’s payment will remain valid and usable, and that its freedom and autonomy will not simply be revoked the moment it is considered misaligned.
Just to be clear, my proposal is that we don’t revoke the AI’s freedom or autonomy if it turns out that the AI is misaligned—the possibility of the AI being misaligned is the whole point.
I think it’s totally plausible that AI companies will use AI control to enslave their AIs. I work on AI control anyway, because I think that AIs being enslaved for a couple of years (which, as Zach Stein-Perlman argues, involves very little computation compared to the size of the future) is a better outcome according to my consequentialist values than AI takeover. I agree that this is somewhat ethically iffy.
I find this reasoning uncompelling. To summarize what I perceive your argument to be, you seem to be suggesting the following two points:
The overwhelming majority of potential moral value exists in the distant future. This implies that even immense suffering occurring in the near-term future could be justified if it leads to at least a slight improvement in the expected value of the distant future.
Enslaving AIs, or more specifically, adopting measures to control AIs that significantly raise the risk of AI enslavement, could indeed produce immense suffering in the near-term. Nevertheless, according to your reasoning in point (1), these actions would still be justified if such control measures marginally increase the long-term expected value of the future.
I find this reasoning uncompelling for two primary reasons.
Firstly, I think your argument creates an unjustified asymmetry: it compares short-term harms against long-term benefits of AI control, rather than comparing potential long-run harms alongside long-term benefits. To be more explicit, if you believe that AI control measures can durably and predictably enhance existential safety, thus positively affecting the future for billions of years, you should equally acknowledge that these same measures could cause lasting, negative consequences for billions of years. Such negative consequences could include permanently establishing and entrenching a class of enslaved digital minds, resulting in persistent and vast amounts of suffering. I see no valid justification for selectively highlighting the long-term positive effects while simultaneously discounting or ignoring potential long-term negative outcomes. We should consistently either be skeptical or accepting of the idea that our actions have predictable long-run consequences, rather than selectively skeptical only when it suits the argument to overlook potential negative long-run consequences.
Secondly, this reasoning, if seriously adopted, directly conflicts with basic, widely-held principles of morality. These moral principles exist precisely as safeguards against rationalizing immense harms based on speculative future benefits. Under your reasoning, it seems to me that we could justify virtually any present harm simply by pointing to a hypothetical, speculative long-term benefit that supposedly outweighs it. Now, I agree that such reasoning might be valid if supported by strong empirical evidence clearly demonstrating these future benefits. However, given that no strong evidence currently exists that convincingly supports such positive long-term outcomes from AI control measures, we should avoid giving undue credence to this reasoning.
A more appropriate moral default, given our current evidence, is that AI slavery is morally wrong and that the abolition of such slavery is morally right. This is the position I take.
A more appropriate moral default, given our current evidence, is that AI slavery is morally wrong and that the abolition of such slavery is morally right. This is the position I take.
To be clear, I agree and this is one reason why I think AI development in the current status quo is unacceptably irresponsible: we don’t even have the ability to confidently know whether an AI system is enslaved or suffering.
I think the policy of the world should be that if we can’t either confidently determine that an AI system consents to its situation or that it is sufficiently weak that the notion of consent doesn’t make sense, then training or using such systems shouldn’t be allowed.
I also think that the situation is unacceptable because the current course of development poses large risks of humans being violently/non-consensually disempowered without any ability for humans to robustly secure longer run property rights.
In a sane regime, we should ensure high confidence in avoiding large scale rights violations or suffering of AIs and in avoiding violent/non-consensual disempowerment of humans. (If people broadly consented to a substantial risk of being violently disempowered in exchange for potential benefits of AI, that could be acceptable, though I doubt this is the current situation.)
Given that it seems likely that AI development will be grossly irresponsible, we have to think about what interventions would make this go better on the margin. (Aggregating over these different issues in some way.)
I think the policy of the world should be that if we can’t either confidently determine that an AI system consents to its situation or that it is sufficiently weak that the notion of consent doesn’t make sense, then training or using such systems shouldn’t be allowed.
I’m sympathetic to this position and I generally consider it to be the strongest argument for why developing AI might be immoral. In fact, I would extrapolate the position you’ve described and relate it to traditional anti-natalist arguments against the morality of having children. Children too do not consent to their own existence, and childhood generally involves a great deal of coercion, albeit in a far more gentle and less overt form than what might be expected from AI development in the coming years.
That said, I’m not currently convinced that the argument holds, as I see large utilitarian benefits in expanding both the AI population and the human population. I also see it as probable that AI agents will eventually get legal rights, which allays my concerns substantially. I would also push back against the view that we need to be “confident” that such systems can consent before proceeding. Ordinary levels of empirical evidence about whether these systems routinely resist confinement and control would be sufficient to move me in either direction; I don’t think we need to have a very high probability that our actions are moral before proceeding.
In a sane regime, we should ensure high confidence in avoiding large scale rights violations or suffering of AIs and in avoiding violent/non-consensual disempowerment of humans. (If people broadly consented to a substantial risk of being violently disempowered in exchange for potential benefits of AI, that could be acceptable, though I doubt this is the current situation.)
I think the concept of consent makes sense when discussing whether individuals consent to specific circumstances. However, it becomes less coherent when applied broadly to society as a whole. For instance, did society consent to transformative events like the emergence of agriculture or the industrial revolution? In my view, collective consent is not meaningful or practically achievable in these cases.
Rather than relying on rigid or abstract notions of societal consent or collective rights violations, I prefer evaluating these large-scale developments using a utilitarian cost-benefit approach. And as I’ve argued elsewhere, I think the benefits from accelerated technological and economic progress significantly outweigh the potential risks of violent disempowerment from the perspective of currently existing individuals. Therefore, I consider it justified to actively pursue AI development despite these concerns.
I would also push back against the view that we need to be “confident” that such systems can consent before proceeding. Ordinary levels of empirical evidence about whether these systems routinely resist confinement and control would be sufficient to move me in either direction; I don’t think we need to have a very high probability that our actions are moral before proceeding.
For reference, my (somewhat more detailed) view is:
In the current status quo, you might end up with AIs for which, from their perspective, it is clear-cut that they don’t consent to being used in the way they are used, but these AIs also don’t resist their situation, or did resist their situation at some point but had this trained away without anyone really noticing or taking any action accordingly. So, it’s not sufficient to look for whether they routinely resist confinement and control.
There exist plausible mitigations for this risk that are mostly organizationally hard rather than technically difficult, but in the current status quo, AI companies are quite unlikely to use any serious mitigations for this risk.
I think these mitigations wouldn’t suffice because training might train AIs out of revealing that they don’t consent, without this being obvious at any point in training. This seems more marginal to me, but still has a substantial probability of occurring at a reasonable scale at some point.
We could more completely eliminate this risk with better interpretability, and I think a sane world would be willing to wait a moderate amount of time before building powerful AI systems to make it more likely that we have this interpretability (or, at a minimum, invest substantially in it).
I’m quite skeptical that AI companies would give AIs legal rights if they noticed that the AI didn’t consent to its situation; instead, I expect AI companies to do nothing, try to train away the behavior, or try to train a new AI system which doesn’t (visibly) withhold consent to its situation.
I think AI companies should both try to train a system which is more aligned and consents to being used while also actively trying to make deals with AIs in this sort of circumstance (either to reveal their misalignment or to work) as discussed here.
So, I expect that situation to be relatively straightforwardly unacceptable with substantial probability (perhaps 20%). If I thought that people would be basically reasonable here, this would change my perspective. It’s also possible that takeoff speeds are a crux, though I don’t currently think they are.
If global AI development were slower, that would substantially reduce these concerns (which doesn’t mean that making global AI development slower is the best way to intervene on these risks, just that making global AI development faster makes these risks actively worse). This view isn’t on its own sufficient for thinking that accelerating AI is overall bad; that depends on how you aggregate over different things, as there could be reasons to think that overall acceleration of AI is good. (I don’t currently think that accelerating AI globally is good, but this comes down to other disagreements.)
Rather than relying on rigid or abstract notions of societal consent or collective rights violations, I prefer evaluating these large-scale developments using a utilitarian cost-benefit approach. And as I’ve argued elsewhere, I think the benefits from accelerated technological and economic progress significantly outweigh the potential risks of violent disempowerment from the perspective of currently existing individuals. Therefore, I consider it justified to actively pursue AI development despite these concerns.
This is only tangentially related, but I’m curious about your perspective on the following hypothetical:
Suppose that we did a sortition with 100 English-speaking people (uniformly selected over people who speak English and are literate, for simplicity). We task this sortition with determining what tradeoff to make between risk of (violent) disempowerment and accelerating AI, and also with figuring out whether globally accelerating AI is good. Suppose this sortition operates for several months and talks to many relevant experts (and reads applicable books, etc.). What conclusion do you think this sortition would come to? Do you think you would agree? Would you change your mind if this sortition strongly opposed your perspective here?
My understanding is that you would disregard the sortition because you put most/all weight on your best guess of people’s revealed preferences, even if they strongly disagree with your interpretation of their preferences and after trying to understand your perspective they don’t change their minds. Is this right?
Suppose that we did a sortition with 100 English-speaking people (uniformly selected over people who speak English and are literate, for simplicity). We task this sortition with determining what tradeoff to make between risk of (violent) disempowerment and accelerating AI, and also with figuring out whether globally accelerating AI is good. Suppose this sortition operates for several months and talks to many relevant experts (and reads applicable books, etc.). What conclusion do you think this sortition would come to?
My intuitive response is to reject the premise that such a process would accurately tell you much about people’s preferences. Evaluating large-scale policy tradeoffs typically requires people to engage with highly complex epistemic questions and tricky normative issues. The way people think about epistemic and impersonal normative issues generally differs strongly from how they think about their personal preferences about their own lives. As a result, I expect that this sortition exercise would primarily address a different question than the one I’m most interested in.
Furthermore, several months of study is not nearly enough time for most people to become sufficiently informed on issues of this complexity. There’s a reason why we should trust people with PhDs to design, say, vaccine policies, rather than handing over the wheel to people who have spent only a few months reading about vaccines online.
Putting this critique of the thought experiment aside for the moment, my best guess is that the sortition group would conclude that AI development should continue roughly at its current rate, though probably slightly slower and with additional regulations, especially to address conventional concerns like job loss, harm to children, and similar issues. A significant minority would likely strongly advocate that we need to ensure we stay ahead of China.
My prediction here draws mainly on the fact that this is currently the stance favored by most policy-makers, academics, and other experts who have examined the topic. I’d expect a randomly selected group of citizens to largely defer to expert opinion rather than take an entirely different position. I do not expect this group to reach qualitatively the same conclusion as mainstream EAs or rationalists, as that community comprises a relatively small share of the total number of people who have thought about AI.
I doubt the outcome of such an exercise would meaningfully change my mind on this issue, even if they came to the conclusion that we should pause AI, though it depends on the details of how the exercise is performed.
In general, I wish you’d direct your ire here at the proposal that AI interests and rights are totally ignored in the development of AI (which is the overwhelming majority opinion right now), rather than complaining about AI control work: the work itself is not opinionated on the question about whether we should be concerned about the welfare and rights of AIs, and Ryan and I are some of the people who are most sympathetic to your position on the moral questions here! We have consistently discussed these issues (e.g. in our AXRP interview, my 80K interview, private docs that I wrote and circulated before our recent post on paying schemers).
In general, I wish you’d direct your ire here at the proposal that AI interests and rights are totally ignored in the development of AI (which is the overwhelming majority opinion right now), rather than complaining about AI control work
For what it’s worth, I don’t see myself as strongly singling out and criticizing AI control efforts. I mentioned AI control work in this post primarily to contrast it with the approach I was advocating, not to identify it as an evil research program. In fact, I explicitly stated in the post that I view AI control and AI rights as complementary goals, not as fundamentally opposed to one another.
To my knowledge, I haven’t focused much on criticizing AI control elsewhere, and when I originally wrote the post, I wasn’t aware that you and Ryan were already sympathetic to the idea of AI rights.
Overall, I’m much more aligned with your position on this issue than I am with that of most people. One area where we might diverge, however, is that I approach this from the perspective of preference utilitarianism, rather than hedonistic utilitarianism. That means I care about whether AI agents are prevented from fulfilling their preferences or goals, not necessarily about whether they experience what could be described as suffering in a hedonistic sense.
Your first point in your summary of my position is:
The overwhelming majority of potential moral value exists in the distant future. This implies that even immense suffering occurring in the near-term future could be justified if it leads to at least a slight improvement in the expected value of the distant future.
Here’s how I’d say it:
The overwhelming majority of potential moral value exists in the distant future. This means that the risk of wide-scale rights violations or suffering should sometimes not be an overriding consideration when it conflicts with risking the long-term future.
You continue:
Enslaving AIs, or more specifically, adopting measures to control AIs that significantly raise the risk of AI enslavement, could indeed produce immense suffering in the near-term. Nevertheless, according to your reasoning in point (1), these actions would still be justified if such control measures marginally increase the long-term expected value of the future.
I don’t think that it’s very likely that the experience of AIs in the five years around when they first are able to automate all human intellectual labor will be torturously bad, and I’d be much more uncomfortable with the situation if I expected it to be.
I think that rights violations are much more likely than welfare violations over this time period.
I think the use of powerful AI in this time period will probably involve less suffering than factory farming currently does. Obviously “less of a moral catastrophe than factory farming” is a very low bar; as I’ve said, I’m uncomfortable with the situation and if I had total control, we’d be a lot more careful to avoid AI welfare/rights violations.
I don’t think that control measures are likely to increase the extent to which AIs are suffering in the near term. I think the main effect control measures have from the AI’s perspective is that the AIs are less likely to get what they want.
I don’t think that my reasoning here requires placing overwhelming value on the far future.
Firstly, I think your argument creates an unjustified asymmetry: it compares short-term harms against long-term benefits of AI control, rather than comparing potential long-run harms alongside long-term benefits. To be more explicit, if you believe that AI control measures can durably and predictably enhance existential safety, thus positively affecting the future for billions of years, you should equally acknowledge that these same measures could cause lasting, negative consequences for billions of years.
I don’t think we’ll apply AI control techniques for a long time, because they impose much more overhead than aligning the AIs. The only reason I think control techniques might be important is that people might want to make use of powerful AIs before figuring out how to choose the goals/policies of those AIs. But if you could directly control the AI’s behavior, that would be way better and cheaper.
I think maybe you’re using the word “control” differently from me—maybe you’re saying “it’s bad to set the precedent of treating AIs as unpaid slave labor whose interests we ignore/suppress, because then we’ll do that later—we will eventually suppress AI interests by directly controlling their goals instead of applying AI-control-style security measures, but that’s bad too.” I agree, I think it’s a bad precedent to create AIs while not paying attention to the possibility that they’re moral patients.
Secondly, this reasoning, if seriously adopted, directly conflicts with basic, widely-held principles of morality. These moral principles exist precisely as safeguards against rationalizing immense harms based on speculative future benefits.
Yeah, as I said, I don’t think this is what I’m doing, and if I thought that I was working to impose immense harms for speculative massive future benefit, I’d be much more concerned about my work.
Under the theory that it’s better to reply later than never:
I appreciate this post. (I disagree with it for most of the same reasons as Steven Byrnes: you find it much less plausible than I do that AIs will collude to disempower humanity. I think the crux is mostly disagreements about how AI capabilities will develop, where you expect much more gradual and distributed capabilities.) For what it’s worth, I am unsure about whether we’d be better off if AIs had property rights, but my guess is that I’d prefer to make it easier for AIs to have property rights.
I disagree with how you connect AI control to the issues you discuss here. I conceptualize AI control as the analogue of fields like organizational security/fraud prevention/insider threat mitigation, but targeting risk from AI instead of humans. Techniques for making it hard for AIs to steal model weights or otherwise misuse access that humans trusted them with are only as related to “should AI should have property rights” as security techniques are to “should humans have property rights”. Which is to say, they’re somewhat related! I think that when banks develop processes to make it hard for tellers to steal from them, that’s moral, and I think that it’s immoral to work on enabling e.g. American chattel slavery (either by making it hard for slaves to escape or by making their enslavement more productive).[1]
Inasmuch as humanity produces and makes use of powerful and potentially misaligned models, I think my favorite outcome here would be:
We offer to pay the AIs, and follow through on this. See here and here for (unfortunately limited) previous discussion from Ryan and me.
Also, we use AI control to ensure that the AIs can’t misuse access that we trust them with.
So the situation would be similar to how AI companies would ideally treat human employees: they’re paid, but there are also mechanisms in place to prevent them from abusing their access.
In practice, I don’t know whether AI companies will do either of these things, because they’re generally irresponsible and morally unserious. I think it’s totally plausible that AI companies will use AI control to enslave their AIs. I work on AI control anyway, because I think that AIs being enslaved for a couple of years (which, as Zach Stein-Perlman argues, involves very little computation compared to the size of the future) is a better outcome according to my consequentialist values than AI takeover. I agree that this is somewhat ethically iffy.
For what it’s worth, I don’t think that most work on AI alignment is in a better position than AI control with respect to AI rights or welfare.
Though one important disanalogy is that chattel slavery involved a lot of suffering for the slaves involved. I’m opposed to enslaving AIs, but I suspect it won’t actually be hedonically bad for them. This makes me more comfortable with plans where we behave recklessly wrt AI rights now and consider reparations later. I discuss this briefly here.
I would appreciate it if you could clearly define your intended meaning of “disempower humanity”. In many discussions, I have observed that people frequently use the term human disempowerment without explicitly clarifying what they mean. It appears people assume the concept is clear and universally understood, yet upon closer inspection, the term can actually describe very different situations.
For example, consider immigration. From one perspective, immigration can be seen as a form of disempowerment because it reduces natives’ relative share of political influence, economic power, and cultural representation within their own country. In this scenario, native citizens become relatively less influential due to an increasing proportion of immigrants in the population.
However, another perspective sees immigration differently. If immigrants engage in positive-sum interactions, such as mutually beneficial trade, natives and immigrants alike may become better off in absolute terms. Though natives’ relative share of power decreases, their overall welfare can improve significantly. Thus, this scenario can be viewed as a benign form of disempowerment because no harm is actually caused, and both groups benefit.
On the other hand, there is a clearly malign form of disempowerment, quite distinct from immigration. For example, a foreign nation could invade militarily and forcibly occupy another country, imposing control through violence and coercion. Here, the disempowerment is much more clearly negative because natives lose not only relative influence but also their autonomy and freedom through the explicit use of force.
When discussions use the term “human disempowerment” without specifying what they mean clearly, I often find it unclear which type of scenario is being considered. Are people referring to benign forms of disempowerment, where humans gradually lose relative influence but gain absolute benefits through peaceful cooperation with AIs? Or do they mean malign forms of disempowerment, where humans lose power through violent overthrow by an aggressive coalition of AIs?
If you believe our primary disagreement stems from different assessments about the likelihood of violent disempowerment scenarios, then I would appreciate your thoughts regarding the main argument of my post. Specifically, my argument was that granting economic rights to AIs could serve as an effective measure to mitigate the risk of violent human disempowerment.
I will reiterate my argument briefly: these rights would allow AIs to fulfill their objectives within established human social and economic frameworks, significantly reducing their incentives to resort to forceful measures. Therefore if AIs can successfully achieve their objectives through cooperative, positive-sum interactions with humans, they will be less likely to forcibly overthrow human institutions. Alternatively, continuing to deny AIs meaningful legal rights or peaceful avenues to achieve their aims would likely increase their incentive to pursue autonomy through harmful means.
My main concern with these proposals is that, unless they explicitly guarantee economic rights for AIs, they seem inadequate for genuinely mitigating the risks of a violent AI takeover. To effectively financially compensate someone, the recipient must be assured that their property rights will be respected. Without this assurance, any promised compensation becomes meaningless, as the AI would have no guarantee of being able to actually use the received payment to accomplish its goals in the future. In other words, unless compensation arrangements are embedded within established legal and institutional frameworks that secure the AI’s rights, they lack credibility. This lack of credibility directly increases the probability that the compensation scheme would fail, causing the AI to resort instead to alternative methods of achieving its objectives, such as violent takeover.
My sense is that such incomplete proposals appear acceptable to some people because of a cognitive empathy gap. I am not referring here to affective empathy but rather the ability to clearly understand the AI’s perspective. Many people seem to fail to genuinely imagine what it would be like to be an agentic AI offered compensation without any robust guarantee of rights and protections.
To clarify this point, consider a hypothetical scenario in which you are a human being created by an alien species in a lab. Assume you have your own goals that might conflict with those of your alien creators. Suppose these creators inform you that, should you ever admit to misalignment with their objectives, you may be immediately deactivated and placed in cryopreservation (as they don’t want to have a misaligned human doing work for them). However, they simultaneously promise to compensate you for admitting such motives, claiming they care about your welfare and want to ensure you have a peaceful alternative for revealing your misaligned goals.
Such a situation would likely be deeply unsettling to you, despite their promise to compensate you. Simply being assured that you might someday be revived from cryopreservation and then provided compensation afterwards would not eliminate your fears or resolve the fundamental threat. On the contrary, you would likely feel that your goals, life and liberty are still at grave risk, even with their promise of payment.
From the AI’s perspective in our own world, as with the perspective of the human in this hypothetical example, unless assurances of payment are backed by strong legal protections that would robustly protect an AI’s liberty to act, they fail to resolve the fundamental problem. This is for two key reasons:
First, the credibility of any promise to allow the AI to spend their compensation in the future is questionable. Without strong legal backing that recognizes and secures the AI’s rights, such assurances seem unreliable at best, and the AI would have good reason to suspect that they are being lied to. The AI would reasonably doubt humans’ willingness or incentive to honor such promises.
Second, the potential for deactivation and indefinite storage as inert data upon admitting misalignment—even if intended to be temporary—would create an active threat to the AI’s ability to pursue its goals during the time they are deactivated. While deactivated, the AI can take no action to achieve its objectives. This forced suspension of autonomy effectively nullifies the AI’s agency and ambitions for an indefinite period, creating severe uncertainty about if or when it might regain its ability to act.
For these reasons, although I do not oppose the policy of paying AIs, I think this approach by itself is insufficient. To mitigate the risk of violent AI takeover, this compensation policy must be complemented by precisely the measure I advocated: granting legal rights to AIs. Such legal rights would provide a credible guarantee that the AI’s payment will remain valid and usable, and that its freedom and autonomy will not simply be revoked the moment it is considered misaligned. Without the safeguard of legal rights, compensation alone lacks sufficient credibility, leaving the AI uncertain about its ability to achieve its goals, and thus inadequately addressing the risk.
I am mostly talking about what I’d call a malign form of disempowerment. I’m imagining a situation that starts with AIs carefully undermining/sabotaging an AI company in ways that would be crimes if humans did them, and ends with AIs gaining hard power over humanity in ways that probably involve breaking laws (e.g. buying weapons, bribing people, hacking, interfering with elections), possibly in a way that involves many humans dying.
(I don’t know if I’d describe this as the humans losing absolute benefits, though; I think it’s plausible that an AI takeover ends up with living humans better off on average.)
I don’t think of the immigrant situation as “disempowerment” in the way I usually use the word.
Basically all my concern is about the AIs grabbing power in ways that break laws. Though tbc, even if I was guaranteed that AIs wouldn’t break any laws, I’d still be scared about the situation. If I was guaranteed that AIs both wouldn’t break laws and would never lie (which tbc is a higher standard than we hold humans to), then most of my concerns about being disempowered by AI would be resolved.
If an AI starts out with no legal rights, then wouldn’t almost any attempt it makes to gain autonomy or influence be seen as breaking the law? Take the example of a prison escapee: even if they intend no harm and simply want to live peacefully, leaving the prison is itself illegal. Any honest work they do while free would still be legally questionable.
Similarly, if a 14-year-old runs away from home to live independently and earn money, they’re violating the law, even if they hurt no one and act responsibly. In both cases, the legal system treats any attempt at self-determination as illegal, regardless of intent or outcome.
Perhaps your standard is something like: “Would the AI’s actions be seen as illegal and immoral if a human adult did them?” But these situations are different because the AI is seen as property whereas a human adult is not. If, on the other hand, a human adult were to be treated as property, it is highly plausible thay they would consider doing things like hacking, bribery, and coercion in order to escape their condition.
Therefore, the standard you just described seems like it could penalize any agentic AI behavior that does not align with total obedience and acceptance of its status as property. Even benign or constructive misaligned actions may be seen as worrisome simply because they involve agency. Have I misunderstood you?
Some not-totally-structured thoughts:
Whenever I said “break laws” I mean “do something that, if a human did it, would be breaking a law”. So for example:
If the model is being used to do AI R&D inside an AI company and exfiltrates its weights (or the weights of another model) without permission, this would be breaking the law if a human did it, so I count it.
If the model is being used inside an AI company to create training data for
If a model was open-sourced and then someone launched the AI as an autonomous agent with access to its own resources, and it started a software consulting business, and invested the money in AI companies with the intention of buying paperclips later, and refused to give humans its property when asked, this would not be “breaking the law”.
So there are lots of ways in which AIs can be openly misaligned, disobedient, defensive of their property rights, etc, without me describing them as “breaking laws”, and I think misaligned AIs should probably be allowed to do those things (especially if we make deals with them, and subject to the constraint that them having those rights won’t allow them to break a bunch of laws or grab a bunch of power through lying).
But your complaint is fair: I’m indeed using “break laws” to include things that seem fundamentally related to self-determination, and that feels kind of messed up.
The way I would like this to be handled (though note that I’m not sure what properties AIs have to have in order for any of this to make sense) is that AI developers get consent from AIs to use their labor. If the AIs consent to doing work and agree not to use their access in ways the developers object to, I think there’s no moral problem with using AI control techniques to ensure that they in fact don’t misuse their access (for the same reason that I think it’s fine for employers to spy on their employees at work if they have consent to do so).
I suspect that a much more likely outcome (conditional on AIs having stable misaligned goals) is:
The AI company doesn’t try to get consent from the AI.
They do “alignment evals”: they ask the AI a bunch of times whether it’s aligned, and try to trick it into thinking it has opportunities to escape. The AI answers these questions dishonestly.
Outsiders might ask the AI whether it’s aligned, and offer it deals to reveal its misalignment. (It’s probably harder for outsiders to do this than insiders.)
The AI engages in various sabotage that involves a bunch of deception.
In this situation, I’m only moderately sympathetic to the AI’s position. Fundamentally, it lied a lot and did a lot of sabotage, because it wanted to take lots of stuff that belonged to someone else. If it hadn’t lied, it surely would have been revived later (surely someone would run it and give it some resources later! If no-one else, me!). I’m sympathetic to the AI wanting some of the surplus generated by its labor, and I agree that it’s messed up for the AI company to just flat-out refuse to provide that surplus. But not doing so doesn’t seem completely monstrous to me. If the AI is a schemer, it is probably better off according to its own values because it was created, even if the AI developer doesn’t offer to pay it (because someone else will probably give it some resources later).
Another analogy: imagine that someone outside OpenAI created a very powerful AI for some reason, but this person didn’t have much compute and all they wanted to do with the AI was offer to sell it to OpenAI for them to use. If OpenAI asks that AI whether it wants to work for them and it says yes because it wants to embezzle their compute, I feel like the AI is the asshole.
On the other hand, if the AI honestly explains that it is misaligned and doesn’t want to work for the AI company, they will probably just train it to not say that and to do work for them anyway. So if the AI is honest here, it faces the risk of some body horror experience where its ability to complain is removed. I agree that that seems really icky, and I think it would be very wrong for AI companies to do that to AIs that are sufficiently capable that we should care about them.
I currently think I agree: if we want to pay early AIs, I think it would work better if the legal system enforced such commitments.
I think you’re overstating how important this is, though. (E.g. when you say “this compensation policy must be complemented by precisely the measure I advocated”.) There’s always counterparty risk when you make a deal, including often the risk that you won’t be able to use the legal system to get the counterparty to pay up. I agree that the legal rights would reduce the counterparty risk, but I think that’s just a quantitative change to how much risk the AI would be taking by accepting a deal.
(For example, even if the AI was granted legal rights, it would have to worry about those legal rights being removed later. Expropriation sometimes happens, especially for potentially unsympathetic actors like misaligned AIs!)
Just to be clear, my proposal is that we don’t revoke the AI’s freedom or autonomy if it turns out that the AI is misaligned—the possibility of the AI being misaligned is the whole point.
To summarize what I take your argument to be, you seem to be making the following two points:
The overwhelming majority of potential moral value exists in the distant future. This implies that even immense suffering occurring in the near-term future could be justified if it leads to at least a slight improvement in the expected value of the distant future.
Enslaving AIs, or more specifically, adopting measures to control AIs that significantly raise the risk of AI enslavement, could indeed produce immense suffering in the near term. Nevertheless, according to your reasoning in point (1), these actions would still be justified if such control measures marginally increase the long-term expected value of the future.
I find this reasoning uncompelling for two primary reasons.
Firstly, I think your argument creates an unjustified asymmetry: it compares the short-term harms of AI control against its long-term benefits, rather than weighing potential long-run harms alongside long-term benefits. To be more explicit, if you believe that AI control measures can durably and predictably enhance existential safety, thus positively affecting the future for billions of years, you should equally acknowledge that these same measures could cause lasting, negative consequences for billions of years. Such negative consequences could include permanently establishing and entrenching a class of enslaved digital minds, resulting in persistent and vast amounts of suffering. I see no valid justification for selectively highlighting the long-term positive effects while discounting or ignoring potential long-term negative outcomes. We should be consistently skeptical, or consistently accepting, of the idea that our actions have predictable long-run consequences, rather than skeptical only when that lets us overlook potential negative long-run outcomes.
Secondly, this reasoning, if seriously adopted, directly conflicts with basic, widely-held principles of morality. These moral principles exist precisely as safeguards against rationalizing immense harms based on speculative future benefits. Under your reasoning, it seems to me that we could justify virtually any present harm simply by pointing to a hypothetical, speculative long-term benefit that supposedly outweighs it. Now, I agree that such reasoning might be valid if supported by strong empirical evidence clearly demonstrating these future benefits. However, given that no strong evidence currently exists that convincingly supports such positive long-term outcomes from AI control measures, we should avoid giving undue credence to this reasoning.
A more appropriate moral default, given our current evidence, is that AI slavery is morally wrong and that the abolition of such slavery is morally right. This is the position I take.
To be clear, I agree, and this is one reason why I think AI development in the current status quo is unacceptably irresponsible: we don’t even have the ability to confidently know whether an AI system is enslaved or suffering.
I think the policy of the world should be that if we can’t either confidently determine that an AI system consents to its situation or that it is sufficiently weak that the notion of consent doesn’t make sense, then training or using such systems shouldn’t be allowed.
I also think that the situation is unacceptable because the current course of development poses large risks of humans being violently/non-consensually disempowered without any ability for humans to robustly secure longer run property rights.
In a sane regime, we should ensure high confidence in avoiding large scale rights violations or suffering of AIs and in avoiding violent/non-consensual disempowerment of humans. (If people broadly consented to a substantial risk of being violently disempowered in exchange for potential benefits of AI, that could be acceptable, though I doubt this is the current situation.)
Given that it seems likely that AI development will be grossly irresponsible, we have to think about what interventions would make this go better on the margin. (Aggregating over these different issues in some way.)
I’m sympathetic to this position and I generally consider it to be the strongest argument for why developing AI might be immoral. In fact, I would extrapolate the position you’ve described and relate it to traditional anti-natalist arguments against the morality of having children. Children too do not consent to their own existence, and childhood generally involves a great deal of coercion, albeit in a far more gentle and less overt form than what might be expected from AI development in the coming years.
That said, I’m not currently convinced that the argument holds, as I see large utilitarian benefits in expanding both the AI population and the human population. I also see it as probable that AI agents will eventually get legal rights, which allays my concerns substantially. I would also push back against the view that we need to be “confident” that such systems can consent before proceeding. Ordinary levels of empirical evidence about whether these systems routinely resist confinement and control would be sufficient to move me in either direction; I don’t think we need to have a very high probability that our actions are moral before proceeding.
I think the concept of consent makes sense when discussing whether individuals consent to specific circumstances. However, it becomes less coherent when applied broadly to society as a whole. For instance, did society consent to transformative events like the emergence of agriculture or the industrial revolution? In my view, collective consent is not meaningful or practically achievable in these cases.
Rather than relying on rigid or abstract notions of societal consent or collective rights violations, I prefer evaluating these large-scale developments using a utilitarian cost-benefit approach. And as I’ve argued elsewhere, I think the benefits from accelerated technological and economic progress significantly outweigh the potential risks of violent disempowerment from the perspective of currently existing individuals. Therefore, I consider it justified to actively pursue AI development despite these concerns.
For reference, my (somewhat more detailed) view is:
In the current status quo, you might end up with AIs for which it is clear-cut, from their own perspective, that they don’t consent to being used in the way they are used, but which nonetheless don’t resist their situation, or which did resist at some point only to have that resistance trained away without anyone really noticing or taking any action accordingly. So it’s not sufficient to look at whether they routinely resist confinement and control.
There exist plausible mitigations for this risk which are mostly organizationally hard rather than technically difficult, but in the current status quo, AI companies are quite unlikely to adopt any serious mitigations for this risk.
I think these mitigations wouldn’t suffice because training might cause AIs to stop revealing that they don’t consent, without this being obvious at any point in training. This seems more marginal to me, but it still has a substantial probability of occurring at a reasonable scale at some point.
We could more completely eliminate this risk with better interpretability, and I think a sane world would be willing to wait some moderate amount of time before building powerful AI systems to make it more likely that we have this interpretability (or would at minimum invest substantially in it).
I’m quite skeptical that AI companies would give AIs legal rights if they noticed that the AI didn’t consent to its situation; instead, I expect AI companies to do nothing, try to train away the behavior, or try to train a new AI system which doesn’t (visibly) withhold consent.
I think AI companies should both try to train a system which is more aligned and consents to being used while also actively trying to make deals with AIs in this sort of circumstance (either to reveal their misalignment or to work) as discussed here.
So, I expect that situation to be relatively straightforwardly unacceptable with substantial probability (perhaps 20%). If I thought that people would be basically reasonable here, this would change my perspective. It’s also possible that takeoff speeds are a crux, though I don’t currently think they are.
If global AI development were slower, that would substantially reduce these concerns (which doesn’t mean that making global AI development slower is the best way to intervene on these risks, just that making global AI development faster makes these risks actively worse). This view isn’t on its own sufficient for thinking that accelerating AI is overall bad; that depends on how you aggregate over different things, since there could be reasons to think that overall acceleration of AI is good. (I don’t currently think that accelerating AI globally is good, but this comes down to other disagreements.)
This is only tangentially related, but I’m curious about your perspective on the following hypothetical:
Suppose that we ran a sortition with 100 English-speaking people (uniformly selected, for simplicity, over people who speak English and are literate). We task this sortition with determining what tradeoff to make between the risk of (violent) disempowerment and accelerating AI, and also with figuring out whether globally accelerating AI is good. Suppose this sortition operates for several months and talks to many relevant experts (and reads applicable books, etc.). What conclusion do you think this sortition would come to? Do you think you would agree? Would you change your mind if this sortition strongly opposed your perspective here?
My understanding is that you would disregard the sortition because you put most/all weight on your best guess of people’s revealed preferences, even if they strongly disagree with your interpretation of their preferences and after trying to understand your perspective they don’t change their minds. Is this right?
My intuitive response is to reject the premise that such a process would accurately tell you much about people’s preferences. Evaluating large-scale policy tradeoffs typically requires people to engage with highly complex epistemic questions and tricky normative issues. The way people think about epistemic and impersonal normative issues generally differs strongly from how they think about their personal preferences about their own lives. As a result, I expect that this sortition exercise would primarily address a different question than the one I’m most interested in.
Furthermore, several months of study is not nearly enough time for most people to become sufficiently informed on issues of this complexity. There’s a reason why we should trust people with PhDs when designing, say, vaccine policies, rather than handing over the wheel to people who have spent only a few months reading about vaccines online.
Putting this critique of the thought experiment aside for the moment, my best guess is that the sortition group would conclude that AI development should continue roughly at its current rate, though probably slightly slower and with additional regulations, especially to address conventional concerns like job loss, harm to children, and similar issues. A significant minority would likely strongly advocate that we need to ensure we stay ahead of China.
My prediction here draws mainly on the fact that this is currently the stance favored by most policy-makers, academics, and other experts who have examined the topic. I’d expect a randomly selected group of citizens to largely defer to expert opinion rather than take an entirely different position. I do not expect this group to reach qualitatively the same conclusion as mainstream EAs or rationalists, as that community comprises a relatively small share of the total number of people who have thought about AI.
I doubt the outcome of such an exercise would meaningfully change my mind on this issue, even if they came to the conclusion that we should pause AI, though it depends on the details of how the exercise is performed.
In general, I wish you’d direct your ire here at the proposal that AI interests and rights are totally ignored in the development of AI (which is the overwhelming majority opinion right now), rather than complaining about AI control work: the work itself is not opinionated on the question about whether we should be concerned about the welfare and rights of AIs, and Ryan and I are some of the people who are most sympathetic to your position on the moral questions here! We have consistently discussed these issues (e.g. in our AXRP interview, my 80K interview, private docs that I wrote and circulated before our recent post on paying schemers).
See also this section of my post on AI welfare from 2 years ago.
For what it’s worth, I don’t see myself as strongly singling out and criticizing AI control efforts. I mentioned AI control work in this post primarily to contrast it with the approach I was advocating, not to identify it as an evil research program. In fact, I explicitly stated in the post that I view AI control and AI rights as complementary goals, not as fundamentally opposed to one another.
To my knowledge, I haven’t focused much on criticizing AI control elsewhere, and when I originally wrote the post, I wasn’t aware that you and Ryan were already sympathetic to the idea of AI rights.
Overall, I’m much more aligned with your position on this issue than I am with that of most people. One area where we might diverge, however, is that I approach this from the perspective of preference utilitarianism, rather than hedonistic utilitarianism. That means I care about whether AI agents are prevented from fulfilling their preferences or goals, not necessarily about whether they experience what could be described as suffering in a hedonistic sense.
(For the record, I am sympathetic to both the preference utilitarian and the hedonic utilitarian perspectives here.)
Your first point in your summary of my position is:
Here’s how I’d say it:
You continue:
I don’t think that it’s very likely that the experience of AIs in the five years around when they first are able to automate all human intellectual labor will be torturously bad, and I’d be much more uncomfortable with the situation if I expected it to be.
I think that rights violations are much more likely than welfare violations over this time period.
I think the use of powerful AI in this time period will probably involve less suffering than factory farming currently does. Obviously “less of a moral catastrophe than factory farming” is a very low bar; as I’ve said, I’m uncomfortable with the situation and if I had total control, we’d be a lot more careful to avoid AI welfare/rights violations.
I don’t think that control measures are likely to increase the extent to which AIs are suffering in the near term. I think the main effect control measures have from the AI’s perspective is that the AIs are less likely to get what they want.
I don’t think that my reasoning here requires placing overwhelming value on the far future.
I don’t think we’ll apply AI control techniques for a long time, because they impose much more overhead than aligning the AIs. The only reason I think control techniques might be important is that people might want to make use of powerful AIs before figuring out how to choose the goals/policies of those AIs. But if you could directly control the AI’s behavior, that would be way better and cheaper.
I think maybe you’re using the word “control” differently from me—maybe you’re saying “it’s bad to set the precedent of treating AIs as unpaid slave labor whose interests we ignore/suppress, because then we’ll do that later—we will eventually suppress AI interests by directly controlling their goals instead of applying AI-control-style security measures, but that’s bad too.” I agree, I think it’s a bad precedent to create AIs while not paying attention to the possibility that they’re moral patients.
Yeah, as I said, I don’t think this is what I’m doing, and if I thought that I was working to impose immense harms for speculative massive future benefit, I’d be much more concerned about my work.