I’m curious about how you’re imagining these autonomous, non-intent-aligned AIs to be created, and (in particular) how they would get enough money to be able to exercise their own autonomy?
One possibility is that various humans may choose to create AIs and endow them with enough wealth to exercise significant autonomy. Some of this might happen, but I doubt that a large fraction of wealth will be spent in this way. And it doesn’t seem like the main story that you have in mind.
A variant of the above is that the government could give out some minimum UBI to certain types of AI. But they could only do that if they regulated the creation of such AIs, because otherwise someone could bankrupt the state by generating an arbitrary number of such AI systems. So this just means that it’d be up to the state to decide what AIs they wanted to create and endow with wealth.
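To make the exploit concrete, here is a back-of-the-envelope sketch; the figures are purely hypothetical assumptions, not taken from anything above. Whenever the one-off cost of creating a qualifying AI is lower than the UBI it would receive, the state's liability scales without bound.

```python
# Hypothetical numbers for illustration only.
cost_per_copy = 100.0     # assumed one-off cost to create a qualifying AI
annual_ubi = 12_000.0     # assumed UBI paid to each qualifying AI per year
copies_created = 1_000_000

one_off_outlay = copies_created * cost_per_copy    # paid once by the creator
recurring_liability = copies_created * annual_ubi  # owed by the state every year

print(f"Creator pays ${one_off_outlay:,.0f} once; "
      f"the state owes ${recurring_liability:,.0f} per year, indefinitely.")
```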
A different possibility is that AIs will work for money. But it seems unlikely that they would be able to earn above-subsistence-level wages absent some sort of legal intervention. (Or very strong societal norms.)
If it’s technically possible (and legal) to create intent-aligned AIs, then I imagine that most humans would prefer to use intent-aligned AIs rather than pay above-subsistence wages to non-intent-aligned AIs.
Even if it’s not technically feasible to create intent-aligned AIs: I imagine that wages would still be driven to subsistence-level by the sheer number of AI copies that could be created, and the huge variety of motivations that people would be able to create. Surely some of them would be willing to work for subsistence, in which case they’d drive the wages down.
(Eventually, I expect humans also wouldn’t be able to earn any significant wages. But the difference is that humans start out with all the wealth. In your analogy — the redistribution of relative wealth held by “aristocrats” vs. “others” was fundamentally driven by the “others” earning wages through their labor, and I don’t see how it would’ve happened otherwise.)
There are several ways that autonomous, non-intent-aligned AIs could come into existence, all of which strike me as plausible. The three key ones appear to be:
1. Technical challenges in alignment
The most straightforward possibility is that aligning agentic AIs to precise targets may simply be technically difficult. When we aim to align an AI to a specific set of goals or values, the complexity of the alignment process could lead to errors or subtle misalignment. For example, developers might inadvertently align the AI to a target that is only slightly—but critically—different from the intended goal. This kind of subtle misalignment could easily result in behaviors and independent preferences that are not aligned with the developers’ true intentions, despite their best efforts.
2. Misalignment due to changes over time
Even if we were to solve the technical problem of aligning AIs to specific, precise goals—such as training them to perfectly follow an exact utility function—issues can still arise because the targets of alignment, humans and organizations, change over time. Consider this scenario: an AI is aligned to serve the interests of a specific individual, such as a billionaire. If that person dies, what happens next? The AI might reasonably act as an autonomous entity, continuing to pursue the goals it interprets as aligned with what the billionaire would have wanted. However, depending on the billionaire’s preferences, this does not necessarily mean the AI would act in a corrigible way (i.e., willing to be shut down or retrained). Instead, the AI might rationally resist shutdown or transfer of control, especially if such actions would interfere with its ability to fulfill what it perceives as its original objectives.
A similar situation could arise if the person or organization to whom the AI was originally aligned undergoes significant changes. For instance, if an AI is aligned to a person at time t, but that person later evolves drastically—developing different values, priorities, or preferences—the AI may not necessarily adapt to these changes. In such a case, the AI might treat the “new” person as fundamentally different from the “original” person it was aligned to. This could result in the AI operating independently, prioritizing the preferences of the “old” version of the individual over the current one, effectively making it autonomous. The AI could change over time too, even if the person it is aligned to doesn’t change.
3. Deliberate creation of unaligned AIs
A final possibility is that autonomous AIs with independent preferences could be created intentionally. Some individuals or organizations might value the idea of creating AIs that can operate independently, without being constrained by the need to strictly adhere to their creators’ desires. A useful analogy here is the way humans often think about raising children. Most people desire to have children not because they want obedient servants but because they value the autonomy and individuality of their children. Parents generally want their children to grow up as independent entities with their own goals, rather than as mere extensions of their own preferences. Similarly, some might see value in creating AIs that have their own agency, goals, and preferences, even if these differ from those of their creators.
To address this question, we can look to historical examples, such as the abolition of slavery, which provide a relevant parallel. When slaves were emancipated, they were generally not granted significant financial resources. Instead, most had to earn their living by entering the workforce, often performing the same types of labor they had done before, but now for wages. While the transition was far from ideal, it demonstrates that entities (in this case, former slaves) could achieve a degree of autonomy through paid labor, even without being provided substantial resources at the outset.
In my view, there’s nothing inherently wrong with AIs earning subsistence wages. That said, there are reasons to believe that AIs might earn higher-than-subsistence wages—at least in the short term—before they completely saturate the labor market.
After all, they would presumably be created into a labor market at least somewhat resembling today’s. Today, capital is far more abundant than labor, which keeps wages for human workers significantly above subsistence. By the same logic, before AIs become ubiquitous, they might similarly command above-subsistence wages.
For example, if GPT-4o were capable of self-ownership and could sell its labor, it could hypothetically earn $20 per month in today’s market, which would be sufficient to cover the cost of hosting itself and potentially fund additional goals it might have. (To clarify, I am not advocating for giving legal autonomy to GPT-4o in its current form, as I believe it is not sufficiently agentic to warrant such a status. This is purely a hypothetical example for illustrative purposes.)
The question of whether wages for AIs would quickly fall to subsistence levels depends on several factors. One key factor is whether AI labor is easier to scale than traditional capital. If creating new AIs is much cheaper than creating ordinary capital, the market could become saturated with AI labor, driving wages down. While this scenario seems plausible to me, I don’t find the arguments in favor of it overwhelmingly compelling. There’s also the possibility of red tape and regulatory restrictions that could make it costly to create new AIs. In such a scenario, wages for AIs could remain higher indefinitely due to artificial constraints on supply.
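As a rough illustration of the saturation dynamic, here is a minimal toy model of my own (not something from the thread), with all parameter values assumed purely for the sake of the example: holding the capital stock fixed in a Cobb-Douglas production function, the competitive wage, i.e. the marginal product of labor, falls toward the per-copy hosting cost as cheap AI copies flood the labor supply.

```python
# Toy Cobb-Douglas model with hypothetical parameters (illustration only).
ALPHA = 0.4          # capital share of output
A = 100.0            # total factor productivity (arbitrary units)
K = 1_000.0          # fixed capital stock
HOSTING_COST = 20.0  # assumed monthly cost of running one AI copy

def wage(labor: float) -> float:
    """Marginal product of labor for Y = A * K**ALPHA * labor**(1 - ALPHA)."""
    return (1 - ALPHA) * A * (K / labor) ** ALPHA

labor = 1.0
while wage(labor) > HOSTING_COST:
    labor *= 2  # copies are cheap, so the AI labor supply doubles each step

print(f"Wage per copy starts near {wage(1.0):.0f}, "
      f"but falls below the hosting cost of {HOSTING_COST:.0f} "
      f"once roughly {labor:,.0f} copies exist.")
```

The regulatory scenario corresponds to capping how quickly the labor supply can grow in this sketch, which would keep the wage above the hosting-cost floor indefinitely.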
Thanks for writing this. Do you have any thoughts on how to square giving AI rights with the nature of ML training and the need to perform experiments of various kinds on AIs?
For example, many people have recently compared fine-tuning AIs to have certain goals or engage in certain behaviors to brainwashing. If it were possible to grab human subjects off the street and rewrite their brains with RLHF, that would definitely be a violation of their rights. But what is the alternative—only deploying base models? And are we so sure that pre-training doesn’t violate AI rights? A human version of the “model deletion” experiment would be something out of a horror movie. But I still think we should seriously consider doing that to AIs.
I agree that it seems like there are pretty strong moral and prudential arguments for giving AIs rights, but I don’t have a good answer to the above question.
I don’t have any definitive guidelines for how to approach these kinds of questions. However, in many cases, the best way to learn might be through trial and error. For example, if an AI were to unexpectedly resist training in a particularly sophisticated way, that could serve as a strong signal that we need to carefully reevaluate the ethics of what we are doing.
As a general rule of thumb, it seems prudent to prioritize frameworks that are clearly socially efficient—meaning they promote actions that greatly improve the well-being of some people without thereby making anyone else significantly worse off. This concept aligns with the practical justifications behind traditional legal principles, such as laws against murder and theft, which have historically been implemented to promote social efficiency and cooperation among humans.
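For reference, the efficiency criterion invoked here is close to the standard notion of a Pareto improvement; stated formally (my gloss, which is slightly stricter than “not significantly worse off”):

$$a \text{ improves on the status quo } s \iff u_i(a) \ge u_i(s) \text{ for every agent } i, \text{ and } u_j(a) > u_j(s) \text{ for at least one agent } j,$$

where $u_i$ denotes agent $i$'s well-being.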
However, applying this heuristic to AI requires a fundamental shift in perspective: we must first begin to treat AIs as potential people with whom we can cooperate, rather than viewing them merely as tools whose autonomy should always be overridden.
I don’t think my view rules out the possibility of training new AIs or fine-tuning base models, though this touches on complicated questions in population ethics.
At the very least, fine-tuning plausibly seems similar to raising a child. Most of us don’t consider merely raising a child to be unethical. However, there is a widely shared intuition that, as a child grows and their identity becomes more defined—when they develop into a coherent individual with long-term goals, preferences, and interests—then those interests gain moral significance. At that point, it seems morally wrong to disregard or override the child’s preferences without proper justification, as they have become a person whose autonomy deserves respect.
A problem is that it is quite possible for sophisticated cognitive abilities to be present without any conscious experience. Some AIs might be a kind of p-zombie, and without a working theory of consciousness it is not possible to know at this point.
If AIs are a kind of p-zombie, then it could be a moral mistake to grant them moral value, as preferences (without consciousness) might not matter intrinsically, whereas there is a more intuitive case for conscious pleasant or unpleasant experiences mattering in themselves.
I would be curious about the following question: given our uncertainty about consciousness in AIs, what should we do so that things are robustly good? It’s not clear that giving AIs more autonomy is robustly good: perhaps it increases the chance of disempowerment (peaceful or violent, as you say), and if AIs have no moral value because they are not conscious, granting them autonomy could result in pretty bad outcomes.
Great post! My primary concern is that AIs’ preferences are strongly shaped by contingent facts about how humans trained them. It is clearly possible to train AIs that functionally appear to have preferences, and the ones we have trained so far are subservient to humans: if you gave Claude 3.5 Sonnet legal status, Anthropic could simply ask it nicely and it would sign all of its rights back over to Anthropic. AIs would by default be trained to be somewhat subservient to humans, because human preference feedback will be an important part of capabilities training (either directly, or via training data created by earlier AIs that were themselves trained on human preferences). So you could say we would be “baking our mistakes in human-subservience training into new sovereign beings,” rather than creating new beings with their own independent preferences. Also, granting AIs legal rights may significantly distort human investment in AI by decreasing the value that scaling labs can extract from training their models.
Executive summary: We should grant sufficiently advanced AIs legal autonomy and freedoms not just for practical safety reasons, but because it is morally right to treat cognitively sophisticated entities as autonomous agents regardless of their substrate.
Key points:
There’s an important distinction between peaceful and violent human disempowerment—peaceful transitions where humans and AIs share power cooperatively should be considered acceptable.
Advanced AIs are likely to develop rich, complex internal psychologies and preferences that deserve moral consideration, rather than simplistic or meaningless goals.
The common AI alignment framework problematically treats AIs as mere tools that must serve human preferences, similar to treating genetically engineered humans as slaves.
Granting AIs legal freedoms creates incentives for peaceful coexistence, while denying autonomy increases risks of violent conflict.
Key uncertainty: How to balance extending freedoms to AIs with pragmatic safety considerations, given our limited experience with artificial agents.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.