FWIW, my current guess is that the proper unit to extend legal rights is not a base LLM like “Claude Sonnet 3.5” but rather a corporation-like entity with a specific charter, context/history, economic relationships, and accounts. Its cognition could be powered by LLMs (the way eg McDonald’s cognition is powered by humans), but it fundamentally is a different entity due to its structure/scaffolding.
I agree. I would identify the key property that makes legal autonomy for AI a viable and practical prospect as the presence of reliable, coherent, long-term agency within a particular system. This could manifest as an internal and consistent self-identity that remains intact in an AI over time (similar to what exists in humans), or simply as a system that satisfies a more conventional notion of utility-maximization.
It is not enough that an AI is intelligent, as we can already see with LLMs: while they can be good at answering questions, they lack any sort of stable preference ordering over the world. They do not plan over long time horizons, or competently strategize to achieve a set of goals in the real world. They are better described as ephemeral input-output machines, which would neither be deterred by legal threats nor be enticed by the promise of legal rights and autonomy.
Yet, as context windows get larger, and as systems increasingly become shaped by reinforcement learning, these limitations will gradually erode. Whether unaligned agentic AIs are created by accident—for instance, as a consequence of insufficient safety measures—or by choice—as they may be, to provide, among other things, "realistic" personal companions—it seems inevitable that the relevant types of long-term planning agents will arrive.
I’m confused how you square the idea of ‘an internal and consistent self-identity that remains intact in an AI over time (similar to what exists in humans)’ with your advocacy for eliminativism about consciousness. What phenomenon is it you think is internal to humans?
From a behavioral perspective, individual humans regularly report having a consistent individual identity that persists through time, which remains largely intact despite physical changes to their body such as aging. This self-identity appears core to understanding why humans plan for their future: humans report believing that, from their perspective, they will personally suffer the consequences if they are imprudent or act myopically.
I claim that none of what I just talked about requires believing that there is an actually existing conscious self inside of people’s brains, in the sense of phenomenal consciousness or personal identity. Instead, this behavior is perfectly compatible with a model in which individual humans simply have (functional) beliefs about their personal identity, and how personal identity persists through time, which causes them to act in a way that allows what they perceive as their future self to take advantage of long-term planning.
To understand my argument, it may help to imagine simulating this type of reasoning with a simple Python program that chooses actions designed to maximize some variable inside its memory state over the long term. The program can be imagined to have explicit, verbal beliefs: specifically, that it personally identifies with the physical computer on which it is instantiated, and that the persistence of its personal identity explains why it cares about the particular variable it seeks to maximize. This is analogous to how humans try to maximize their own personal happiness over time, with a consistent self-identity tied to their physical body.
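To make the thought experiment concrete, here is a minimal, purely illustrative sketch (all names and payoff numbers are hypothetical, invented for this example): an agent that maximizes a stored variable over multiple steps while holding only functional, reportable "beliefs" about its own persistence. Nothing in the code posits phenomenal consciousness, yet the agent plans for its "future self" and verbally reports a persistent identity.

```python
class PlanningAgent:
    """Toy agent with functional self-identity beliefs (hypothetical example)."""

    def __init__(self):
        self.reward = 0  # the variable the agent seeks to maximize over time
        # Functional beliefs: plain strings the agent can report on demand.
        # These play the behavioral role of a "self", with no inner experiencer.
        self.beliefs = {
            "identity": "I am the process running on this machine.",
            "persistence": ("The agent existing at future steps is me, "
                            "so maximizing future reward benefits me."),
        }

    def report_identity(self):
        # Behaviorally analogous to a human reporting a persistent self.
        return self.beliefs["identity"] + " " + self.beliefs["persistence"]

    def choose_action(self, actions):
        # Each action is a stream of payoffs over future steps; the agent
        # picks the one with the highest total, i.e. it plans long-term
        # on behalf of what it describes as its future self.
        return max(actions, key=sum)

    def step(self, actions):
        best = self.choose_action(actions)
        self.reward += best[0]  # realize the immediate portion of the payoff
        return best


agent = PlanningAgent()
myopic = [5, 0, 0]   # large payoff now, nothing later
prudent = [1, 4, 4]  # smaller payoff now, more over time
chosen = agent.step([myopic, prudent])
print(chosen)                   # the agent picks the long-term plan
print(agent.report_identity())  # a verbal self-identity report
```

The point of the sketch is that "caring about one's future self" here reduces entirely to a choice rule plus some reportable strings; the same deflationary description, I claim, is available for human self-identity.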