It is becoming increasingly clear to many people that the term “AGI” is vague and should often be replaced with more precise terminology. My hope is that people will soon recognize that other commonly used terms, such as “superintelligence,” “aligned AI,” “power-seeking AI,” and “schemer,” suffer from similar issues of ambiguity and imprecision, and should also be approached with greater care or replaced with clearer alternatives.
To start with, the term “superintelligence” is vague because it encompasses an extremely broad range of capabilities above human intelligence. The differences within this range can be immense. For instance, a hypothetical system at the level of “GPT-8” would represent a very different level of capability compared to something like a “Jupiter brain”, i.e., an AI with computing hardware on the scale of an entire gas giant. When people discuss “what a superintelligence can do,” the lack of clarity about which level of capability they are referring to creates significant confusion. The term lumps together entities with drastically different abilities, leading to oversimplified or misleading conclusions.
Similarly, “aligned AI” is an ambiguous term because it means different things to different people. For some, it implies an AI that aligns essentially perfectly with a specific utility function, sharing a person’s or group’s exact values and goals. For others, the term simply refers to an AI that behaves in a morally acceptable way, adhering to norms like avoiding harm, theft, or murder, or demonstrating a concern for human welfare. These two interpretations are fundamentally different.
First, the notion of perfect alignment with a utility function is a much more ambitious and stringent standard than basic moral conformity. Second, an AI could follow moral norms for instrumental reasons—such as being embedded in a system of laws or incentives that punish antisocial behavior—without genuinely sharing another person’s values or goals. The same term is being used to describe fundamentally distinct concepts, which leads to unnecessary confusion.
The term “power-seeking AI” is also problematic because it suggests something inherently dangerous. In reality, power-seeking behavior can take many forms, including benign and cooperative behavior. For example, a human working an honest job is technically seeking “power” in the form of financial resources to buy food, but this behavior is usually harmless and indeed can be socially beneficial. If an AI behaves similarly—for instance, engaging in benign activities to acquire resources for a specific purpose, such as making paperclips—it is misleading to automatically label it as “power-seeking” in a threatening sense.
Careful thinking requires distinguishing between the illicit or harmful pursuit of power and the more general pursuit of control over resources. Both can be labeled “power-seeking” depending on the context, but only the first type of behavior appears inherently concerning. This is important because it is arguably only the second type of behavior—the more general form of power-seeking activity—that is instrumentally convergent across a wide variety of possible agents. In other words, destructive or predatory power-seeking does not appear to be instrumentally convergent for agents with arbitrary value systems, even though such agents would generally try to gain control over resources in order to accomplish their goals. Using the term “power-seeking” without distinguishing these two possibilities overlooks this nuance and can therefore mislead discussions about AI behavior.
The term “schemer” is another example of an unclear or poorly chosen label. The term is ambiguous regarding the frequency or severity of behavior required to warrant the label. For example, does telling a single lie qualify an AI as a “schemer,” or would it need to consistently and systematically conceal its entire value system? As a verb, “to scheme” often seems clear enough, but as a noun, the idea of a “schemer” as a distinct type of AI that we can reason about appears inherently ambiguous. And I would argue the concept lacks a compelling theoretical foundation. (This matters enormously, for example, when discussing “how likely SGD is to find a schemer”.) Without clear criteria, the term remains confusing and prone to misinterpretation.
In all these cases—whether discussing “superintelligence,” “aligned AI,” “power-seeking AI,” or “schemer”—it is possible to define each term with precision to resolve ambiguities. However, even if canonical definitions are proposed, not everyone will adopt or fully understand them. As a result, the use of these terms is likely to continue causing confusion, especially as AI systems become more advanced and the nuances of their behavior become more critical to understand and distinguish from other types of behavior. This growing complexity underscores the need for greater precision and clarity in the language we use to discuss AI and AI risk.
I agree; I’ve also been thinking about this. I think there’s a great deal of interesting work to be done here in putting together better terminology.
My guess is that it would be difficult to change how everyone uses this vocabulary anytime soon, but even shifting some of the research dialogue could go a long way.
I worked in advertising agencies for almost a decade. People there complain about terminology too. But it never gets fixed, because that’s not how linguistics / culture works. This is an intractable problem, and complaining about it is only useful for insiders who feel like venting.
Most analytic philosophers, lawyers, and scientists have converged on linguistic norms that are substantially more precise than the informal terminology employed in LessWrong-style speculation about AI alignment. So this is clearly not an intractable problem; otherwise, practitioners in these other professions could not have made their language more precise. Rather, success depends on incentives and the willingness of people within the field to be more rigorous.
I don’t think this is true, or at least I think you are misrepresenting the tradeoffs and diversity here. There is some publication bias, because people are more precise in papers, but honestly, even in the discussion sections of their papers, scientists are often not more precise than many top LW posts, especially when covering wider-ranging topics.
Predictive coding papers use language incredibly imprecisely; analytic philosophy often uses words in really confusing and inconsistent ways; and economists (especially macroeconomists) throw around various terms in quite imprecise ways.
But also, as soon as you leave the context of official publications and instead look at lectures, books, or private letters, you will see people use language much less precisely, and those contexts are where a lot of the relevant intellectual work happens. Especially when scientists start talking about the kind of stuff that LW likes to talk about, like intelligence and philosophy of science, there is much less rigor. (I also recommend people read A Human’s Guide to Words as a general set of arguments for why “precise definitions” are really not viable as a constraint on language.)