Distinctions when Discussing Utility Functions

An agent considering the relative costs and benefits of a long list of options

Epistemic Status: Early. The categories mentioned come mostly from experience and reflection, as opposed to existing literature.

On its surface, a utility function is an incredibly simple and generic concept. An agent has a set of choices with some numeric preferences over them. This can be Von Neumann–Morgenstern (VNM) compatible, but even that isn’t saying too much.

Trouble comes when people assume more specific meanings. Some people use “utility function” to mean “what a certain human ultimately cares about”; others use it to mean things like “how much do humans prefer widget A to widget B?” Utility functions are also used in programs as a way of performing optimization. These are all very different uses of the term, and confusion is frequent.

In our work at QURI, this is important because we want to programmatically encode utility functions and use them directly for decision-making. We’ve experienced a lot of confusion, both in our internal thinking and in trying to explain things to others, so we have been working on clarifying the topic. This topic also has implications for discussions of AI alignment.

I think we can break down some interesting uses of the idea on a few different axes. I’m going to suggest some names here, but the names aren’t the important part—the categories and distinctions are.

Cheat Sheet

  • A Terminal utility function describes an agent’s ultimate values. A Proximal utility function describes their preferences over specific world states or actions.

  • An Initial terminal utility function is an agent’s original utility function over their values. A Modified terminal utility function incorporates modifications to those values over time, perhaps made because they seemed optimal according to the earlier terminal utility function.

  • Utility functions can be Descriptive (a best-guess direct estimate), Empirical (estimated from observational data via a specific algorithm), Prescriptive (a guess at what the function should be, perhaps extrapolated by a different observer), or Operational (a specific program, perhaps executed as code).

  • Utility functions can be Precomputed (determined all at once, before use), or On-Demand (parts are computed when needed).

  • Utility functions can represent an agent’s initial intuitions (Deliberation Level 0), or their impressions after a lot of deliberation (Deliberation Level 1 to Deliberation Level N).

  • Utility functions can be made Public. When this happens, expect that some data will be hidden or modified.

  • There’s a morass of other options too; we’ve listed some more ideas at the end.

Terminal vs. Proximal

Humans can arguably be modeled as optimizing for a relatively small set of terminal values. For example, one human might ultimately be trying to optimize for the wellbeing of themselves and their family, while another might be optimizing for “biodiversity in the universe.” It’s easy to encode these terminal values as simple VNM utility functions.

An (incredibly simple) “terminal utility function” might look something like:

1 unit personal happiness: 5
1 unit happiness by a family member: 3
1 unit of interestingness in the universe: 1

This is very different from, but can coincide with, utility functions over specific actions or world states. Those values will be approximations of how much the actions or world states would influence one’s terminal values. This might look something like:

Gaining $1: 5
An average headache: −30
Seeing a family member smile: 10

The terminal utility function is effectively highly compressed, and the proximal one is (theoretically) infinitely long.
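To make the distinction concrete, here’s a minimal sketch in Python, using the illustrative numbers above. The `TERMINAL_EFFECTS` mapping, which links actions to terminal values, is entirely hypothetical; in practice, estimating that mapping is the hard part.

```python
# A tiny, compressed terminal utility function: weights over ultimate values.
TERMINAL_UTILITY = {
    "personal_happiness": 5,
    "family_member_happiness": 3,
    "interestingness_of_universe": 1,
}

# Hypothetical estimates of how much each action moves each terminal value.
# These numbers are purely illustrative.
TERMINAL_EFFECTS = {
    "gain_one_dollar": {"personal_happiness": 1.0},
    "average_headache": {"personal_happiness": -6.0},
    "see_family_member_smile": {
        "personal_happiness": 0.5,
        "family_member_happiness": 2.5,
    },
}

def proximal_utility(action: str) -> float:
    """Approximate the utility of a specific action by cashing it out
    in terms of the terminal utility function."""
    effects = TERMINAL_EFFECTS.get(action, {})
    return sum(TERMINAL_UTILITY[value] * amount for value, amount in effects.items())

print(proximal_utility("gain_one_dollar"))          # 5.0
print(proximal_utility("average_headache"))         # -30.0
print(proximal_utility("see_family_member_smile"))  # 10.0
```

The terminal table stays small and fixed, while the proximal function can, in principle, be queried over an unbounded number of actions.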

There are some profound philosophical questions regarding the shape and meaning of the terminal utility function, but I don’t think we need to get into that now.

Alternative names: “Upstream vs. Downstream,” “Terminal/Intrinsic vs. Instrumental,” “Proximal vs. Ultimate.”

Initial vs. Modified

The terminal utility function itself can be quite complex. One way it’s complex is that it might change over time. People who “terminally care” about another individual weren’t born that way; that care developed.

Some ways that utility functions might change could include:

Intentional Terminal Utility Function Modification

Under one’s initial terminal utility function, it can seem optimal to modify that very function. In an idealized example, say that a person who cares about their own well-being is told that if they start caring a lot about their country/leader/partner, then they will be rewarded in ways that help their well-being. This might be a permanent or a temporary change.

There are many situations where it’s advantageous to have a prosocial utility function. If one person gets a lot of happiness when others near them are happy, then other people will be more inclined to be close to this person.

Likewise, there are probably times when one might want to pre-commit to having an adversarial utility function toward another person: “If you betray me, then I’ll start to feel a lot of happiness when you suffer, and this will make me more likely to try to damage you.”

Instrumental Terminal Utility Function Modification

There are times when one’s terminal utility function stays constant, but changes in circumstance make the agent easier to model by adjusting it. Maybe a person gains a pet, and whenever the pet is happy, their brain lights up and they gain personal well-being, which to them is a terminal value. In these cases, it might be useful to model their utility as coupled to that of other agents.

It could be useful to distinguish “a person’s original, early utility function” from “the utility function that the person wound up having later on, perhaps because the changes scored well on that initial utility function.” One simple way to do this is to refer to the original terminal utility function as their Initial function, and the later terminal utility function as their Modified function. Here the modified function would be their current or active function.

Reflection

Over time, an agent might reflect on their values and learn more about philosophy. This might change their terminal utility function. It’s also relevant to their proximal utility function, especially if there’s a fuzzy line between the terminal and proximal functions.

Alternative names: “Level 0 vs. Level 1”, “Initial vs. Active”

Descriptive / Empirical / Prescriptive / Operational

(Note: The terms here are not exhaustive, and can be particularly confusing. Comments and ideas are appreciated.)

Descriptive

A guess at a utility function, whether that be terminal or proximal. The function can describe a hard-coded program (operational) or a pattern of behaviors that can be modeled as a utility function.

Prescriptive

A guess at what one’s utility function should be. A good fit for one person guessing another’s idealized proximal utility function, given their terminal utility function or other data.

Operational

There’s a program/algorithm that explicitly optimizes for a utility function. This function might not actually represent the true values of any particular agent; writing a program around some particular utility function might just be an effective way of getting it to accomplish some other, arbitrary aim. At the same time, we might want to write explicit programs to optimize a person’s utility, using a very rough algorithmic approximation of their utility function; this approximation would still be an operational utility function.
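As a rough illustration, the sketch below (with hypothetical names and hand-tuned weights) hard-codes a crude utility function and uses it directly to pick an action. Nothing guarantees that this function tracks any agent’s true values; it’s simply the function the program actually optimizes.

```python
# An operational utility function: a concrete, runnable approximation that a
# program optimizes directly. It may or may not track anyone's true values.
def operational_utility(option: dict) -> float:
    # Crude, hand-tuned weights; purely illustrative.
    return 5 * option["dollars_gained"] - 30 * option["headaches_caused"]

def choose(options: list[dict]) -> dict:
    # The program optimizes the operational function, whatever it happens to capture.
    return max(options, key=operational_utility)

options = [
    {"name": "take_boring_job", "dollars_gained": 100, "headaches_caused": 2},
    {"name": "take_fun_job", "dollars_gained": 60, "headaches_caused": 0},
]
print(choose(options)["name"])  # "take_boring_job" (utility 440 vs. 300)
```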

There’s a separate philosophical question of whether humans execute specific operational utility functions, or whether their behaviors and motives can merely be modeled as a descriptive utility function.

Empirical

An approximation of a descriptive utility function based on observational data, solely using a specific calculation. Could be the result of inverse reinforcement learning. Similar in theory to empirical distributions.
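As a toy example of the “specific calculation” part, the sketch below fits utilities to a made-up set of observed pairwise choices under a simple Luce/softmax choice model, via gradient ascent on the log-likelihood. This is much simpler than full inverse reinforcement learning, but it has the same flavor: the utilities are whatever the chosen algorithm recovers from the data.

```python
import numpy as np

# Hypothetical observations: each pair is (chosen_option, rejected_option).
observations = [
    ("cake", "salad"), ("cake", "salad"), ("salad", "cake"),
    ("cake", "soup"), ("cake", "soup"), ("soup", "cake"),
    ("salad", "soup"), ("soup", "salad"),
]

options = sorted({o for pair in observations for o in pair})
index = {name: i for i, name in enumerate(options)}
u = np.zeros(len(options))  # utility estimates, identified only up to a constant

# Choice model: P(choose a over b) = exp(u_a) / (exp(u_a) + exp(u_b)).
# Fit by gradient ascent on the log-likelihood.
learning_rate = 0.1
for _ in range(2000):
    grad = np.zeros_like(u)
    for chosen, rejected in observations:
        a, b = index[chosen], index[rejected]
        p_choose = 1.0 / (1.0 + np.exp(u[b] - u[a]))
        grad[a] += 1.0 - p_choose
        grad[b] -= 1.0 - p_choose
    u += learning_rate * grad

print(dict(zip(options, np.round(u, 2))))  # "cake" comes out highest in this toy data
```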

Precomputed vs. On-Demand

In examples or partial-domain operational implementations, all utilities can be explicitly and statically modeled. But in most real-life situations, neither humans nor computers precompute utility functions over all possible worlds or actions.

In terms of decision-making, an agent doesn’t need a great utility estimate on everything. Agents typically face a tiny subset of possible decisions, and when these come up, there’s often time to then perform calculations or execute similar heuristics. An agent doesn’t need some complete precomputed utility function—they just need some procedure that effectively estimates a partial utility function for decisions that they will encounter. This can be understood as an on-demand, or lazy, utility function.

Fully explicit utility functions are typically classified as precomputed.
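Here’s a minimal sketch of the difference, assuming a hypothetical `estimate_utility` procedure that’s expensive to run: the precomputed version fills in a table up front, while the on-demand (lazy) version only evaluates, and caches, the options a decision actually requires.

```python
from functools import lru_cache

def estimate_utility(option: str) -> float:
    # Stand-in for an expensive estimation procedure (deliberation, simulation,
    # querying a model, etc.). The formula here is purely illustrative.
    print(f"...expensively estimating {option}")
    return float(len(option))

# Precomputed: evaluate every option up front, before any decision arrives.
ALL_OPTIONS = ["walk", "bike", "teleport"]
precomputed = {option: estimate_utility(option) for option in ALL_OPTIONS}

# On-demand (lazy): only evaluate options when a decision actually requires
# them, and cache the results for reuse.
@lru_cache(maxsize=None)
def on_demand_utility(option: str) -> float:
    return estimate_utility(option)

def decide(options: list[str]) -> str:
    return max(options, key=on_demand_utility)

print(decide(["walk", "teleport"]))  # only these two get estimated on demand
```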

Levels of Deliberation

It can take a lot of research and thinking to precisely estimate a part of one’s utility function. This might be true for one’s terminal utility function, but it’s definitely true for one’s proximal utility function. Even if these functions are precomputed, that doesn’t mean that they are the ideal estimates that one would reach after infinite deliberation.

We can call a utility function created from momentary intuitions one’s Deliberation-0, or D0, utility function.

One way of extending this into a scale is to let each level n correspond to 10^n “effective hours spent by strong researchers investigating the topic.” So, if a person spent a research effort equivalent to 1,000 hours of quality-adjusted research on estimating their utility function, the result would be a D3 function. (That said, again, the generic concept is much more important than the specific numbers here.)
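Under that (admittedly arbitrary) convention, the level is just the base-10 logarithm of the quality-adjusted research hours; a rough sketch:

```python
import math

def deliberation_level(effective_research_hours: float) -> float:
    # D_n corresponds to 10^n quality-adjusted research hours, so n = log10(hours).
    return math.log10(effective_research_hours)

print(deliberation_level(1))      # 0.0 -> roughly D0 (little more than intuition)
print(deliberation_level(1_000))  # 3.0 -> D3
```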

A person’s D0 utility function should, from their own standpoint, match their Dn utility function in expectation. Put in other terms, a proper Bayesian shouldn’t expect new evidence to shift their views in any specific direction.

Deliberation becomes more relevant when it comes from the perspective of other actors. Say Jane is trying to estimate Bob’s utility function. Bob expresses his D0 function, but Jane thinks it’s wrong in ways that will eventually be obvious to Bob. Jane estimates Bob’s D4 function, assuming that the result will more closely match many of Jane’s assumptions. This could be a reasonable or unreasonable thing to do, based on the circumstances.

More nuanced versions of “levels of deliberation” could include specific reasoning improvements. Perhaps, “Agent X’s utility function, after gaining information Y, access to experts A and B, and deliberation time Z.” In the case of computational reasoning this could be more like, “Agent X’s utility function, after spending 10^9 CPU cycles per choice, and with access to computing capabilities Z.”

Joe Carlsmith discussed related issues recently in this essay.

Private vs. Public

Utility functions can be modeled with different levels of privacy.

  1. They can be explicitly modeled and made completely public.

  2. They can be explicitly modeled and shared with a group of people.

  3. They can be explicitly modeled, and kept private.

  4. They can be intentionally left not explicitly modeled, in order to preserve plausible deniability.

It should be remembered that true utility functions can contain a great deal of confidential information. In situations where they are made explicit and public, they are likely to be modified in order to be more presentable or palatable. If there is no explicit way of hiding information, then the utility functions will be edited directly. Some of these edits might be possible for others to reverse; for example, an agent might share a utility function suggesting that they are purely selfless, but other individuals making decisions for them might adjust this using additional knowledge.

More Distinctions

The above are some key distinctions I’ve dealt with recently, but it’s by no means an exhaustive list. Here are some other ideas. (Anthropic’s Claude AI helped here.)

  1. Explicit vs. Implicit/Tacit

  2. Transitive vs. Intransitive

  3. Total vs. [Partial or Domain-specific]

  4. State-based vs. Action-based. (Utilities over worlds, vs. utilities over actions)

  5. Cardinal vs. Ordinal

  6. Type of agent: Individual person, organization, abstract idea, etc.

  7. For humans: How do we precisely choose which time slice of a human to use for their utility?

    1. Humans are always changing, and at the very least their proximal (though not necessarily their terminal) utility functions are as well.

    2. We still don’t have precise notions of how to define an individual. As a simple example, if an individual has multiple personalities, how should this be incorporated into an operational utility function for them?

    3. What time-discount to use? Perhaps others should value an agent’s future self more than the current agent does.

  8. The information encoding format. One could use tables of specific units, or relative value functions, or neural nets.

  9. Level of Precision: For explicit utility functions over large sets of world states or actions, there’s a choice of the level of granularity to use.

  10. For programmatic operational utility functions, there are a lot of questions about the specific function definitions. These could be parameterized in many different ways.

  11. Monistic vs. Pluralistic. It’s possible in theory for a function to return several flavors of value that might be very difficult or impossible to directly compare. Arguably this would cross outside of “utility function” territory, but in theory it can be very similar, just one step more complicated.

Going Forward

I think there could be a great deal of interesting work in explicitly estimating and understanding utility functions. I don’t see much work happening here right now; perhaps some of this is due to confused terminology.

There’s been a lot of writing about the challenges of getting AIs to best model humanity’s utility function. Arguably we can make some progress here by trying to get humanity to model humanity’s utility function.

Explicit or programmatic utility functions seem like a practical, general-purpose method of precisely describing an agent’s or group’s preferences. This can be used for negotiation or even advanced programmatic automation. Explicit utility functions can force people to be honest about their values, but I believe this is something we should probably attempt sooner rather than later.


Thanks to Nuño Sempere for comments.

Crossposted to LessWrong.