# MichaelStJules comments on MichaelStJules’s Shortform

• Utility functions (preferential or ethical, e.g. social welfare functions) can have lexicality, so that a difference in category $A$ can be larger than the maximum difference in category $B$, while we can still make probabilistic tradeoffs between them. This can be done, for example, by having separate utility functions, $f_A$ and $f_B$, for $A$ and $B$, respectively, such that

• $f_A(x) - f_A(y) \geq 1$ for all $x$ satisfying the condition $P(x)$ and all $y$ satisfying $Q(y)$ (e.g. $Q$ can be the negation of $P$, although this would normally lead to discontinuity).
• $f_B$ is bounded to have range in the interval $[0, 1]$ (or range in an interval of length at most 1).

Then we can define our utility function as the sum $f = f_A + f_B$, so

$$f(x) - f(y) = \big(f_A(x) - f_A(y)\big) + \big(f_B(x) - f_B(y)\big) \geq 1 - 1 = 0 \quad \text{whenever } P(x) \text{ and } Q(y).$$

This ensures that all outcomes with $P$ are at least as good as all outcomes with $Q$, without being Pascalian/fanatical to maximize $f_A$ regardless of what happens to $f_B$. Note, however, that $f_B$ may be increasingly difficult to change as the number of moral patients increases, so we may approximate Pascalian fanaticism in this limit, anyway.

For example, $f_A(x) \leq -1$ if there is any suffering in $x$ that meets a certain threshold of intensity, $Q(x)$, and $f_A(x) = 0$ if there is no suffering at all in $x$, $P(x)$. $f$ can still be continuous this way.

If the probability that this threshold is met is $p$ and the expected value of $f_A$ conditional on this is bounded below by $-L$, $L > 0$, regardless of $p$ for the choices available to you, then increasing $f_B$ by at least $pL$, which can be small, is better than trying to reduce $p$.
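A minimal Python sketch of this construction may help. Everything concrete in it is an assumption for illustration, not something from the comment: outcomes are reduced to a pair (worst suffering intensity, ordinary welfare), the threshold is 1.0, the dominant function `f_a` falls linearly from 0 to -1 as intensity reaches the threshold, the bounded function `f_b` is a logistic squash, and `expected_f` assumes the dominant term is 0 whenever the threshold is not met.

```python
import math

THRESHOLD = 1.0  # hypothetical intensity at which suffering counts as extreme


def f_a(intensity: float) -> float:
    """Dominant term: 0 with no suffering, at most -1 at or past the threshold."""
    return -max(intensity, 0.0) / THRESHOLD


def f_b(welfare: float) -> float:
    """Bounded term: logistic squash into [0, 1] (tanh form avoids overflow)."""
    return 0.5 * (1.0 + math.tanh(welfare / 2.0))


def f(intensity: float, welfare: float) -> float:
    """Total utility: dominant term plus bounded term."""
    return f_a(intensity) + f_b(welfare)


def expected_f(p: float, cond_f_a: float, mean_f_b: float) -> float:
    """Expected utility if the threshold is met with probability p.

    Assumes (for illustration) that f_a is 0 when the threshold is not met.
    """
    return p * cond_f_a + mean_f_b


# No suffering with terrible ordinary welfare is still at least as good as
# threshold-level suffering with excellent ordinary welfare.
assert f(0.0, -50.0) >= f(THRESHOLD, 50.0)

# With p = 0.01 and L = 2, raising the bounded term by more than p * L = 0.02
# beats eliminating the risk entirely.
assert expected_f(0.01, -2.0, 0.53) > expected_f(0.0, -2.0, 0.50)
```

The second assertion is the probabilistic tradeoff: the dominant term wins every sure comparison across the two categories, yet a small enough risk can be outweighed in expectation.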

As another example, an AI could be incentivized to ensure it gets monitored by law enforcement. Its reward function could look like

$$f(x) = f_B(x) + \sum_{i=1}^{\infty} I_{M_i}(x)$$

where $f_B$ is the AI's ordinary utility function, bounded to an interval of length at most 1, and $I_{M_i}(x)$ is 1 if the AI is monitored by law enforcement and passes some test (or did nothing?) in period $i$, and 0 otherwise. You could put an upper bound on the number of periods or use discounting to ensure the right term can’t evaluate to infinity, since that would allow $f_B$ to be ignored (maybe the AI will predict its expected lifetime to be infinite), but this would eventually allow $f_B$ to overcome the $I_{M_i}$, unless you also discount the future in $f_B$.

This should also allow us to modify the utility function $f_B$, if preventing the modification would cause a test to be failed.

Furthermore, satisfying the $I_{M_i}(x) = 1$ strongly lexically dominates increasing $f_B(x)$, but we can still make expected tradeoffs between them.

The problem then reduces to designing the AI in such a way that it can’t cheat on the test, which might be something we can hard-code into it (e.g. its internal states and outputs are automatically sent to law enforcement), and so could be easier than getting $f_B$ right.
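As a rough illustration of the monitoring reward, here is a Python sketch with a finite number of periods (one of the options mentioned above). The names `squash`, `reward` and `expected_reward`, the logistic squash, and the numbers are all assumptions made for the example.

```python
import math
from typing import Sequence


def squash(v: float) -> float:
    """Logistic squash into [0, 1], written with tanh to avoid overflow."""
    return 0.5 * (1.0 + math.tanh(v / 2.0))


def reward(passed: Sequence[bool], task_value: float) -> float:
    """Periods in which the AI was monitored and passed, plus a bounded task term."""
    return sum(1 for ok in passed if ok) + squash(task_value)


def expected_reward(p_fail_one: float, n_periods: int, task_value: float) -> float:
    """Expected reward when exactly one test is failed with probability p_fail_one."""
    all_pass = reward([True] * n_periods, task_value)
    one_fail = reward([True] * (n_periods - 1) + [False], task_value)
    return (1.0 - p_fail_one) * all_pass + p_fail_one * one_fail


# A sure failed test is never worth it, however large the task value.
assert reward([True, True, True], -1e6) >= reward([True, True, False], 1e6)

# A small risk of failing a test can be worth a better task outcome...
assert expected_reward(0.05, 3, 2.0) > expected_reward(0.0, 3, -2.0)
# ...but a large risk is not.
assert expected_reward(0.9, 3, 2.0) < expected_reward(0.0, 3, -2.0)
```

Because the task term is squashed into an interval of length 1 and each passed period contributes exactly 1, the indicator terms dominate in sure comparisons while expected tradeoffs remain possible.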

This overall approach can be repeated for any finite number of functions, $f_1, \ldots, f_n$. Recursively, you could define

$$g_1 = f_1, \qquad g_{k+1} = \sigma(g_k) + f_{k+1}$$

for $\sigma : \mathbb{R} \to \mathbb{R}$ increasing and bounded with range in an interval of length at most 1, e.g. some sigmoid function. In this way, each $f_k$ dominates the previous ones, as above.
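The recursion can be sketched in a few lines of Python. The function names and the choice of a logistic squash are assumptions for illustration; the input is the list of values of the component functions at one outcome, lowest priority first.

```python
import math
from typing import Sequence


def sigma(v: float) -> float:
    """Increasing, bounded squash with range inside [0, 1]."""
    return 0.5 * (1.0 + math.tanh(v / 2.0))


def nested(values: Sequence[float]) -> float:
    """Fold component values (lowest priority first) into a single utility.

    Starts from the first value, then repeatedly squashes the running total
    and adds the next, higher-priority value.
    """
    g = values[0]
    for v in values[1:]:
        g = sigma(g) + v
    return g


# A one-unit advantage on the highest-priority component outweighs any
# advantage on the lower-priority components.
better_on_top = nested([-1000.0, 0.0, 1.0])
better_below = nested([1000.0, 5.0, 0.0])
assert better_on_top >= better_below
```

The dominance only holds for differences of at least 1 in the higher-priority component, which is why the construction above asks for that gap between the relevant categories.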

• To adapt to a more deontological approach (not rule violation minimization, but according to which you should not break a rule in order to avoid violating a rule later), you could use geometric discounting, and your (moral) utility function could look like:

$$f(x) = -\sum_{i=1}^{\infty} r^i \, I(x_i)$$

where

1. $x$ is the act and its consequences without uncertainty and you maximize the expected value of $f$ over uncertainty in $x$,

2. $x$ is broken into infinitely many disjoint intervals $x_i$, with $x_i$ coming just before $x_{i+1}$ temporally (and these intervals are chosen to have the same time endpoints for each possible $x$),

3. $I(x_i) = 1$ if a rule is broken in $x_i$, and $0$ otherwise, and

4. $r$ is a constant, $0 < r \leq 0.5$.

So, the idea is that $f(x) > f(y)$ if and only if the earliest rule violation in $x$ happens later than the earliest one in $y$ (at the level of precision determined by how the intervals are broken up; if both fall in the same interval, the comparison passes to the first interval in which $x$ and $y$ differ). The value of $r \leq 0.5$ ensures this, since all violations after interval $i$ together cost at most $\sum_{j > i} r^j = r^{i+1}/(1 - r) \leq r^i$. (Well, there are some rare exceptions if $r = 0.5$, where the two sides can tie.) You essentially count rule violations and minimize the number of them, but you use geometric discounting based on when the rule violation happens in such a way to ensure that it’s always worse to break a rule earlier than to break any number of rules later.
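A small Python sketch of the discounted violation count, under assumptions made only for illustration: a finite list of booleans stands in for the infinite sequence of intervals (later intervals are treated as violation-free), and indexing is 0-based, which rescales the function by a constant without changing any comparison.

```python
from typing import Sequence


def f(violations: Sequence[bool], r: float = 0.5) -> float:
    """Negative geometrically discounted count of rule violations.

    violations[i] is True if a rule is broken in interval i (0-based here).
    """
    if not 0.0 < r <= 0.5:
        raise ValueError("r must satisfy 0 < r <= 0.5")
    return -sum(r ** i for i, broken in enumerate(violations) if broken)


# One violation now versus nine violations in the following nine intervals.
early = [True] + [False] * 9
late = [False] + [True] * 9

# Breaking a rule earlier is worse than breaking many rules later.
assert f(late) > f(early)
assert f(late, r=0.4) > f(early, r=0.4)
```

With a discount factor above 0.5 the later violations could add up to more than the early one, which is why the sketch rejects such values.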

However, breaking up into intervals this way probably sucks for a lot of reasons, and I doubt it would lead to prescriptions people with deontological views endorse when they maximize expected values.

This approach basically took for granted that a rule is broken not when I act, but when a particular consequence occurs.

If, on the other hand, a rule is broken at the time I act, maybe I need to use some functions $I_i(x)$, which depend on the whole of $x$, instead of the per-interval indicators $I(x_i)$, because whether or not I act now (in time interval $i$) and break a rule depends on what happens in the future. This way, however, $I_i(x)$ could basically always be $1$, so I don’t think this approach works.

• This nesting approach with the squashing function $\sigma$ above also allows us to “fix” maximin/leximin under conditions of uncertainty to avoid Pascalian fanaticism, given a finite discretization of welfare levels or finite number of lexical thresholds. Let the welfare levels be $t_1 > t_2 > \cdots > t_n$, and define:

$$f_k(x) = -\sum_i I(u_i \leq t_k)$$

i.e. $-f_k(x)$ is the number of individuals with welfare level at most $t_k$, where $u_i$ is the welfare of individual $i$, and $I(u_i \leq t_k)$ is 1 if $u_i \leq t_k$ and 0 otherwise. Alternatively, we could use $I(u_i < t_k)$.

In situations without uncertainty, with later functions in the nesting dominating earlier ones, this requires us to first choose among options that minimize the number of individuals with welfare at most $t_n$, because $f_n$ takes priority over $f_k$, for all $k < n$, and then, having done that, choose among those that minimize the number of individuals with welfare at most $t_{n-1}$, since $f_{n-1}$ takes priority over $f_k$, for all $k < n-1$, and then choose among those that minimize the number of individuals with welfare at most $t_{n-2}$, and so on, until $t_1$.
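Here is a hedged Python sketch of this leximin-style social welfare function. The function names, the logistic squash and the example populations are assumptions; the welfare levels are processed from highest to lowest so that the count at the lowest level is added last and therefore dominates.

```python
import math
from typing import Sequence


def sigma(v: float) -> float:
    """Increasing, bounded squash with range inside [0, 1]."""
    return 0.5 * (1.0 + math.tanh(v / 2.0))


def swf(welfares: Sequence[float], levels: Sequence[float]) -> float:
    """Nested count-based social welfare function.

    For each welfare level, subtract the number of individuals at or below it,
    squashing the running total before moving to the next (lower) level.
    """
    ordered = sorted(levels, reverse=True)  # lowest level last, so it dominates
    g = -float(sum(u <= ordered[0] for u in welfares))
    for t in ordered[1:]:
        g = sigma(g) - sum(u <= t for u in welfares)
    return g


levels = [0, 1, 2]

# The worst-off come first: nobody at level 0 beats one person at level 0,
# even though the second population has higher total welfare.
assert swf([1, 1, 1], levels) > swf([0, 2, 2], levels)

# With equal numbers at level 0, the count at or below level 1 breaks the tie.
assert swf([0, 2, 2], levels) > swf([0, 1, 2], levels)
```

Under uncertainty one would take the expected value of `swf` over outcomes; because every lower-priority term is squashed into an interval of length at most 1, a tiny probability of affecting the worst-off need not override everything else.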

This particular social welfare function assigns negative value to new existences when there are no impacts on others, which leximin/​maximin need not do in general, although it typically does in practice, anyway.

This approach does not require welfare to be cardinal, i.e. adding and dividing welfare levels need not be defined. It also dodges representation theorems like this one (or the stronger one in Lemma 1 here, see the discussion here), because continuity is not satisfied (and welfare need not have any topological structure at all, let alone be real-valued). Yet, it still satisfies anonymity/​symmetry/​impartiality, monotonicity/​Pareto, and separability/​independence. Separability means that whether one outcome is better or worse than another does not depend on individuals unaffected by the choice between the two.

• Here’s a way to capture lexical threshold utilitarianism with a separable theory and while avoiding Pascalian fanaticism, with a negative threshold $t_- < 0$ and a positive threshold $t_+ > 0$:

$$\sigma\Big(\sum_i u_i\Big) + \sum_i I(u_i \geq t_+) - \sum_i I(u_i \leq t_-)$$

• The first term is just standard utilitarianism, but squashed with a function $\sigma : \mathbb{R} \to \mathbb{R}$ into an interval of length at most 1.

• The second/middle sum is the number of individuals (or experiences or person-moments) with welfare at least $t_+$, which we add to the first term. Any change in number past this threshold dominates the first term.

• The third/last sum is the number of individuals with welfare at most $t_-$, which we subtract from the rest. Any change in number past this threshold dominates the first term.

Either of the second or third term can be omitted.
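The three terms can be sketched in Python as follows. The function names, the default thresholds of -10 and 10, and the logistic squash are assumptions chosen for the example.

```python
import math
from typing import Sequence


def sigma(v: float) -> float:
    """Increasing, bounded squash with range inside [0, 1]."""
    return 0.5 * (1.0 + math.tanh(v / 2.0))


def lexical_threshold_utility(
    welfares: Sequence[float], t_minus: float = -10.0, t_plus: float = 10.0
) -> float:
    """Squashed total welfare, plus count above t_plus, minus count below t_minus."""
    if not t_minus < 0.0 < t_plus:
        raise ValueError("need t_minus < 0 < t_plus")
    squashed_total = sigma(sum(welfares))
    n_high = sum(u >= t_plus for u in welfares)
    n_low = sum(u <= t_minus for u in welfares)
    return squashed_total + n_high - n_low


# One individual at the negative threshold is not outweighed by any number of
# individuals with ordinary positive welfare.
assert lexical_threshold_utility([-10.0] + [9.0] * 100) < lexical_threshold_utility([0.0] * 101)

# One individual at the positive threshold beats any amount of ordinary welfare.
assert lexical_threshold_utility([10.0]) > lexical_threshold_utility([9.0] * 1000)
```

Dropping either count from the return value gives the variants with only one threshold.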

We could require for all , although this isn’t necessary.

More thresholds could be used, as in this comment: we would apply the squashing function $\sigma$ to the whole expression above, and then add new terms like the second and/or the third, with new thresholds $t_-' < t_-$ and $t_+' > t_+$, and repeat as necessary.