Michael St Jules 🔸 comments on MichaelStJules’s Quick takes

Michael St Jules 🔸 25 Feb 2020 18:38 UTC
7 points
Utility functions (preferential or ethical, e.g. social welfare functions) can have lexicality, so that a difference in category $A$ can be larger than the maximum difference in category $B$, but we can still make probabilistic tradeoffs between them. This can be done, for example, by having separate utility functions $f_A : X \to \mathbb{R}$ and $f_B : X \to \mathbb{R}$ for $A$ and $B$, respectively, such that
- $f_A(x) - f_A(y) \geq 1$ for all $x$ satisfying the condition $P(x)$ and all $y$ satisfying $Q(y)$ (e.g. $Q(y)$ can be the negation of $P(y)$, although this would normally lead to discontinuity).
- $f_B$ is bounded, with range in the interval $[0, 1]$ (or, more generally, in an interval of length at most 1).
Then we can define our utility function as the sum $f = f_A + f_B$, so
$f(x) = f_A(x) + f_B(x).$
This ensures that all outcomes $x$ with $P(x)$ are at least as good as all outcomes $y$ with $Q(y)$, since $f(x) - f(y) \geq 1 + (f_B(x) - f_B(y)) \geq 0$, without requiring us to be Pascalian/fanatical about maximizing $f_A$ regardless of what happens to $f_B$. Note, however, that $f_B$ may become increasingly difficult to change as the number of moral patients increases, so we may approximate Pascalian fanaticism in this limit anyway.
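A minimal Python sketch of this construction, under illustrative assumptions that are not in the original: an outcome is a dict with a boolean `P` flag (so $Q$ is simply its negation, the discontinuous case mentioned above) and a real-valued `score_B`; $f_A$ is 1 or 0; and $f_B$ squashes `score_B` into $(0, 1)$ with a logistic function. The names `f_A`, `f_B` and `lexical_f` are hypothetical.

```python
import math

def f_B(x: dict) -> float:
    # Hypothetical bounded component: the logistic function, written via tanh
    # to avoid overflow, maps any real score into (0, 1), an interval of length 1.
    return 0.5 * (1.0 + math.tanh(x["score_B"] / 2.0))

def f_A(x: dict) -> float:
    # Hypothetical lexically prior component: 1 when P(x) holds and 0 otherwise,
    # so f_A(x) - f_A(y) >= 1 whenever P(x) and Q(y) = not P(y).
    return 1.0 if x["P"] else 0.0

def lexical_f(x: dict) -> float:
    # The combined utility f = f_A + f_B.
    return f_A(x) + f_B(x)

# A P-outcome with a poor f_B score versus a Q-outcome with a good one.
p_outcome = {"P": True, "score_B": -5.0}
q_outcome = {"P": False, "score_B": 5.0}
print(lexical_f(p_outcome) > lexical_f(q_outcome))
```

Because $f_B$ can move $f$ by less than 1, no change in `score_B` can reverse the ordering between a $P$-outcome and a $Q$-outcome; at most the two can tie in the limit.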
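For example, let $f_A(x) \leq -1$ if there is any suffering in $x$ that meets a certain threshold of intensity (this is $Q(x)$), and $f_A(x) = 0$ if there is no suffering at all in $x$ (this is $P(x)$). $f$ can still be continuous this way, since outcomes with suffering below the threshold satisfy neither condition, so $f_A$ can interpolate between the two values.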
If the probability that this threshold is met is $p$, with $0 \leq p < 1$, and the expected value of $f_A$ conditional on this is bounded below by $-L$, with $L > 0$, regardless of $p$ for the choices available to you, then the threshold event contributes at least $-pL$ to the expected value of $f_A$. So increasing $f_B$ by at least $pL$, which can be small, is at least as good as trying to reduce $p$.
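To make the $pL$ comparison concrete, here is a small Python sketch using exact fractions. It assumes a hypothetical two-outcome model not spelled out in the original: $f_A = -L$ exactly when the threshold is met and $f_A = 0$ otherwise, with $f_B$ unaffected by that event. The function name `expected_f` and the numbers are illustrative only.

```python
from fractions import Fraction

def expected_f(p: Fraction, L: Fraction, f_b: Fraction) -> Fraction:
    # Hypothetical two-outcome model: with probability p the suffering threshold
    # is met and f_A takes its assumed worst value -L; otherwise f_A = 0.
    # f_B is assumed to stay in [0, 1] and to be unaffected by the threshold event.
    expected_f_a = p * (-L) + (1 - p) * 0
    return expected_f_a + f_b

p, L = Fraction(1, 100), Fraction(5)
base_f_b = Fraction(1, 2)

remove_risk = expected_f(Fraction(0), L, base_f_b)        # drive p all the way to 0
raise_f_b = expected_f(p, L, base_f_b + p * L)             # raise f_B by exactly pL
raise_f_b_more = expected_f(p, L, base_f_b + 2 * p * L)    # raise f_B by 2pL
print(remove_risk, raise_f_b, raise_f_b_more)
```

Under these assumptions, raising $f_B$ by exactly $pL$ ties with eliminating the risk entirely, and any larger increase beats it.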

As another example, an AI could be incentivized to ensure it gets monitored by law enforcement. Its reward function could look like
$f(x) = \sum_{i=1}^{\infty} I_{M_i}(x) + f_B(x)$
where $I_{M_i}(x)$ is 1 if the AI is monitored by law enforcement and passes some test (or did nothing?) in period $i$, and 0 otherwise. You could put an upper bound on the number of periods, or use discounting, to ensure the sum can't evaluate to infinity, since that would allow $f_B$ to be ignored (maybe the AI will predict its expected lifetime to be infinite). However, discounting would eventually allow $f_B$ to overcome the $I_{M_i}$, once the discounted weight on a period falls below 1, unless you also discount the future in $f_B$.
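A minimal Python sketch of the truncated and discounted versions of this reward, assuming a finite horizon of periods represented as a hypothetical list of booleans (one per $I_{M_i}$) and $f_B$ already in $[0, 1]$. The function name `monitored_reward` and the discount parameter `r` are illustrative, not from the original.

```python
def monitored_reward(monitored: list[bool], f_b: float, r: float = 1.0) -> float:
    # monitored[i - 1] plays the role of I_{M_i}(x) for periods i = 1..N, a finite
    # horizon standing in for the upper bound on periods. f_b is assumed in [0, 1].
    # r = 1.0 means no discounting; r < 1 weights period i by r**i.
    return sum(r**i for i, m in enumerate(monitored, start=1) if m) + f_b

always = [True, True, True]
lapse = [True, True, False]  # skips monitoring in the last period

# Undiscounted: one more monitored period is worth at least any change in f_B.
print(monitored_reward(always, 0.0) >= monitored_reward(lapse, 1.0))
# Discounted with r = 0.5: period 3 has weight 0.125 < 1, so f_B can win.
print(monitored_reward(always, 0.0, r=0.5) < monitored_reward(lapse, 1.0, r=0.5))
```

The second comparison shows the failure mode described above: once $r^i < 1$, a bounded $f_B$ with range of length 1 can outweigh monitoring in period $i$.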
This should also allow us to modify the utility function $f_B$, if preventing the modification would cause the AI to fail a test.
Furthermore, satisfying the $I_{M_i}(x)$ strongly lexically dominates increasing $f_B(x)$, but we can still make expected tradeoffs between them.
The problem then reduces to designing the AI in such a way that it can’t cheat on the test, which might be something we can hard-code into it (e.g. its internal states and outputs are automatically sent to law enforcement), and so could be easier than getting $f_{B}$ right.

This overall approach can be repeated for any finite number of functions $f_1, f_2, \dots, f_n$. Recursively, starting from $g_1 = f_1$, you could define
$g_{k+1}(x) = \sigma(g_k(x)) + f_{k+1}(x)$
for $\sigma : \mathbb{R} \to \mathbb{R}$ strictly increasing and bounded, with range in an interval of length at most 1, e.g. some sigmoid function, and then use $g_n$ as the overall utility function. In this way, each $f_k$ dominates the previous ones, as above.
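The recursion can be sketched in Python as follows, taking the logistic function as $\sigma$ (one admissible choice, not prescribed by the original) and representing outcomes as hypothetical tuples whose $k$-th entry is the integer value of $f_{k+1}$. The helper names `sigma` and `nested_utility` are illustrative.

```python
import math
from typing import Callable, Sequence

def sigma(z: float) -> float:
    # Logistic sigmoid written via tanh to avoid overflow: strictly increasing,
    # with range (0, 1), an interval of length 1.
    return 0.5 * (1.0 + math.tanh(z / 2.0))

def nested_utility(fs: Sequence[Callable[[object], float]]) -> Callable[[object], float]:
    # Builds g_n from f_1, ..., f_n: g_1 = f_1 and g_{k+1}(x) = sigma(g_k(x)) + f_{k+1}(x).
    # Each later f dominates the earlier ones when its differences are at least 1
    # (e.g. when it is integer-valued), as in the two-function case.
    def g(x: object) -> float:
        value = fs[0](x)
        for f_next in fs[1:]:
            value = sigma(value) + f_next(x)
        return value
    return g

# Hypothetical outcomes: tuples whose k-th entry is the (integer) value of f_{k+1}.
g = nested_utility([lambda x: x[0], lambda x: x[1]])
print(g((-5, 1)) > g((5, 0)))  # f_2 decides first
print(g((3, 0)) > g((2, 0)))   # f_1 breaks ties in f_2
```

In floating point the sigmoid saturates for large arguments, so strict dominance can collapse into ties there; exact arithmetic would be needed to preserve it in extreme cases.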
- Michael St Jules 🔸 3 May 2020 16:44 UTC
  4 points
  To adapt this to a more deontological approach (not rule-violation minimization, but one according to which you should not break a rule in order to avoid violating a rule later), you could use geometric discounting, and your (moral) utility function could look like:
  $f(x) = -\sum_{i=0}^{\infty} r^i I(x_i),$
  where
  1. $x$ is the act and its consequences without uncertainty, and you maximize the expected value of $f$ over uncertainty in $x$,
  2. $x$ is broken into infinitely many disjoint intervals $x_i$, with $x_i$ coming just before $x_{i+1}$ temporally (and these intervals are chosen to have the same time endpoints for each possible $x$),
  3. $I(x_i) = 1$ if a rule is broken in $x_i$, and $0$ otherwise, and
  4. $r$ is a constant, $0 < r \leq 0.5$.
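  So, the idea is that $f(x) > f(y)$ whenever the earliest rule violation in $x$ happens later than the earliest one in $y$ (at the level of precision determined by how the intervals are broken up); if the earliest violations fall in the same interval, the comparison passes to the next violation, and so on. The value $r \leq 0.5$ ensures this: a violation in interval $i$ costs $r^i$, while violations in every later interval cost at most $\sum_{j > i} r^j = r^{i+1}/(1 - r) \leq r^i$. (With $r = 0.5$ exactly, there are rare exceptions: a single violation in interval $i$ ties with violations in every interval after $i$.) You essentially count rule violations and minimize the number of them, but you use geometric discounting based on when each violation happens, so that it's always worse to break a rule earlier than to break any number of rules later.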
  However, breaking $x$ up into intervals this way probably sucks for a lot of reasons, and I doubt it would lead to prescriptions people with deontological views endorse when they maximize expected values.
  This approach basically took for granted that a rule is broken not when I act, but when a particular consequence occurs.
  If, on the other hand, a rule is broken at the time I act, maybe I need to use some functions $I_i(x)$ instead of the $I(x_i)$, because whether or not I act now (in time interval $i$) and break a rule depends on what happens in the future. This way, however, $I_i(x)$ could basically always be $1$, so I don't think this approach works.
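The geometrically discounted utility function $f(x) = -\sum_i r^i I(x_i)$ from this comment can be checked numerically with a short Python sketch. It assumes a finite truncation of the intervals and represents $I(x_i)$ as a hypothetical list of 0/1 flags; the name `deontic_value` is illustrative, and exact fractions are used so the comparisons are not affected by rounding.

```python
from fractions import Fraction

def deontic_value(violations: list[int], r: Fraction = Fraction(2, 5)) -> Fraction:
    # violations[i] plays the role of I(x_i) (1 if a rule is broken in interval i,
    # else 0), truncated to finitely many intervals. Exact fractions avoid rounding.
    return -sum((r**i * flag for i, flag in enumerate(violations)), Fraction(0))

# One violation now is worse than violations in every one of the next five intervals.
print(deontic_value([0, 1, 1, 1, 1, 1]) > deontic_value([1, 0, 0, 0, 0, 0]))
# Same earliest violation: the later second violation is preferred.
print(deontic_value([1, 0, 1]) > deontic_value([1, 1, 0]))
# With r = 3/5 > 0.5 the ordering breaks: three later violations outweigh one now.
print(deontic_value([0, 1, 1, 1], Fraction(3, 5)) < deontic_value([1, 0, 0, 0], Fraction(3, 5)))
```

With a finite truncation, $r = 0.5$ still orders strictly; the tie mentioned above only appears in the infinite limit.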
- Michael St Jules 🔸 7 Jul 2020 22:58 UTC
  3 points
  This nesting approach with $\sigma$ above also allows us to “fix” maximin/leximin under conditions of uncertainty to avoid Pascalian fanaticism, given a finite discretization of welfare levels or a finite number of lexical thresholds. Let the welfare levels be $t_0 > t_1 > \dots > t_n$, and define:
  $f_k(x) = -\sum_i I(u_i \leq t_k)$
  i.e. $-f_k(x)$ is the number of individuals with welfare level at most $t_k$, where $u_i$ is the welfare of individual $i$, and $I(u_i \leq t_k)$ is 1 if $u_i \leq t_k$ and 0 otherwise. Alternatively, we could use $I(t_{k+1} < u_i \leq t_k)$.
  In situations without uncertainty, nesting $f_0, f_1, \dots, f_n$ in that order (so that $f_n$ is added last) requires us to first choose among options that minimize the number of individuals with welfare at most $t_n$, because $f_n$ takes priority over $f_k$ for all $k < n$. Then, among those, we choose options that minimize the number of individuals with welfare at most $t_{n-1}$, since $f_{n-1}$ takes priority over $f_k$ for all $k < n - 1$, then those that minimize the number with welfare at most $t_{n-2}$, and so on, down to $t_0$.
  This particular social welfare function assigns negative value to adding an individual with welfare at most $t_0$ when there are no impacts on others (and zero value if their welfare is above $t_0$), which leximin/maximin need not do in general, although it typically does in practice anyway.
  This approach does not require welfare to be cardinal, i.e. adding and dividing welfare levels need not be defined. It also dodges representation theorems like this one (or the stronger one in Lemma 1 here, see the discussion here), because continuity is not satisfied (and welfare need not have any topological structure at all, let alone be real-valued). Yet, it still satisfies anonymity/symmetry/impartiality, monotonicity/Pareto, and separability/independence. Separability means that whether one outcome is better or worse than another does not depend on individuals unaffected by the choice between the two.
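A Python sketch of this nested leximin-style function, assuming the logistic sigmoid as $\sigma$, hypothetical welfare levels $t_0 = 10 > t_1 = 5 > t_2 = 0$, and welfare profiles given as lists of numbers (used only for comparison with the thresholds, never added). The names `f_k`, `nested_leximin` and `expected_value` are illustrative. The last comparison shows the intended non-fanatical behaviour: a small chance of someone falling to the worst level can be outweighed.

```python
import math

def sigma(z: float) -> float:
    # Logistic sigmoid via tanh: strictly increasing, range (0, 1).
    return 0.5 * (1.0 + math.tanh(z / 2.0))

def f_k(welfare: list[float], t_k: float) -> int:
    # Minus the number of individuals whose welfare is at most t_k.
    return -sum(1 for u in welfare if u <= t_k)

def nested_leximin(welfare: list[float], thresholds: list[float]) -> float:
    # thresholds = [t_0, t_1, ..., t_n] with t_0 > t_1 > ... > t_n. f_0 is innermost
    # and f_n (the worst welfare level) is added last, so it takes priority.
    value = float(f_k(welfare, thresholds[0]))
    for t in thresholds[1:]:
        value = sigma(value) + f_k(welfare, t)
    return value

def expected_value(lottery: list[tuple[float, list[float]]], thresholds: list[float]) -> float:
    # Expected value over (probability, welfare profile) pairs.
    return sum(p * nested_leximin(w, thresholds) for p, w in lottery)

thresholds = [10.0, 5.0, 0.0]  # hypothetical levels t_0 > t_1 > t_2
# Without uncertainty: nobody at or below t_2 beats one person at or below t_2.
print(nested_leximin([1, 1, 1], thresholds) > nested_leximin([-1, 20, 20], thresholds))
# Under uncertainty: a 0.1% chance of someone falling to t_2 can be outweighed,
# unlike a fanatical lexical rule that rejects any such risk.
risky = [(0.001, [-1, 20]), (0.999, [20, 20])]
safe = [(1.0, [1, 1])]
print(expected_value(risky, thresholds) > expected_value(safe, thresholds))
```

Only comparisons of welfare with the thresholds are used, which matches the point that welfare need not be cardinal.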
- Michael St Jules 🔸 8 Jul 2020 0:43 UTC
  2 points
  Here’s a way to capture lexical threshold utilitarianism with a separable theory while avoiding Pascalian fanaticism, with a negative threshold $t_- < 0$ and a positive threshold $t_+ > 0$:
  $\sigma\left(\sum_i u_i\right) + \sum_i I(u_i \geq t_+) - \sum_i I(u_i \leq t_-)$
  - The first term is just standard utilitarianism, but squashed with an increasing function $\sigma : \mathbb{R} \to \mathbb{R}$ into an interval of length at most 1.
  - The second/middle sum is the number of individuals (or experiences or person-moments) with welfare at least $t_+$, which we add to the first term. Any change in the number of individuals at or above this threshold dominates the first term.
  - The third/last sum is the number of individuals with welfare at most $t_-$, which we subtract from the rest. Any change in the number of individuals at or below this threshold dominates the first term.
  Either the second or the third term can be omitted.
  We could require $t_- \leq u_i \leq t_+$ for all $i$, although this isn't necessary.
  More thresholds could be used, as in this comment: we would apply $\sigma$ to the whole expression above, and then add new terms like the second and/or the third, with thresholds $t_{++} > t_+$ and $t_{--} < t_-$, and repeat as necessary.
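The lexical threshold utilitarianism above, including the repeated application of $\sigma$ for further threshold pairs, can be sketched in Python as follows. The logistic sigmoid, the threshold values, and the names `threshold_utilitarian` and `threshold_pairs` are illustrative assumptions, not part of the original.

```python
import math

def sigma(z: float) -> float:
    # Logistic sigmoid via tanh: strictly increasing, range (0, 1).
    return 0.5 * (1.0 + math.tanh(z / 2.0))

def threshold_utilitarian(welfare: list[float], threshold_pairs: list[tuple[float, float]]) -> float:
    # threshold_pairs = [(t_-, t_+), (t_--, t_++), ...], innermost first, with
    # t_- < 0 < t_+ and each pair more extreme than the one before.
    # Start from total welfare; each round squashes everything so far with sigma,
    # then adds the count at or above t_+ and subtracts the count at or below t_-.
    value = float(sum(welfare))
    for t_minus, t_plus in threshold_pairs:
        above = sum(1 for u in welfare if u >= t_plus)
        below = sum(1 for u in welfare if u <= t_minus)
        value = sigma(value) + above - below
    return value

pairs = [(-10.0, 10.0)]  # hypothetical thresholds t_- = -10, t_+ = 10
# One individual at t_+ beats 100 individuals just below it.
print(threshold_utilitarian([10], pairs) > threshold_utilitarian([9] * 100, pairs))
# 100 individuals just above t_- beat one individual at t_-.
print(threshold_utilitarian([-9] * 100, pairs) > threshold_utilitarian([-10], pairs))
```

Under this sketch, omitting the second or third term corresponds to setting `t_plus` to `math.inf` or `t_minus` to `-math.inf`, since no finite welfare level then reaches the threshold.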