A Model of Rational Policy: When Is a Goal “Good”?

Why think about goalsets?

Societies need many distinct systems: a transport system, a school system, etc. These systems cannot be justified if they are amoral, so they must serve morality. Each system cannot, however, achieve the best moral outcome on its own: If your transport system doesn’t cure cancer, it probably isn’t doing everything you want; if it does cure cancer, it isn’t just a “transport” system. So each system must have a bounded domain and, ideally, still be morally optimal. To solve this problem, systems can be assigned goals, which are statements that we want the system to satisfy. Goals weakly constrain a system’s domain because if a system satisfies its goals, there is no more for it to do.

The problem, then, is deciding which goals to pursue. Each goal should not be evaluated on its own. Instead, we should evaluate your entire set of goals, your goalset. There are two reasons for this. (1) Any problem with a goal will manifest in a goalset that contains it, so we can rephrase any criticisms of a goal in terms of goalsets. And (2) there are times when we have to look at the goalset as a whole before identifying an issue. For example, suppose two identical twins, Robinson and Crusoe, are stranded on an island. Suppose the best outcome occurs when either Robinson hunts for food and Crusoe builds a shelter or vice versa. So “Crusoe builds a shelter” is a goal that fits in one ideal outcome, and “Robinson builds a shelter” fits in another. However, if their goalset contains both those statements, the pair will starve (albeit in the comfort of an excellent shelter).
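
To make the twins example concrete, here is a minimal Python sketch (the outcome sets and statement strings are hypothetical, not from the post) showing that each shelter goal fits some ideal outcome while the goalset containing both fits none:

    # Hypothetical illustration: goals that look fine individually can conflict as a set.
    # Outcomes are modelled as frozensets of true statements.
    outcomes = [
        frozenset({"Robinson hunts", "Crusoe builds a shelter"}),             # ideal
        frozenset({"Crusoe hunts", "Robinson builds a shelter"}),             # ideal
        frozenset({"Robinson builds a shelter", "Crusoe builds a shelter"}),  # they starve
    ]
    ideal = outcomes[:2]

    def satisfies(outcome, goalset):
        """An outcome satisfies a goalset when every goal-statement is true in it."""
        return goalset <= outcome

    goal_a = {"Crusoe builds a shelter"}
    goal_b = {"Robinson builds a shelter"}

    # Each goal alone is contained in some ideal outcome...
    print(any(satisfies(o, goal_a) for o in ideal))            # True
    print(any(satisfies(o, goal_b) for o in ideal))            # True
    # ...but no ideal outcome satisfies the goalset containing both.
    print(any(satisfies(o, goal_a | goal_b) for o in ideal))   # False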

Clearly, your goalset should be related in some way to your morality. But how, precisely? Can goalsets even be a component of making ideal systems? Does it make sense to talk about “ideal” systems? What does it mean for a system to “satisfy” a goal? What are the “consequences” of a goalset? What is “equivalence” between goalsets? We’ll define all of these terms from the same four objects, which allows us to (1) see how these concepts relate to each other, (2) generate theorems about policy arguments, and (3) provide stronger foundations for informal reasoning.

Some of the later conclusions

  • Vague goals have no place within a goalset. However, we can satisfy concrete goals that are related to the vague goals.

  • If goals are “narrow” when they are not decomposable into smaller goals, it is invalid to criticize a goal for being narrow.

  • It’s invalid to criticize a goalset for not including its consequences.

  • Evaluating the goals that a system satisfies is not sufficient to determine whether the system is ideal or not.

  • Criticizing a system because it doesn’t satisfy additional goals beyond what it was assigned is invalid, unless the system is not fully determined by its assigned goals.

  • Ideal systems don’t necessarily do the greatest good at the margin. In other words, they don’t necessarily do constrained maximization of an objective function.

  • We can’t fully evaluate systems when we evaluate them individually.

  • Avoid selecting arbitrary numbers for your systems: Try to select a system whose goals imply that it will determine your optimal numbers for you.

The base concepts

Timelines and forecasts

A timeline can be represented as the set containing all statements that are empirically true at all times and places within that timeline. So “[Unique identifier for you] is reading [unique identifier for this sentence]” would not be in the representation of the timeline that we exist in, because once you begin the next sentence, it’s no longer true. Instead, the statement would have to say something like “[Unique identifier for you] read [unique identifier for that sentence] during [the time period at which you read that sentence]”. For brevity’s sake, the remaining example statements won’t contain unique identifiers or time periods, as the precise meaning of the statements should be clear from the context.

The trolley problem makes you choose between two timelines (call them $T_1$, where you pull the lever, and $T_2$, where you don’t). Our timeline representation allows us to neatly state whether something is true within a given timeline or not: “You pull the lever” $\in T_1$, and “You pull the lever” $\notin T_2$. Timelines contain statements that are combined as well as statements that are atomized. For example, since “You pull the lever”, “The five live”, and “The one dies” are all elements of $T_1$, you can string these into a larger statement that is also in $T_1$: “You pull the lever, and the five live, and the one dies”. Therefore, each timeline contains a very large statement that uniquely identifies it within any finite set of timelines (i.e. any finite subset of $\mathbb{T}$, the set of all timelines). Timelines won’t be our unit of analysis because the statements they contain have no subjective empirical uncertainty.

This uncertainty can be incorporated by using forecasts ($f$), each of which contains all statements that are either empirically true at all times and places or false at all times and places, except each statement is amended with an associated credence. Though there is no uncertainty in the trolley problem, we could still represent it as a choice between two forecasts: $f_1$ guarantees $T_1$ (the pull-the-lever timeline) and $f_2$ guarantees $T_2$ (the no-action timeline). So, $f_1$ would contain the statement “The five live with a credence of 1”. Since each timeline contains a statement that uniquely identifies it within a finite set of timelines, each forecast can roughly be thought of as a probability distribution over timelines. So, the trolley problem reveals that you either morally prefer $f_1$ (denoted as $f_1 \succ f_2$), prefer $f_2$ (denoted as $f_1 \prec f_2$), or you believe that both forecasts are morally equivalent (denoted as $f_1 \sim f_2$).
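
As a rough illustration of the representation (the statement strings and variable names are my own, chosen only for this sketch), a timeline can be modelled as a set of statements and a forecast as a map from statements to credences:

    # Sketch: a timeline is a set of statements true at all times and places in it;
    # a forecast attaches a credence to each statement.
    T1 = frozenset({"You pull the lever", "The five live", "The one dies"})
    T2 = frozenset({"You don't pull the lever", "The five die", "The one lives"})

    # Membership is a truth test within a timeline.
    print("You pull the lever" in T1)   # True
    print("You pull the lever" in T2)   # False

    # A forecast with no uncertainty assigns credence 1 to each statement of a timeline.
    f1 = {s: 1.0 for s in T1}           # guarantees the pull-the-lever timeline
    f2 = {s: 1.0 for s in T2}           # guarantees the no-action timeline
    print(f1["The five live"])          # 1.0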

Moral rules

Rather than evaluate every moral dilemma with your intuitions, you could think of a moral rule that would give a moral ranking over forecasts. A moral rule does this by selecting the best forecast (or forecasts) from any given set of forecasts. Let’s say Jane believes that pulling the lever (forecast $f_1$) is morally better. Her first guess at a moral rule is “More people are better than fewer”. This rule selects the pull-the-lever forecast as best, which fits with her intuition; so far, so good. She then imagines another dilemma: Choose a forecast with 1,000,000 happy people in perpetuity, or one with 1,000,001 miserable people in perpetuity. Her rule selects the miserable forecast. This does not fit with her intuition, so she discards her hypothesized moral rule and thinks of a new rule to test.

You might wonder, if moral intuitions arbitrate which moral rules are “correct”, why not just use moral intuition to evaluate forecasts? What’s the point of a moral rule? Unless we’re born with moral intuitions that are perfectly accurate, we’re going to have some priors that are wrong.[1] Sometimes our moral priors are outright contradictory. For example, “Violence is never moral” and “Culture determines morality” contradict because of the possibility of violent cultures. Sometimes our stated justifications for our moral intuitions are contradictory, but those intuitions could be represented with a non-contradictory moral rule. Often, our intuitive ranking will be incomplete, which makes moral decisions difficult even if we could perfectly predict the consequences. Finding a moral rule gives us a way to correct poor moral intuitions, explain our good moral intuitions with more clarity, and guide us where our intuitions don’t exist.

[Figure 1: Feedback loop for moral intuitions and moral rules]

Figure 1: Moral intuitions (the left set) help us guess a moral rule ($r$). Your moral rule implies a ranking of forecasts (the right set), which you compare to your moral intuitions. Whether you disregard your rule or update your intuition depends on how well the rule fits your intuitions, and how adamant you are about your conflicting priors.
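
Here is a minimal sketch of that feedback loop in Python, with made-up dilemmas and a deliberately crude rule standing in for Jane’s first guess:

    # Sketch of the feedback loop: hypothesize a rule, test it against dilemmas
    # where you already have an intuition, and discard it on a mismatch.
    # Forecasts here are hypothetical dicts of summary facts.
    dilemmas = [
        # (option_a, option_b, intuitively_better_option)
        ({"people": 5, "happy": True}, {"people": 1, "happy": True}, 0),            # trolley-style
        ({"people": 1_000_001, "happy": False}, {"people": 1_000_000, "happy": True}, 1),
    ]

    def more_people_is_better(a, b):
        """Jane's first guess: pick whichever forecast has more people."""
        return 0 if a["people"] >= b["people"] else 1

    for a, b, intuition in dilemmas:
        verdict = more_people_is_better(a, b)
        print("matches intuition" if verdict == intuition else "conflicts with intuition")
    # The second dilemma conflicts, so the rule is revised (or the intuition is updated).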

Only some moral theories have a moral rule. Some consequentialist moral rules, such as total utilitarianism, assign each forecast a number: The higher the number, the better the moral standing of the forecast. A deontological theory has a moral rule if it resolves all dilemmas where prohibitions and imperatives are traded off against each other, either by prioritizing them over one another (lexicographic preferences), weighting the importance of each prohibition and imperative (which effectively assigns each forecast a number), or by assigning any violation “infinite badness” (i.e. any forecast in which you break any prohibition or imperative is unacceptable, and any other forecast is acceptable). A moral rule can relate to the actions and thoughts of all people (which is typical of consequentialism), or only to your own actions and thoughts (which is typical of deontology, e.g. whether your nonviolence causes violence in others has no effect on your moral rule).

Empirical models

Some arguments undermine a proposed moral rule with empirical facts. These arguments can show that a moral rule’s base concepts need to be better defined or are undefinable, such as the ideas of personal identity or causality. They can also show that a moral rule’s real-world application leads to ridiculous moral evaluations, such as the paralysis argument. Thus, fully analysing a moral rule requires us to have an empirical model.

Fitting the base concepts together with plans

Your empirical model ($m$) is used to make a forecast that is conditional on you[2] attempting a given plan ($p$):

$$m(p) = f_p$$

Therefore, it can be used to determine the set of currently feasible forecasts ($F$):

$$F = \{m(p) : p \text{ is a plan you could currently attempt}\}$$

Forecasts can be evaluated by your moral rule:

$$r(X) \subseteq X \text{ for any set of forecasts } X$$

The set of ideal forecasts ($I$) is the subset of feasible forecasts that are morally preferred to any other feasible forecast:

$$I = r(F) = \{f \in F : f \succeq f' \text{ for all } f' \in F\}$$

Note that an ideal forecast is, with the right plan, possible. Ideal forecasts are the best possible futures.
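
A small Python sketch of how these objects fit together, using hypothetical subplans, outcomes, and a stand-in numeric moral rule (none of these particulars come from the post):

    # A minimal sketch of the base objects, with hypothetical plans and moral scores.
    plans = [frozenset(), frozenset({"build transit"}), frozenset({"build transit", "fund schools"})]

    def m(plan):
        """Empirical model: maps an attempted plan to a forecast.
        Here a forecast is just a frozenset of statements with implicit credence 1."""
        outcomes = {"build transit": "commutes shorten", "fund schools": "literacy rises"}
        return frozenset(outcomes[s] for s in plan)

    def moral_rank(forecast):
        """Stand-in moral rule: assigns each forecast a number (higher is better)."""
        return len(forecast)  # hypothetical: more good outcomes is better

    F = {m(p) for p in plans}                       # feasible forecasts
    best = max(moral_rank(f) for f in F)
    I = {f for f in F if moral_rank(f) == best}     # ideal forecasts: the best feasible ones
    print(I)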

Goalsets and their properties

If each of your goals is satisfied by its respective system, you have satisfied your goalset ($G$). This avoids any issues of assigning causality from a system to its goals. Goalsets are satisfied by the feasible forecasts that contain all of your goals:

$$S(G) = \{f \in F : G \subseteq f\}$$

The problem, then, is to show that your goals “match” your moral rule.

Ideal goalset: $S(G) \neq \emptyset$ and $S(G) \subseteq I$. There exists a feasible forecast that satisfies your goalset and all such forecasts are ideal.

Note that an ideal goalset is, by construction, possible.

The criterion for an ideal goalset shows us what we hope to achieve: Satisfying an ideal goalset will guarantee an ideal forecast.
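
A sketch of the satisfaction and idealness checks, again with hypothetical forecasts; $S(G)$ is computed by testing which feasible forecasts contain every goal:

    # Sketch: a goalset is satisfied by the feasible forecasts that contain all of its
    # goals, and it is ideal when that set is non-empty and contains only ideal forecasts.
    # The forecasts below are hypothetical.
    F = [frozenset({"commutes shorten", "literacy rises"}),
         frozenset({"commutes shorten"}),
         frozenset()]
    I = [F[0]]                              # suppose only the first forecast is ideal

    def satisfying(goalset, forecasts):
        return [f for f in forecasts if goalset <= f]

    def is_ideal_goalset(goalset):
        sat = satisfying(goalset, F)
        return bool(sat) and all(f in I for f in sat)

    print(is_ideal_goalset({"commutes shorten", "literacy rises"}))  # True
    print(is_ideal_goalset({"commutes shorten"}))   # False: a non-ideal forecast also satisfies it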

Since we ultimately need to formulate a plan, the purpose of creating a goalset is to slice up an ideal forecast in such a way that each goal is solvable by a system, and provably so.

The problem with idealness as a criterion is that it can only differentiate between goalsets that have the best possible forecast and ones that don’t. It’s too restrictive: Rejecting a goalset because it isn’t ideal would be like rejecting a theorem because it doesn’t solve all of mathematics. We need a more realistic condition on which to judge policy arguments.

Aligned goalset: $S(G) \cap I \neq \emptyset$. There exists an ideal forecast that satisfies your goalset.

For each aligned goalset, there’s a corresponding superset that is ideal. That is, an aligned goalset can be turned into an ideal goalset by appending more goals; you don’t need to remove any. Thus, goals that belong to an aligned goalset are ideal, and the systems that satisfy those goals might be as well. You can determine whether your goalset is aligned by asking “Would systems in an ideal forecast satisfy this goalset?”

Feasible goalset: $S(G) \neq \emptyset$. Some feasible forecasts satisfy your goalset.

Feasibility is the weakest criterion by which to judge a goalset. And yet, it’s not always satisfied. Demanding an infeasible goalset is invalid since it’s an indiscriminate argument: Its users can criticize any possible policy, so it cannot be used to differentiate between our options, which is the very purpose of an argument. Thus, demonstrating a goalset is infeasible removes that goalset from our set of choices. An example of this is Arrow’s Impossibility Theorem. Vague goals, such as “Support veterans”, are also infeasible, because only predictions that can be empirically evaluated are contained within forecasts.
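
The alignment and feasibility tests differ only in which set of forecasts they search. A sketch with hypothetical forecasts:

    # Sketch: alignment asks whether some ideal forecast satisfies the goalset;
    # feasibility only asks whether some feasible forecast does. Hypothetical data.
    F = [frozenset({"commutes shorten", "literacy rises"}),
         frozenset({"commutes shorten"})]
    I = [F[0]]                                    # hypothetical ideal subset of F

    def is_aligned(goalset):
        return any(goalset <= f for f in I)

    def is_feasible(goalset):
        return any(goalset <= f for f in F)

    print(is_aligned({"commutes shorten"}))       # True: an ideal forecast contains it
    print(is_feasible({"commutes lengthen"}))     # False: no feasible forecast contains it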

Equivalent goalsets: $S(G_1) = S(G_2)$. Goalsets $G_1$ and $G_2$ are equivalent when a forecast will satisfy $G_1$ if and only if it satisfies $G_2$.

An example of equivalent goals is the pair of statements “Person births are maximized” and “Person deaths are maximized”. They’re equivalent because every born person must die and every dead person must have previously been born. Of course, anyone who says they want to maximize human death would probably concern you quite a bit more, but that’s only because goals conventionally don’t have negative terminal value. But that’s not actually necessary: An ideal goal can be an “undesirable” side-effect. Because these two goals are equivalent, saying one of the goals is ideal while the other is not is contradictory.

Equivalent goalsets can show us that criticizing a goal for “narrow scope” is invalid. Suppose that a goal is narrow if it can’t be decomposed into an equivalent set of narrower goals. Take any goalset composed of non-narrow goals. Each goal can be replaced with its decomposition to generate an equivalent goalset. This process can be repeated on these decomposed goals until they’re not decomposable, i.e. until they are narrow. Thus, any goalset has an equivalent that contains narrow goals. If criticizing the narrowness of a goal were valid, this equivalent goalset would be worse than the original. But equivalent goalsets are equivalent; they can’t be worse. Therefore, criticizing a goal for being narrow is invalid. Intuitively this makes sense: Systems can satisfy multiple goals, so it doesn’t matter whether or not a particular goal is narrow.
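
A toy demonstration of the argument (the composite and narrow goal strings are hypothetical): because forecasts contain combined statements as well as atomized ones, a composite goal and its decomposition pick out exactly the same forecasts:

    # Sketch: replacing a goal with an equivalent set of narrower goals leaves the
    # satisfying forecasts unchanged, so "narrowness" cannot make a goalset worse.
    F = [frozenset({"rent is affordable", "wages rise", "rent is affordable and wages rise"}),
         frozenset({"rent is affordable"}),
         frozenset({"wages rise"})]

    def satisfying(goalset):
        return [f for f in F if goalset <= f]

    broad = {"rent is affordable and wages rise"}    # one composite goal
    narrow = {"rent is affordable", "wages rise"}    # its decomposition into narrower goals

    print(satisfying(broad) == satisfying(narrow))   # True: the goalsets are equivalent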

Consequences: $C(G) = \bigcap_{f \in S(G)} f$. The set of all consequences of $G$ is the set of all statements that are true in all forecasts that satisfy $G$.

Note that this includes statements of the form “$a$ or $b$”. So if “$a$” is true for some goalset-satisfying forecasts and “$b$” is true for the remaining forecasts, “$a$ or $b$” is a consequence, but neither “$a$” nor “$b$” is a consequence.

Goalsets losslessly compress their set of consequences: Each of your goalset-satisfying forecasts contains the consequences of your goalset ($C(G) \subseteq f$ for all $f \in S(G)$), so amending your goalset with its consequences produces an equivalent goalset (i.e. for any $G'$ that is a superset of $G$ where $G'$ is a subset of $G \cup C(G)$, $G' \equiv G$). Thus, it’s invalid to criticize a goalset for not containing some of its consequences because those consequences are, effectively, already part of your goalset. (None of this implies that explaining consequences has no value, just that consequences have no value in your goalset.)

Any number of desirable consequences can follow from the satisfaction of a single goal. Therefore, a system that has only one goal is not necessarily “doing less” than a competing system with many goals.
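
A sketch of computing consequences as the intersection of the goalset-satisfying forecasts, using placeholder statement labels; it also reproduces the “a or b” point and the lossless-compression claim above:

    # Sketch: the consequences of a goalset are the statements true in every forecast
    # that satisfies it, i.e. the intersection of the satisfying forecasts.
    F = [frozenset({"g", "a", "a or b"}),     # hypothetical statement labels
         frozenset({"g", "b", "a or b"})]

    def satisfying(goalset):
        return [f for f in F if goalset <= f]

    def consequences(goalset):
        sat = satisfying(goalset)
        return frozenset.intersection(*sat) if sat else frozenset()

    C = consequences({"g"})
    print("a or b" in C)   # True: it holds in every satisfying forecast
    print("a" in C)        # False: it holds in only some of them
    # Amending the goalset with its consequences changes nothing:
    print(satisfying({"g"}) == satisfying({"g"} | set(C)))   # True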

Fixed-value goalset: $f \sim f'$ for all $f, f' \in S(G)$. All forecasts that satisfy your goalset are morally equivalent.

A fixed-value aligned goalset must be ideal, and vice versa. So, given that you don’t have an ideal goalset, your aligned goalset can’t have a fixed moral value. For an extreme example, suppose your goalset is the null set, which is satisfied by all forecasts, from the very best (thus an empty goalset is aligned) to the very worst. Since aligned goalsets can be satisfied by forecasts with such large moral differences, we shouldn’t be content by looking only at the goalset; we need to see how the goalset is satisfied. We need to look at the plan.


Figure 2: Goalset types and their relationships to each other.

Plans and their properties

Let $f_p$ be the forecast that results from attempting plan $p$:

$$f_p = m(p)$$

Valid plan: $f_p \in S(G)$. Your plan is valid if its forecast satisfies your goalset.

Valid plans must have a feasible goalset, because infeasible goalsets can’t be satisfied. But invalid plans can happen even with feasible goalsets. An example of this is the story of the cobra effect, where the exact opposite of the goal was “achieved”.

Ideal plan: $f_p \in I$. Attempting your plan would engender an ideal forecast.

Given a valid plan, an ideal goalset implies an ideal plan, but an ideal plan only implies an aligned goalset, not an ideal one. You could, for instance, have an ideal plan without any specific goals whatsoever (note that an empty goalset is always aligned and not ideal).
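
A sketch of the validity and idealness checks for plans, with a hypothetical one-subplan empirical model:

    # Sketch: a plan is valid when its forecast satisfies the goalset,
    # and ideal when its forecast is one of the ideal forecasts.
    def m(plan):
        """Hypothetical empirical model."""
        return frozenset({"commutes shorten"} if "build transit" in plan else set())

    I = [frozenset({"commutes shorten"})]        # hypothetical ideal forecasts

    def is_valid(plan, goalset):
        return goalset <= m(plan)

    def is_ideal_plan(plan):
        return m(plan) in I

    plan = frozenset({"build transit"})
    print(is_valid(plan, {"commutes shorten"}))  # True
    print(is_ideal_plan(plan))                   # True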

I think of a plan as being a set of subplans (particularly where each subplan specifies how to implement and maintain a system). This allows us to define the set of forecasts where you attempt a superset of a given plan:

$$F^{\supseteq}(p) = \{m(p') : p' \supseteq p\}$$

Aligned plan: $F^{\supseteq}(p) \cap I \neq \emptyset$. At least one of the forecasts in which you attempt a superset of your plan is ideal.

If your plan is aligned, each of your subplans is ideal. You can determine whether your plan is aligned by asking “Would systems in an ideal forecast be exactly like this?”

Given a valid plan, an aligned plan implies an aligned goalset, but an aligned goalset does not imply an aligned plan, because it’s possible you get what you asked for (i.e. you have a valid plan), which is what you wanted (i.e. your goalset is aligned), but you’re also getting something you definitely didn’t want (i.e. your plan is not aligned) because you didn’t say that you didn’t want it. For example, wanting a cup of tea and getting it, but breaking a vase in the process.
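
A sketch of the alignment check for plans, which searches the forecasts reachable by attempting supersets of the plan (the subplan names and outcomes are hypothetical):

    # Sketch: a plan is aligned when some forecast reachable by attempting a
    # superset of the plan is ideal.
    from itertools import chain, combinations

    all_subplans = ["build transit", "fund schools"]

    def m(plan):
        """Hypothetical empirical model: each subplan yields one outcome statement."""
        outcomes = {"build transit": "commutes shorten", "fund schools": "literacy rises"}
        return frozenset(outcomes[s] for s in plan)

    I = [frozenset({"commutes shorten", "literacy rises"})]   # hypothetical ideal forecasts

    def supersets(plan):
        extras = [s for s in all_subplans if s not in plan]
        return [frozenset(plan) | frozenset(c)
                for c in chain.from_iterable(combinations(extras, r) for r in range(len(extras) + 1))]

    def is_aligned_plan(plan):
        return any(m(p) in I for p in supersets(plan))

    print(is_aligned_plan({"build transit"}))   # True: adding "fund schools" reaches an ideal forecast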

[Figure 3: Relationships between idealness and alignment]

Figure 3: Assuming a valid plan, an ideal goalset implies an ideal plan, which implies an aligned plan, which implies an aligned goalset, and all of these implications are one-way. The smallest ellipse that encapsulates a word is the set to which the word refers (e.g. “Ideal plan” refers to the second-smallest ellipse) and each ellipse is a sufficient condition for any ellipse that encapsulates it.

As we shouldn’t eval­u­ate goals in­di­vi­d­u­ally, we shouldn’t eval­u­ate sys­tems in­di­vi­d­u­ally. An er­ror re­lated to the Nar­row Goal fal­lacy is de­mand­ing a sin­gle sys­tem han­dles ad­di­tional goals be­yond what it was as­signed. Let’s call it “The Kitchen Sink” fal­lacy. The ear­lier ex­am­ple of this fal­lacy was the trans­port sys­tem that cures can­cer, which is ob­vi­ously ridicu­lous. But peo­ple of­ten miss the same er­ror when sys­tems are closely re­lated. For ex­am­ple, a prison sys­tem doesn’t need to de­ter­mine what the laws of a so­ciety should be, nor does it need to pre­vent the crimes of first-time offen­ders, nor does it need to de­ter­mine the pun­ish­ments for in­mates—these goals can be bet­ter han­dled by other sys­tems. For ex­am­ple, pun­ish­ments are bet­ter speci­fied within the law rather than rely­ing on the ad hoc cru­elty of the prison officers or the other in­mates.

A related but valid criticism may show that amending the goalset with higher-priority goals produces an infeasible goalset.

Another example of the fallacy is demanding that every system fix inequality. Some people reject stable hospital-intern matching systems because “The best interns would go to the best hospitals, which are unaffordable to the poorest patients.” But equal health outcomes are not obviously incompatible with a stable hospital-intern matching system: You might be able to get stability and equality by, for instance, redistributing income via your taxation system. Whether that would work or not is beside the point: To show inequality is a consequence, you must show that no possible system can achieve equality in the presence of a stable hospital-intern matching system. You should also show that equality, however you define it, is an ideal goal. The correct response to the fallacy is usually “Another system can handle that problem”.

The Kitchen Sink and Narrow Scope errors are not to be confused with the valid criticism that a system is not fully determined by its goals, i.e. the system is arbitrary. (One caveat is that the choice between two ideal forecasts is necessarily arbitrary, which is one reason why it’s probably good if you have a moral rule that is highly discriminatory, i.e. it doesn’t rank many pairs of forecasts as morally equivalent.) For example, we might want a welfare payment to satisfy the goals that (1) it’s always in the interests of eligible customers to apply, and (2) more private income is always better. These goals can be represented mathematically, which allows us to show that certain payment types, like a max rate of payment with a hard income cut-off, fail the goals. However, these goals are not sufficient to determine exactly what the payment function should be and, in this case, some of the goalset-satisfying functions are clearly not ideal (e.g. “Pay everyone a million dollars each week”): Lots of different payment functions satisfy both goals, so choosing the winner from that set of functions can’t be based on those goals. The goals do not need to be changed; they’re aligned, so removing them can’t be beneficial, but we need more goals before the best payment function can be derived.
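
A sketch of how goal (2) can be checked mathematically, with made-up payment parameters rather than the actual Farm Household Allowance rules: a hard income cut-off fails the check, a tapered payment passes it, and many different tapers pass, which is why the two goals alone don’t determine the payment function:

    # Sketch with made-up numbers (not actual payment rules): goal (2) says total income
    # (private income + payment) should never fall as private income rises.
    def hard_cutoff(income, max_rate=500, cutoff=1000):
        return max_rate if income <= cutoff else 0

    def tapered(income, max_rate=500, threshold=600, taper=0.5):
        return max(0, max_rate - taper * max(0, income - threshold))

    def satisfies_monotonic_total_income(payment, incomes=range(0, 2001)):
        totals = [i + payment(i) for i in incomes]
        return all(b >= a for a, b in zip(totals, totals[1:]))

    print(satisfies_monotonic_total_income(hard_cutoff))  # False: crossing the cut-off lowers total income
    print(satisfies_monotonic_total_income(tapered))      # True: a taper rate below 1 keeps totals rising
    # Many taper functions pass this test, so the goals alone don't determine the payment function.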

Besides alignment, there are other properties of plans that we’re interested in. You don’t want your plan to be improvable by removing subplans from it: If you can get a better forecast by simply removing some subset of your plan, you should do so (at least temporarily). To state this property, we should first define the set of forecasts where you attempt any subset of a given plan:

$$F^{\subseteq}(p) = \{m(p') : p' \subseteq p\}$$

Subset-dominant plan: $f_p \succeq f'$ for all $f' \in F^{\subseteq}(p)$. Your plan’s forecast is at least as good as all the forecasts where you attempt a subset of your plan.

In other words, each subset of your plan has a non-negative marginal contribution to your moral ranking of the plan’s forecast. Subset dominance is a necessary condition for an ideal plan. And it leads us to a symmetrical property: plans that can’t be improved by including more subplans.

Superset-dominant plan: $f_p \succeq f'$ for all $f' \in F^{\supseteq}(p)$. Your plan’s forecast is at least as good as all the forecasts where you attempt a superset of your plan.

Superset dominance is worth mentioning for a few reasons: (1) it can be used to define idealness, i.e. a plan is ideal if and only if it is aligned and superset dominant, and (2) it can be used to define local optimality, i.e. a plan is locally optimal if and only if it is superset and subset dominant, which is useful to know about so that you don’t confuse it with idealness.

Locally optimal: $f_p \succeq f'$ for all $f' \in F^{\subseteq}(p) \cup F^{\supseteq}(p)$. Your plan’s forecast is at least as good as all the forecasts where you attempt a subset or a superset of your plan.
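
A sketch of the three dominance checks, identifying each plan with its forecast for brevity and using a hypothetical numeric moral ranking:

    # Sketch: dominance checks compare the plan's forecast against forecasts from
    # attempting subsets or supersets of it, under a hypothetical numeric moral rule.
    from itertools import chain, combinations

    all_subplans = ["reform sentencing", "fund rehabilitation"]
    score = {                                    # hypothetical moral ranking
        frozenset(): 0,
        frozenset({"reform sentencing"}): 2,
        frozenset({"fund rehabilitation"}): 1,
        frozenset({"reform sentencing", "fund rehabilitation"}): 5,
    }

    def powerset(items):
        return [frozenset(c) for c in chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))]

    def subset_dominant(plan):
        return all(score[plan] >= score[p] for p in powerset(plan))

    def superset_dominant(plan):
        return all(score[plan] >= score[p] for p in powerset(all_subplans) if p >= plan)

    def locally_optimal(plan):
        return subset_dominant(plan) and superset_dominant(plan)

    p = frozenset({"reform sentencing"})
    print(subset_dominant(p), superset_dominant(p), locally_optimal(p))  # True False False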

People often confuse ideal plans with locally optimal ones. Let’s suppose that longer prison sentences cause inmates to be more likely to reoffend. Should we decrease sentence lengths? Not necessarily. In an ideal forecast, prisons probably rehabilitate their inmates, in which case, maybe ideal forecasts have longer prison sentences. Ideal systems don’t necessarily do the greatest good at the margin under all circumstances.

Then what is goal and plan alignment for? Shouldn’t you just generate some plans and choose one that results in a forecast selected by your moral rule? Yes, but if an ideal system is not the best at the margin, you can expand the set of things you want to reform until the systems you’re considering only do the greatest good at the margin when they are ideal. This way, you do not get stuck with locally optimal, non-ideal plans. Usually this isn’t a problem since ideal systems tend to produce forecasts that your moral rule ranks higher. For example, suppose you’re implementing a retirement savings system. If you think “maximizing returns” is an ideal goal, then some system that maximizes returns will probably do the most good. But ideal policy options are not always available. There are problems that, so far, lack ideal solutions but have systems that seem to do well enough for the time being. So, I think it makes sense to divide your plan into a misaligned (but satisfactory) partition and an aligned partition that you amend over time.

The properties of plans can be thought of in terms of a moral landscape (a term coined by Sam Harris, under a different interpretation). Imagine a mountain range with many valleys and peaks. The higher the point on the territory, the better that place is. All peaks are locally optimal. Some peaks are ideal because they are second to none. Superset dominance means you can’t get higher by any combination of exclusively forward steps (i.e. adding subplans). Subset dominance means you can’t get higher by any combination of exclusively backward steps (i.e. removing subplans). Plan alignment means there’s some combination of exclusively forward steps that leads you to the highest peak.

Establishable: $F^{\supseteq}(p) \cap I \neq \emptyset$ and $f_p \succeq f'$ for all $f' \in F^{\subseteq}(p)$. Your plan can become ideal by adding subplans without removing any, and cannot be improved by removing subplans without adding any.


Figure 4: Relationships between plan types. (The smallest convex shape that encapsulates a word is the set to which the word refers. E.g. according to the diagram, “Aligned” must refer to the entire left circle, so ideal and establishable plans must also be aligned.)

Other applications

The law of round numbers: 100 percent of the time, they’re wrong

Misaligned goals often contain numeric targets, for example aiming for rice to be between $3 and $4 per kilogram. One way to avoid picking numbers that are arbitrary (i.e. numbers that are not well justified) is to not select numbers at all. Instead, select a system whose goal implies that it will determine your optimal numbers for you. For example, rather than select a price for rice out of thin air, you could let a Pareto-efficient market determine prices (if you consider the goal of Pareto efficiency to be aligned). Note that markets have different prices for the exact same good at the same time (due to things like transportation costs), so having a goal for a universal rice price is the wrong approach from the outset. If a single number is the right approach, arbitrary numeric goals are not aligned when the goal’s specified range does not contain the ideal value. And even if you pick the right number upon the system’s implementation, the ideal value might move out of the range at a later time.

Arbitrary numeric goals almost always seem to end in a zero or a five. People’s skepticism apparently disengages at the sight of a round number. For example, a call for a fixed 15 percent fee on all student loans resulted in zero challenges in the comment section. If they had proposed 14.97 percent instead, I imagine many of the comments would be asking where they got the specific number from. Of course, round numbers aren’t wrong 100 percent of the time. But when round numbers for systems are proposed, you’ll tend to find that it’s not clear why increasing or decreasing those numbers by a small amount would lead to a worse system. This means you have no reason to believe these numbers are ideal.

Goalsets are personal

A person’s goalset is for their moral rule, not for yours, not for mine, and not for some hypothetical representative of humanity. Each person has a different idea of what is best for society. So arguing that someone’s proposed system is not aligned with your moral rule won’t change that person’s mind (though it could convince others who share your moral rule). Your moral preferences can affect someone else’s policy choices, but not beyond what is already incorporated into their moral rule. The valid modification of this argument is to try to change someone’s moral rule by showing them implications of their rule that (you expect) they will find troubling.

Identifying and refuting unsound feasibility arguments

During my government work in the Farm Household Allowance team, I demonstrated that our welfare payment could avoid the problems of the old and new payment functions, which the rest of the team thought was infeasible. The old payment function failed to ensure it was always in the interests of eligible customers to apply; the new payment function, a max payment rate with a hard income cut-off, failed to ensure that customers would always be better off if they earned more private income. I represented these goals mathematically and showed both goals were satisfied by my proposed system. Someone outside the team said, “Maybe we also want to pay the max rate of payment for farmers with higher income, so that’s why we can’t satisfy both goals”, which is an infeasibility argument. Rather than go down the rabbit hole of moral argument, I simply showed there was a function that satisfied both goals while still paying the max rate to everyone who currently received it.

You can, in a roundabout way, satisfy vague goals

For prisons, some people think that we should “Increase inmate wellbeing”. Okay, to what level? We probably shouldn’t maximize it, because it’s possible that doing so comes at the expense of total societal wellbeing. So how do we figure out the optimal amount of inmate wellbeing? What proxies do we have to measure it? The problem is that inmate wellbeing is hard to “goalify”. But this doesn’t mean we can’t have higher inmate wellbeing. A prison policy that maximizes the total societal contribution has better inmate wellbeing as a “consequence” for several reasons: (1) Societal contribution includes the crimes that occur within the prison, so prisons want to prevent their inmates from committing crimes against each other, and (2) societal contribution is probably maximized when former inmates integrate into society, so prisons want their inmates to learn new skills and treat any psychological issues they have. When you can’t properly express a goal, select another goal that should have the consequences you want.

...

Originally posted August 21, 2020, 4:51 PM AEST.

Edited October 11, 2020, 3:11 AM AEDT


  1. Since our priors can be wrong, rather than say we prefer $f_1$ over $f_2$, we should be assigning credences to the three possibilities: $f_1 \succ f_2$, $f_1 \prec f_2$, or $f_1 \sim f_2$. This representation is equivalent to the Parliamentary Model if we gave a credence to every possible moral rule. Alternatively, you might want to have a distribution over cardinal differences in your moral valuations (e.g. “I give a 3.0 percent credence that $f_1$ is 1.4 times as good as $f_2$”).

  2. As a side note, you could modify this set of tools to talk about goal alignment between multiple pre-existing agents. Start with a set of agents, $A$, and have an empirical function that takes all those agents’ plans as exogenous, i.e. $m(p_1, \dots, p_{|A|})$, and then go from there.
