Shapley values: Better than counterfactuals

[Epistemic status: Pretty confident. But also, enthusiasm on the verge of partisanship]

One intuitive function which assigns impact to agents is the counterfactual, which has the form:

CounterfactualImpact(Agent) = Value(World) - Value(World/Agent)

which reads "The impact of an agent is the difference between the value of the world with the agent and the value of the world without the agent".

It has been discussed in the effective altruism community that this function leads to pitfalls, paradoxes, or to unintuitive results when considering scenarios with multiple stakeholders. See:

In this post I'll present some new and old examples in which the counterfactual function seems to fail, and how, in each of them, I think that a less known function does better: the Shapley value, a concept from cooperative game theory which has also been brought up before in such discussions. In the first three examples, I'll just present what the Shapley value outputs, and halfway through this post, I'll use these examples to arrive at a definition.

I think that one of the main hindrances to the adoption of Shapley values is the difficulty of calculating them. To solve this, I have written a Shapley value calculator and made it available online: shapleyvalue.com. I encourage you to play around with it.

Example 1 & recap: Sometimes, the counterfactual impact exceeds the total value.

Suppose there are three possible outcomes:
P has cost $2000 and gives 15 utility to the world.
Q has cost $1000 and gives 10 utility to the world.
R has cost $1000 and gives 10 utility to the world.

Suppose Alice and Bob each have $1000 to donate. Consider two scenarios:

Scenario 1: Both Alice and Bob give $1000 to P. The world gets 15 more utility. Both Alice and Bob are counterfactually responsible for giving 15 utility to the world.

Scenario 2: Alice gives $1000 to Q and Bob gives $1000 to R. The world gets 20 more utility. Both Alice and Bob are counterfactually responsible for giving 10 utility to the world.

From the world's perspective, scenario 2 is better. However, from Alice and Bob's individual perspective (if they are maximizing their own counterfactual impact), scenario 1 is better. This seems wrong; we'd want to somehow coordinate so that we achieve scenario 2 instead of scenario 1.
Source: rohinmshah

In Scenario 1:
Counterfactual impact of Alice: 15 utility.
Counterfactual impact of Bob: 15 utility.
Sum of the counterfactual impacts: 30 utility. Total impact: 15 utility.

The Shapley value of Alice would be: 7.5 utility.
The Shapley value of Bob would be: 7.5 utility.
The sum of the Shapley values always adds up to the total impact, which is 15 utility.

In Scenario 2:
Counterfactual impact of Alice: 10 utility.
Counterfactual impact of Bob: 10 utility.
Sum of the counterfactual impacts: 20 utility. Total impact: 20 utility.

The Shapley value of Alice would be: 10 utility.
The Shapley value of Bob would be: 10 utility.
The sum of the Shapley values always adds up to the total impact, which is 10+10 utility = 20 utility.

In this case, if Alice and Bob were each individually optimizing for counterfactual impact, they'd end up with a total impact of 15. If each of them were instead individually optimizing for their Shapley value, they'd end up with a total impact of 20, which is higher.

It would seem that we could use a function such as

CounterfactualImpactModified = CounterfactualImpact / NumberOfStakeholders

to solve this particular problem. However, as the next example shows, that sometimes doesn't work. The Shapley value, on the other hand, has the property that it always adds up to the total value.

Property 1: The Shapley value always adds up to the total value.

Example 2: Sometimes, the sum of the counterfactuals is less than total value. Sometimes it's 0.

Consider the invention of calculus by Newton and Leibniz at roughly the same time. If Newton hadn't existed, Leibniz would still have invented it, and vice versa, so the counterfactual impact of each of them is 0. Thus, you can't normalize like above.

The Shapley value doesn't have that problem. It has the property that equal people have equal impact, which, together with the requirement that it adds up to the total value, is enough to assign 1/2 of the total impact to each of Newton and Leibniz.

Interestingly, GiveWell has Iodine Global Network as a standout charity, but not as a recommended charity, because of considerations related to the above. If it were the case that, had IGN not existed, another organization would have taken its place, its counterfactual value would be 0, but its Shapley value would be 1/2 (of the impact of iodizing salt in developing countries).

Property 2: The Shapley value assigns equal value to equivalent agents.

Example 3: Order indifference.

Consider Scenario 1 from Example 1 again.

P has cost $2000 and gives 15 utility to the world.

Suppose Alice and Bob each have $1000 to donate. Both Alice and Bob give $1000 to P. The world gets 15 more utility. Both Alice and Bob are counterfactually responsible for giving 15 utility to the world.

Alice is now a pure counterfactual-impact maximizer, but something has gone wrong. She now views Bob adversarially. She thinks he's a sucker, and she waits until Bob has donated to make her own donation. There are now no worlds in which he doesn't donate before her, so Alice assigns all 15 utility to herself, and 0 to Bob. Note that she isn't exactly calculating the counterfactual impact, but something slightly different.

The Shapley value doesn't consider any agent to be a sucker, doesn't consider any variables to be in the background, and doesn't care whether people try to donate strategically before or after someone else. Here is a perhaps more familiar example:

Scenario 1:
Suppose that the Indian government creates some big and expensive infrastructure to vaccinate people, but people don't use it. Suppose an NGO then comes in, and sends reminders to people to vaccinate their children, and some end up going.

Scenario 2:
Suppose that an NGO could be sending reminders to people to vaccinate their children, but it doesn't, because the vaccination infrastructure is nonexistent, so there would be no point. Then the government steps in, creates the needed infrastructure, and vaccination reminders are sent.

Again, it's tempting to say that in the first scenario the NGO gets all the impact, and in the second scenario the government gets all the impact, perhaps because we take either the NGO or the Indian government to be in the background. To repeat, the Shapley value doesn't differentiate between the two scenarios, and doesn't leave variables in the background. For how this works numerically, see the examples below.

Property 3: The Shapley value doesn't care about who comes first.

The Shapley value is uniquely determined by simple properties.

These properties:

  • Property 1: Sum of the values adds up to the total value (Efficiency)

  • Property 2: Equal agents have equal value (Symmetry)

  • Property 3: Order indifference: it doesn't matter which order you go in (Linearity). Or, in other words, if there are two steps, Value(Step1 + Step2) = Value(Step1) + Value(Step2).

And an extra property:

  • Property 4: Null player (if in every world, adding a person to the world has no impact, the person has no impact). You can either take this as an axiom, or derive it from the first three properties.

are enough to force the Shapley value function to take the form it takes:
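For reference, this is the standard formula, as it appears in the game theory literature and on Wikipedia (nothing here is specific to this post). For a set of agents N and a value function v defined on coalitions:

$$\varphi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\,\bigl(v(S \cup \{i\}) - v(S)\bigr)$$

That is, agent i's Shapley value is their marginal contribution to each possible coalition, averaged over all the orders in which the coalition could have been assembled.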

At this point, the reader may want to consult Wikipedia to familiarize themselves with the mathematical formalism, or, for a book-length treatment, The Shapley value: Essays in honor of Lloyd S. Shapley. Ultimately, a quick way to understand it is as "the function uniquely determined by the properties above".

I suspect that order indifference will be the most controversial property. Intuitively, it prevents stakeholders from adversarially choosing to collaborate earlier or later in order to assign themselves more impact.

Note that in the case of only one agent the Shapley value reduces to the counterfactual function, and that the Shapley value uses many counterfactual comparisons in its formula. It sometimes just reduces to CounterfactualValue / NumberOfStakeholders (though it sometimes doesn't). Thus, the Shapley value might be best understood as an extension of counterfactuals, rather than as something completely alien.
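To make the definition concrete, here is a minimal Python sketch of the formula above, applied to Example 1. This is just an illustration under my own naming conventions (the helper shapley_values and the two characteristic functions are mine), not the code behind shapleyvalue.com.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values for a characteristic function v(coalition) -> value.

    phi_i = sum over coalitions S not containing i of
            |S|! * (n - |S| - 1)! / n! * (v(S with i) - v(S))
    """
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(len(others) + 1):
            for subset in combinations(others, size):
                S = frozenset(subset)
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                total += weight * (v(S | {i}) - v(S))
        phi[i] = total
    return phi

# Example 1, Scenario 1: project P needs both $1000 donations and produces 15 utility.
v1 = lambda S: 15 if {"Alice", "Bob"} <= S else 0
print(shapley_values(["Alice", "Bob"], v1))  # {'Alice': 7.5, 'Bob': 7.5}

# Example 1, Scenario 2: Alice funds Q and Bob funds R, 10 utility each.
v2 = lambda S: 10 * len(S)
print(shapley_values(["Alice", "Bob"], v2))  # {'Alice': 10.0, 'Bob': 10.0}
```

For two agents this is simply the average of each agent's marginal contribution over the two possible orderings, which is the shortcut used in the worked checks below.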

Example 4: The Shapley value can also deal with leveraging

Organisations can leverage funds from other actors into a particular project. Suppose that AMF will spend $1m on a net distribution. As a result of AMF's commitment, the Gates Foundation contributes $400,000. If AMF had not acted, Gates would have spent the $400,000 on something else. Therefore, the counterfactual impact of AMF's work is:
AMF's own $1m on bednets plus Gates' $400,000 on bednets minus the benefits of what Gates would otherwise have spent their $400,000 on.
If Gates would otherwise have spent the money on something worse than bednets, then the leveraging is beneficial; if they would otherwise have spent it on something better than bednets, the leveraging reduces the benefit produced by AMF.
Source: The counterfactual impact of agents acting in concert.

Let's consider the case in which the Gates Foundation would otherwise have spent their $400,000 on something half as valuable.

Then the counterfactual impact of the AMF is $1,000,000 + $400,000 - $400,000*0.5 = $1.2m.
The counterfactual impact of the Gates Foundation is $400,000.
And the sum of the counterfactual impacts is $1.6m, which exceeds the total impact, which is $1.4m.

The Shapley value of the AMF is $1.1m.
The Shapley value of the Gates Foundation is $300,000.

Thus, the Shapley value assigns to the AMF part, but not all, of the impact of the Gates Foundation's donation. It takes their outside options into account when doing so: if the Gates Foundation would have invested in something equally valuable, the AMF wouldn't get anything from that.
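As a worked check, using the two-player averaging shortcut mentioned above. The coalition values (in $m of impact) are my reconstruction, chosen to be consistent with the counterfactual numbers in the text.

```python
# Example 4 (leveraging). Coalition values in $m of impact, reconstructed from the text:
# nobody acts: 0; AMF alone: 1.0; Gates alone: 0.2 (the half-as-valuable alternative);
# both: 1.4 (all $1.4m goes to bednets).
v_empty, v_amf, v_gates, v_both = 0.0, 1.0, 0.2, 1.4
shapley_amf = 0.5 * ((v_amf - v_empty) + (v_both - v_gates))    # 1.1
shapley_gates = 0.5 * ((v_gates - v_empty) + (v_both - v_amf))  # 0.3
print(shapley_amf, shapley_gates)
```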

Example 5: The Shapley value can also deal with funging

Suppose again that AMF commits $1m to a net distribution. But if AMF had put nothing in, DFID would instead have committed $500,000 to the net distribution. In this case, AMF funges with DFID. AMF's counterfactual impact is therefore:
AMF's own $1m on bednets minus the $500,000 that DFID would have put in plus the benefits of what DFID in fact spent their $500,000 on.
Source

Suppose that the DFID spends their money on something half as valuable.

The counterfactual impact of the AMF is $1m - $500,000 + ($500,000)*0.5 = $750,000.
The counterfactual impact of DFID is $250,000.
The sum of their counterfactual impacts is $1m; lower than the total impact, which is $1,250,000.

The Shapley value of the AMF is, in this case, $875,000.
The Shapley value of the DFID is $375,000.
The AMF is penalized: even though it paid $1,000,000, its Shapley value is less than that. The DFID's Shapley impact is increased, because, had the AMF not intervened, it could have put its money into something more valuable (the nets themselves).

For a perhaps cleaner example, consider the case in which the DFID's counterfactual impact is $0: it can't use the money except to distribute nets, and the AMF got there first. In that scenario:

The counterfactual impact of the AMF is $500,000.
The counterfactual impact of DFID is $0.
The sum of their counterfactual impacts is $500,000. This is lower than the total impact, which is $1,000,000.

The Shapley value of the AMF is $750,000.
The Shapley value of the DFID is $250,000.
The AMF is penalized: even though it paid $1,000,000, its Shapley value is less than that. The DFID shares some of the impact, because it stood ready to fund the distribution itself.
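Again as a worked check with the two-player averaging shortcut; the coalition values (in $m of impact) are my reconstruction from the numbers in the text.

```python
# Example 5 (funging), first case: DFID's outside option is half as valuable.
v_empty, v_amf, v_dfid, v_both = 0.0, 1.0, 0.5, 1.25
print(0.5 * ((v_amf - v_empty) + (v_both - v_dfid)))   # 0.875 -> $875,000 for the AMF
print(0.5 * ((v_dfid - v_empty) + (v_both - v_amf)))   # 0.375 -> $375,000 for the DFID

# Cleaner case: DFID can only fund nets (at half the scale), so its presence
# adds nothing once the AMF has already funded the distribution.
v_empty, v_amf, v_dfid, v_both = 0.0, 1.0, 0.5, 1.0
print(0.5 * ((v_amf - v_empty) + (v_both - v_dfid)))   # 0.75 -> $750,000 for the AMF
print(0.5 * ((v_dfid - v_empty) + (v_both - v_amf)))   # 0.25 -> $250,000 for the DFID
```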

Example 6: The counterfactual value doesn't deal correctly with tragedy-of-the-commons scenarios.

Imagine a scenario in which many people could replicate the GPT-2 model and make it freely available, but the damage is already done once the first person does it. Imagine that 10 people end up doing it, and that the damage done is something big, like −10 million utility.

Then the counterfactual damage done by each person would be 0, because the other nine would have done it regardless.

The Shapley value deals with this by assigning an impact of −1 million utility to each person.
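Numerically, reusing the shapley_values sketch from the definition section (the characteristic function below is my reconstruction of this example):

```python
# Example 6 (tragedy of the commons): the -10 million utility of damage occurs
# as soon as any one of the 10 people replicates and releases the model.
people = [f"person{i}" for i in range(10)]
damage = lambda S: -10_000_000 if len(S) >= 1 else 0
print(shapley_values(people, damage))  # -1,000,000 for each person
```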

Example 7: Hiring in EA

Suppose that there was a position in an EA org for which there were 7 qualified applicants who were otherwise "idle". In arbitrary units, the person in that position in that organization can produce an impact of 100 utility.

The counterfactual impact of the organization is 100.
The counterfactual impact of any one applicant is 0.

The Shapley value of the organization is 87.5.
The Shapley value of any one applicant is 1.79.

As there are more applicants, the value skews more in favor of the organization, and the opposite happens with fewer applicants. If there were instead only 3 applicants, the values would be 75 and 8.33, respectively. If there were only 2 applicants, the Shapley value of the organization would be 66.66, and that of each applicant 16.66. With one applicant and one organization, the impact is split 50/50.

In general (I suspect, but I haven't proved it), if there are n otherwise idle applicants, the fraction of the total value assigned to the organization is n/(n+1), which approaches 1 as n grows. This suggests that a lot of the impact of the position goes to whoever created the position.
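A quick check with the shapley_values sketch from the definition section (the characteristic function below is my reconstruction of this example):

```python
# Example 7 (hiring): the position produces 100 utility only if the organization
# and at least one of the 7 interchangeable applicants are both present.
players = ["org"] + [f"applicant{i}" for i in range(7)]
v = lambda S: 100 if "org" in S and len(S) >= 2 else 0
phi = shapley_values(players, v)
print(round(phi["org"], 2), round(phi["applicant0"], 2))  # 87.5, 1.79
```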

Example 8: The Shapley value makes the price of a life rise with the number of stakeholders.

Key (each Shapley notion below is the analogue of the corresponding counterfactual notion):

  • Shapley value: the analogue of counterfactual value / counterfactual impact.

  • Shapley price: the analogue of the counterfactual price. The counterfactual price is the amount of money needed to be counterfactually responsible for 1 unit of X; the Shapley price is the amount of money needed for your Shapley value to be 1 unit of X.

  • Shapley cost-effectiveness: the analogue of counterfactual cost-effectiveness.

Suppose that, in order to save a life, 4 agents have to be there: AMF to save the life, GiveWell to research them, Peter Singer to popularize them, and a person to donate $5000. Then the counterfactual impact of the donation would be 1 life, but its Shapley value would be 1/4th of a life. Or, in other words, the Shapley cost of saving a life through a donation is four times higher than the counterfactual cost.

Why is this? Well, suppose that, to save a life, each of the organizations spent $5000. Because all of them are necessary, the counterfactual cost of a life is $5000 for any of the stakeholders. But if you wanted to save an additional life, the amount of money that would have to be spent is $5000*4 = $20,000, because someone would have to go through the four necessary steps.

If, instead of 4 agents, there were 100 agents involved, then the counterfactual price would stay the same, but the Shapley price would rise to 100x the counterfactual price. In general, I've said "AMF" or "GiveWell" as if they each were only one agent, but that isn't necessarily the case, so the Shapley price (of saving a life) might potentially be even higher.

This is a problem because if agents are reporting their cost-effectiveness in terms of counterfactuals, and one agent switches to considering their cost-effectiveness in terms of Shapley values, their cost-effectiveness will look worse.

This is also a problem if organizations are reporting their cost-effectiveness in terms of counterfactuals, but in some areas there are 100 necessary stakeholders, and in other areas there are four.

Shapley value and cost-effectiveness.

So we not only care about impact, but also about cost-effectiveness. Let us continue with the example in which an NGO sends reminders to undergo vaccination, and let us put some numbers on it.

Let's say that a small Indian state with 10 million inhabitants spends $60 million to vaccinate 30% of its population. An NGO which would otherwise be doing something really ineffective (we'll come back to this) comes in and, by sending reminders, increases the vaccination rate to 35%. They do this very cheaply, for $100,000.

The Shapley value of the Indian government would be 32.5%, or 3.25 million people vaccinated.
The Shapley value of the small NGO would be 2.5%, or 0.25 million people vaccinated.

Dividing this by the amount of money spent:
The cost-effectiveness in terms of the Shapley value of the Indian government would be $60 million / 3.25 million vaccinations = $18.46/vaccination.
The cost-effectiveness in terms of the Shapley value of the NGO would be $100,000 / 250,000 vaccinations = $0.4/vaccination.

So even though the NGO's Shapley value is smaller, its cost-effectiveness is higher, as one might expect.
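The worked two-player computation, with coalition values in millions of people vaccinated (my reconstruction; the NGO alone is taken to achieve roughly nothing, since its reminders need the government's infrastructure):

```python
# Vaccination example: government alone vaccinates 3.0M (30% of 10M); the NGO
# alone achieves ~0; together they vaccinate 3.5M (35%).
v_empty, v_gov, v_ngo, v_both = 0.0, 3.0, 0.0, 3.5
shapley_gov = 0.5 * ((v_gov - v_empty) + (v_both - v_ngo))  # 3.25 million
shapley_ngo = 0.5 * ((v_ngo - v_empty) + (v_both - v_gov))  # 0.25 million
print(60_000_000 / (shapley_gov * 1_000_000))  # ≈ $18.46 per vaccination
print(100_000 / (shapley_ngo * 1_000_000))     # $0.40 per vaccination
```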

If the outside option of the NGO were something which has a similar impact to vaccinating 250,000 people, we're back at the funging/leveraging scenario: because the NGO's outside option is better, its Shapley value rises.

Cost-effectiveness in terms of Shapley value changes when considering different groupings of agents.

Continuing with the same example, consider that, instead of the abstract "Indian government" as a homogeneous whole, there are different subagents which are all necessary to vaccinate people. Consider: the Central Indian Government, the Ministry of Finance, the Ministry of Health and Family Welfare, and, within any one particular state: the State's Council of Ministers, the Finance Department, the Department of Medical Health and Family Welfare, etc. And within each of them there are sub-agencies, and sub-subagencies.

In the end, suppose that there are 10 organizations which are needed for the vaccine to be delivered, for a nurse to be there, for a hospital or a similar building to be available, and for there to be money to pay for all of it. For simplicity, suppose that the budget of each of those organizations is the same: $60 million / 10 = $6 million. Then the Shapley cost-effectiveness is different:

The Shapley value of each governmental organization would be 1/10 * (3 million + 10/11 * 0.5 million) = 345,454 people vaccinated.
The Shapley value of the NGO would be 1/11 * 500,000 = 45,454 people vaccinated.

The cost-effectiveness of each governmental organization would be ($6 million)/(345,454 vaccinations) = $17/vaccination.
The cost-effectiveness of the NGO would be $100,000 / 45,454 vaccinations = $2.2/vaccination.
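For completeness, the same result via the shapley_values sketch from the definition section (again, the characteristic function is my reconstruction):

```python
# Grouped version: all 10 governmental organizations are needed for the baseline
# 3M vaccinations; the NGO adds 0.5M more, but only once the full infrastructure exists.
govs = [f"gov{i}" for i in range(10)]
def v(S):
    if not all(g in S for g in govs):
        return 0
    return 3_500_000 if "NGO" in S else 3_000_000
phi = shapley_values(govs + ["NGO"], v)
print(phi["gov0"])  # ≈ 345,454.5 vaccinations per governmental organization
print(phi["NGO"])   # ≈ 45,454.5 vaccinations for the NGO
```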

That's interesting. These concrete numbers are all made up, but they're inspired by reality and "plausible", and I was expecting the result to be that the NGO would be less cost-effective than a government agency. It's curious to see that, in this concrete example, the NGO seems to be robustly more cost-efficient than the government under different groupings. I suspect that something similar is going on with 80,000 Hours.

Better optimize Shapley.

If each agent individually maximizes their counterfactual impact per dollar, we get suboptimal results, as we have seen above. In particular, consider a toy world in which twenty people can either:

  • Each be an indispensable part of a project which has a value of 100 utility, for a total impact of 100 utility.

  • Each by themselves undertake a project which has 10 utility, for a total impact of 200 utility.

Then if each person was optimizing for the counterfactual impact, they would all choose the first option, for a lower total impact. If they were optimizing for their Shapley value, they'd choose the second option.

Can we make a more general statement? Yes. Agents individually optimizing for cost-effectiveness in terms of Shapley value globally optimize for total cost-effectiveness.

Informal proof: Consider the case in which agents have constant budgets and can divide them between different projects as they like. Then, consider the case in which each $1 is an agent: projects with higher Shapley value per dollar get funded first, then those with less impact per dollar, etc. Total cost-effectiveness is maximized. Because of order indifference, both cases produce the same distribution of resources. Thus, agents individually optimizing for cost-effectiveness in terms of Shapley value globally optimize for total cost-effectiveness.

Note: Thinking in terms of marginal cost-effectiveness doesn't change this conclusion. Thinking in terms of time/units other than money probably doesn't change the conclusion.

Am I bean counting?

I don't have a good answer to that question.

Conclusion

The counterfactual impact function is well defined, but it fails to meet my expectations of what an impact function ought to do when considering scenarios with multiple stakeholders.

On the other hand, the Shapley value function flows from some very general and simple properties, and can deal with the examples in which the counterfactual function fails. Thus, instead of optimizing for counterfactual impact, it seems to me that optimizing for Shapley value is less wrong.

Finally, because the Shapley value is not pretty to calculate by hand, here is a calculator.

Question: Is there a scenario in which the Shapley value assigns impacts which are clearly nonsensical, but with which the counterfactual value, or a third function, deals correctly?


Addendum: The Shapley value is not easily computable.

For a large number of players, the Shapley value will not be computationally tractable (but approximations might be pretty good), and work on the topic has been done in the area of interpreting machine learning results. See, for example:

This was a very simple example that we've been able to compute analytically, but these won't be possible in real applications, in which we will need the approximated solution by the algorithm. Source: https://towardsdatascience.com/understanding-how-ime-shapley-values-explains-predictions-d75c0fceca5a

Or

The Shapley value requires a lot of computing time. In 99.9% of real-world problems, only the approximate solution is feasible. An exact computation of the Shapley value is computationally expensive because there are 2^k possible coalitions of the feature values and the "absence" of a feature has to be simulated by drawing random instances, which increases the variance for the estimate of the Shapley values estimation. The exponential number of the coalitions is dealt with by sampling coalitions and limiting the number of iterations M. Decreasing M reduces computation time, but increases the variance of the Shapley value. There is no good rule of thumb for the number of iterations M. M should be large enough to accurately estimate the Shapley values, but small enough to complete the computation in a reasonable time. It should be possible to choose M based on Chernoff bounds, but I have not seen any paper on doing this for Shapley values for machine learning predictions. Source: https://christophm.github.io/interpretable-ml-book/shapley.html#disadvantages-13

That being said, here is a nontrivial example:

Foundations and projects.

Suppose that within the EA community, Open Philanthropy, a foundation whose existence I appreciate, has the opportunity to fund 250 out of 500 projects every year. Say that you also have 10 smaller foundations: Foundation1, ..., Foundation10, each of which can afford to fund 20 projects, that there aren't any more sources of funding, and that each project costs the same.

On the other hand, we will also consider the situation in which OpenPhil is a monopoly. In the end, perhaps all these other foundations and centers might be funded by Open Philanthropy themselves. Consider the assumption that OpenPhil has the opportunity to fund 450 projects out of 500, and that there are no other sources in the EA community.

Additionally, we could model the distribution of projects with respect to how much good they do in the world by ordering all projects from 1 to 500, and saying that:

  • Impact1 of the k-th project = I1(k) = 0.99^k.

  • Impact2 of the k-th project = I2(k) = 2/k^2 (a power law).

With that in mind, here are our results for the different assumptions. Power index = Shapley(OP) / Total impact.

Monopoly? | Impact measure | Total impact | Shapley(OP) | Power index
0 | I(k) = 0.99^k | 97.92 | 7.72 | 7.89%
0 | I(k) = 2/k^2 | 3.29 | 0.028 | 0.86%
1 | I(k) = 0.99^k | 97.92 | 48.96 | 50%
1 | I(k) = 2/k^2 | 3.29 | 1.64 | 50%

For a version of this table which also has counterfactual impact, see here.

The above took some time, and required me to beat the formula for the Shapley value into being computationally tractable for this particular case (see here for some maths which, as far as I'm aware, are original, and here for some code).