Oxford Prioritisation Project: Version 0

By Tom Sittler.

Our previous system of allowing comments on Google Document versions of our posts had some advantages, but it led to an unhelpful dispersion of comments. We are now trying a new policy of centralising discussion on the EA Forum, so please comment here.


On February 19, we reached what we call version 0. Version 0 was a self-imposed deadline, five weeks into the project, for producing a minimum viable product, i.e. the name of a grantee and some justification for it. One way we thought of it was to pretend the whole project was only five weeks long. (That’s not quite right: for the final decision, our aim is to reach a consensus, whereas for version 0 we did not attempt this and simply produced an ordinal ranking of potential grantees.)

On February 15, I asked everyone to submit a ranking of candidate grantees; we call this version −1. Any grantee name could be submitted. Naturally, not every team member would rank every grantee. Our voting rule, the Schulze method (a Condorcet extension), treats a voter’s unranked candidates as meaning that the voter (i) strictly prefers all ranked candidates to all unranked ones, and (ii) is indifferent among all unranked candidates.
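To make the treatment of unranked candidates concrete, here is a minimal Python sketch of the Schulze method under the convention just described. It assumes the common "winning votes" measure of pairwise strength (other variants, such as margins, exist), and the function names are illustrative; it is not necessarily the implementation behind our results.

```python
from typing import Dict, List, Tuple

Ballot = List[str]  # candidate names, most preferred first
Pairs = Dict[Tuple[str, str], int]


def pairwise_preferences(ballots: List[Ballot], candidates: List[str]) -> Pairs:
    """d[(a, b)] = number of voters who strictly prefer a to b.

    Ranked candidates beat every unranked one; unranked candidates share
    a common last place, so no voter prefers one unranked candidate to another.
    """
    d = {(a, b): 0 for a in candidates for b in candidates if a != b}
    for ballot in ballots:
        position = {name: i for i, name in enumerate(ballot)}
        last = len(ballot)  # shared position of all unranked candidates
        for a in candidates:
            for b in candidates:
                if a != b and position.get(a, last) < position.get(b, last):
                    d[(a, b)] += 1
    return d


def strongest_paths(d: Pairs, candidates: List[str]) -> Pairs:
    """p[(a, b)] = strength of the strongest path from a to b (winning votes)."""
    p = {}
    for a in candidates:
        for b in candidates:
            if a != b:
                p[(a, b)] = d[(a, b)] if d[(a, b)] > d[(b, a)] else 0
    # Floyd-Warshall-style widest-path update
    for i in candidates:
        for j in candidates:
            if j == i:
                continue
            for k in candidates:
                if k != i and k != j:
                    p[(j, k)] = max(p[(j, k)], min(p[(j, i)], p[(i, k)]))
    return p


def schulze_winners(ballots: List[Ballot]) -> List[str]:
    """All candidates beaten by nobody under the Schulze method (ties possible)."""
    candidates = sorted({name for ballot in ballots for name in ballot})
    d = pairwise_preferences(ballots, candidates)
    p = strongest_paths(d, candidates)
    return [a for a in candidates
            if all(p[(a, b)] >= p[(b, a)] for b in candidates if b != a)]


if __name__ == "__main__":
    # Hypothetical ballots, not the team's real rankings.
    print(schulze_winners([["A", "B"], ["B"], ["A", "C"]]))  # ['A']
    print(schulze_winners([["A"], ["B"], ["C"]]))            # ['A', 'B', 'C']
```

With only a handful of voters who each rank a few grantees, many pairwise contests end level, which is how several candidates can tie for first place. Excluding some voters amounts to dropping their ballots from the list before calling `schulze_winners`.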

Along with the version −1 ranking, we each produced a document defending our (current) top-ranked grantee and saying what would change our minds. We call these “current view” documents.

We then had four days until version 0. The plan was to spend them reading and thinking about each other’s views, each of us attempting to update towards the truth and helping others to do so; in particular, I hoped that the “what would change my mind” sections of our current views would suggest ways to update.

In fact, no team member changed their view between version −1 and version 0 (although some team members changed their view subsequently). This was disappointing, and turned into a major learning point for me.

The bottom line for version 0: the Against Malaria Foundation, the Machine Intelligence Research Institute, and the Good Food Institute were tied for first place. The large number of ties results from the small number of voters and from our voting rule. You can see the full results here, including each team member’s full ranking and a global ranking under various Condorcet voting rules. See also the regularly updated rankings spreadsheet here.

If we exclude the rankings from the three team members who did not submit a current view document, the result is a tie between the Machine Intelligence Research Institute and StrongMinds (this can be computed easily using the procedure described here).

Current views

Each research fellow has outlined their reasoning for choosing their top charity in a separate blog post:

Sindy Li, Against Malaria Foundation

Daniel May, Machine Intelligence Research Institute

Tom Sittler, Machine Intelligence Research Institute

Dominik Peters, Good Food Institute

Konstantin Sietzy, StrongMinds (Konstantin’s top choice was originally StrongMinds. For version 0 he moved MIRI to the top of his ranking, but shortly afterwards changed his mind back to StrongMinds.)

Lovisa Tengberg, StrongMinds

Lessons learned from version 0

Version 0 was an important step, and I’m glad I decided on this self-imposed deadline at the start of the project. Forcing team members to produce an explicit ranking was good. Hiding behind vague pronouncements was no longer an option; instead, team members had to actually develop a view of their own.

Taking a more direct stab at the final decision, however, revealed some challenges.

Problems with the epistemic atmosphere

Perceptions of authority vs introspection

Version 0 produced some evidence of a tendency within the group to defer to perceived authority rather than to consult one’s own beliefs. What especially alerted me to this was that MIRI and GFI ranked so highly despite little previous discussion in full-team meetings.

This is a difficult problem to solve. Perhaps people do not feel that the Oxford Prioritisation Project is a sufficiently safe environment to develop or express their own views. Perhaps they anchored on the first few proposed rankings, which were those of the most confident team members.

Things I plan to try in order to solve this problem include:

  • As director, presenting the most compelling reasons against my current view and in favour of another

  • Giving positive reinforcement when someone challenges one of the more confident members of the group

  • Challenging assertions that appear to simply defer to perceived authority while claiming to be the result of introspection

There is a bias towards discussing fun topics

Sometimes we (myself included) found ourselves discussing topics that are entertaining, or that serve to show off our knowledge, rather than ones that help us prioritise between grantees. At times, even though we were ostensibly discussing version 0, we talked about topics that had no chance of affecting our respective rankings.

On the positive side, we usually rectify this when it’s explicitly pointed out.

We are reluctant to give true reasons for our beliefs

Our “current view” documents include a section called “what would change my mind”. I encourage people to be as specific as possible by saying precisely which (operationalised) new pieces of evidence would cause them to make which changes in their ranking.

Team members have struggled to find “I would change my mind if” statements that actually reflect their beliefs and are truly useful for resolving disagreement. (In other words, it’s difficult to find and communicate the true cruxes of one’s view.)

The following examples of the problem don’t use real names; each character is a mixture of different traits I have witnessed on the team.

  • Alice says she’d change her mind about AwesomeCharity (her current top choice) if she found some randomised evaluations showing that its intervention is not as cost-effective as we thought, and that other interventions in the same area as AwesomeCharity are better. But this is not the crux of Alice’s view. Alice finds it difficult to say which hypothetical pieces of evidence would change her view more radically, to a grantee working in a completely different area.

  • Carol mentions she’d change her mind if she found evidence “X”. Someone asks, “What would it look like, concretely, for you to get up one day and find X?” The subsequent discussion reveals that Carol already has good reasons to expect that X will never materialise.

Arbitrary path-dependency

It feels as though we are not exploring the space of possible grantees systematically. Anyone on the Oxford Prioritisation Project can submit a promising potential grantee. But rather than being efficient, this process may, I suspect, have led to some path-dependency. Random factors early in the project determined which organisations were proposed first, and the rest of the project so far has been, to an extent, dependent on these arbitrary initial conditions. There are likely to be grantees that we are not considering, for no good reason.

Does nobody have beliefs?

My overall sense from the above is that people don’t have beliefs. I don’t mean that people will give you a blank stare if you ask them what their beliefs are; they will generate something to say. I mean that the process that generates the verbal statement is not one of looking at one’s models of the world. It’s something else, perhaps a combination of anchoring, deferring to authority, asking one’s System 1 which answer would most raise one’s status within the group, etc.

What might be going on here? I suspect that most of us don’t actually have models of the world. We don’t have the beliefs about empirical facts, causal chains, and counterfactuals that would constitute a model. Or perhaps we do have some models, but they are so unsophisticated that we are embarrassed to reveal them.

Having models is not binary, of course. One can have models of different levels of sophistication and predictive power, and the extent to which one consciously consults one’s implicit models also varies.

Trying to improve

Introspection exercise

I looked at each team member’s ranking, and asked them to compare their top choice (X) with the highest-ranked grantee in their ranking whose focus area differs from X’s (Y). For example, if Alice’s ranking is:

  • Animal Charity Evaluators

  • Good Food Institute

  • GiveDirectly

I would ask Alice to compare Animal Charity Evaluators (X) to GiveDirectly (Y). The goal of this aspect of the exercise was to force us to prioritise across focus areas.

The exact questions I asked were:
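The rule for choosing the pair can be stated precisely. Here is a minimal Python sketch, assuming each grantee has been tagged with a single focus area; the function name and the focus-area labels are hypothetical, chosen only to reproduce Alice’s example.

```python
from typing import Dict, List, Optional, Tuple


def introspection_pair(ranking: List[str],
                       focus_area: Dict[str, str]) -> Tuple[str, Optional[str]]:
    """Return (X, Y): the top choice X and the highest-ranked grantee Y
    whose focus area differs from X's. Y is None if every ranked grantee
    shares X's focus area."""
    x = ranking[0]
    for grantee in ranking[1:]:
        if focus_area[grantee] != focus_area[x]:
            return x, grantee
    return x, None


# Hypothetical focus-area labels for illustration.
areas = {
    "Animal Charity Evaluators": "animal welfare",
    "Good Food Institute": "animal welfare",
    "GiveDirectly": "global poverty",
}
alice = ["Animal Charity Evaluators", "Good Food Institute", "GiveDirectly"]
print(introspection_pair(alice, areas))  # ('Animal Charity Evaluators', 'GiveDirectly')
```

Good Food Institute is skipped because it shares X’s focus area, which is what pushes the comparison across focus areas.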

  • You ranked X above Y. What would have made you rank Y above X?

  • What made you in fact rank X above Y?

  • You have a credence p that X is better than Y, and a credence 1 − p that Y is better than X. What is p? In other words, how confident are you?

The catch, however, is that everyone had to answer these questions without accessing the internet or their own past writings. I suspect that in the project so far, we have not been developing our own models in part because we simply took the most convincing-looking answer we could find by searching through what other people have written. This aspect of the exercise was a mechanism to force us to introspect, to look at our own beliefs/models.

My hope was that this introspection exercise might jar some of us into realising that we don’t in fact have beliefs to the extent that we thought we did. This could be the first step towards building models.

Building quantitative models

The next step was to build some models. By models, I do not necessarily mean quantitative models. A model is anything that allows you to make predictions about the world. However, the easiest way to force oneself to make one’s models unambiguous is to use numbers. Mathematics doesn’t allow ambiguity.

Therefore, we repeated the “no lookups” exercise for model-building. I asked each team member to choose a metric measuring the impact of their top-ranked organisation, and then to build a very simple model estimating this metric, without using the internet or their previous writings.

At the end of this session, we discussed our models and talked about the experience of working purely from what we have stored in our memory. For next time, we will be iterating on the two documents we produced (the introspection exercise and the simple quantitative model). We’ll publish the results on the blog.
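As an illustration of how numbers remove ambiguity, here is a hypothetical three-factor model of the kind one might write from memory. The metric (dollars per counterfactual outcome), the factors, and the numbers are placeholders for this sketch, not any team member’s actual model or an estimate for any real grantee.

```python
def cost_per_outcome(cost_per_person: float, p_effect: float,
                     counterfactual_share: float) -> float:
    """Dollars per counterfactual outcome (hypothetical metric).

    cost_per_person: dollars spent to reach one person
    p_effect: probability that reaching a person produces the outcome
    counterfactual_share: fraction of outcomes that would not have happened anyway
    """
    return cost_per_person / (p_effect * counterfactual_share)


# Placeholder guesses, chosen only to show the arithmetic.
print(cost_per_outcome(cost_per_person=10.0, p_effect=0.5, counterfactual_share=0.5))   # 40.0

# Halving a single input doubles the result, making the dependence explicit.
print(cost_per_outcome(cost_per_person=10.0, p_effect=0.25, counterfactual_share=0.5))  # 80.0
```

Because every assumption is written down as a number, a disagreement between two people can be traced to a specific input rather than to a vague overall impression.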