Is it better to be a wild rat or a factory-farmed cow? A systematic method for comparing animal welfare.

TLDR: We looked at many different systems for comparing welfare and ended up combining a few common ones into a weighted animal welfare index (or welfare points for short). We think this system captures a broad range of ethical considerations and should be applicable across a wide range of both farm and wild animals in a way that allows us to compare interventions.

The goal of Charity Entrepreneurship is to compare different charitable interventions and actions so that new, strong charities can be founded. One of the necessary steps in such a process is having a way to compare different animals in different conditions. For example, how does moving a chicken from a battery cage to a cage-free system compare, welfare-wise, for the chicken? How does giving up red meat, resulting in one fewer cow being brought into existence, compare to an insect dying more humanely because of a change in which insecticide is used? These are complex questions surrounded by both ethical and epistemic uncertainty.

In the health community, DALYs have become a fairly common and established metric. Sadly, there is not the same level of consensus within the animal rights community. We expected there would be multiple competing systems, so we first outlined what we would look for in a system to assess its helpfulness to us. This could be described as the “goal” or purpose of the metric. Of course, the fundamental goal is to help us evaluate different possible actions, but more specifically, we broke down what we were looking for into the criteria below.

Underlying goals of metrics

  • Proxies’ ethical value accuracy

    • Strength of correlation between the metric and ethical value

    • Encapsulation: captures a broad range of what is important

    • Directness

    • Gameability

  • Cross-applicability

    • Cross-intervention applicability

    • Cross-animal applicability

    • Ethical robustness

    • Externally understandable

    • External precedent of use

  • Operationalizability

    • Amenable to numerical quantification

    • Ease/speed of use

    • Objectiveness

    • Generates few false positives or false negatives

    • Intuitive to work with

    • Easy to collect

    • Easy to explain

After establishing what we were looking for, the next step was to look at all current systems and see whether any of them could be used, at least in part, by an organization like ours. We ended up finding quite a wide range.

EA community

We first looked within the EA community, since there had been some solid attempts at quantification; the ones below are just a few of many examples.

Within the EA community

These metrics were generally hard, quantified, and often even explicitly cost-effectiveness focused. Sadly, they were also extremely specific and not built for generalization across different interventions and charities. Thus, for our purposes, they were more helpful as inspiration for the factors to consider, or standards we would want to be able to measure, than for practical cross-intervention use.

Biology-based markers

The next set of metrics we looked at was biology-based markers. We had some background knowledge about cortisol readings as a measure of stress and hoped to find other objective markers that could make up part of a more inclusive system and add some objectivity to softer systems. Some of the ones we considered (although there are many other possible biological indicators) are listed below.

Biology-based markers

  • Cortisol

  • Dopamine

  • Endocrine changes

  • Circulating catecholamines and corticosteroids

  • Death rate

  • Behavior changes

  • Visible injury rate

  • Reduced life expectancy

  • Impaired growth

  • Impaired reproduction

  • Body damage

  • Disease

  • Immunosuppression

  • Adrenal activity

  • Behavior anomalies

  • Self-narcotization

Biological markers were useful in that they were much less subjective than other metrics, but sadly it was also very hard to find consistent cross-animal data on many of them (death rate being a notable exception). We concluded that they would make up part of a larger system, but that even an index of them would not be inclusive enough to cover all the possible animal welfare situations that could occur.

Academic measures of quality

The third type of system we considered was “academic measures of quality of life”. WAS research had a great summary of many of the different systems used, but we also looked outside of their research for other possible systems.

Academic measures

  • Five Freedoms

  • The Five Domains model

  • Five Provisions model

  • Botreau’s twelve criteria

  • McMillan’s five elements that play a fundamental role in quality of life

  • Fraser’s four core values of animal welfare

  • Webster’s three questions of animal welfare

  • Taylor and Mills’s domains for assessing companion animals’ quality of life

  • Swaisgood’s ten motivational theories that have currency among animal-welfare researchers

Many of these systems were beautifully comprehensive and described metrics and criteria in a way that would be cross-applicable to a wide range of animals across a wide range of conditions. Some even specified different grade levels (although these were generally not numeric) to provide more consistency across reports. It seemed possible that some researchers would have already used these systems, though sadly we did not find much research showcasing their modern practical use. The main drawback of these systems was their subjectivity. Even with specific grade levels, a lot would be left up to the evaluator when making calls between one situation and another: for example, how does not being fed for several days, while being otherwise perfectly fed, compare to semi-chronic but low-level hunger? Overall, we took a large number of elements of our system from the Five Domains model, which felt like the most extensively quantified and broad of these models.

Systems used in global poverty

Next, we considered the current systems used in global poverty alleviation and other cause assessment areas. We thought it might be possible to modify one of these metrics to be usefully applicable to animals.

Modified poverty-based metrics

  • Animal QALYs

  • Animal DALYs

  • Animal income

  • Animal subjective well-being estimates

  • Equivalent lives saved

  • Preference from behind the veil of ignorance

Generally, these metrics were either simply inapplicable (e.g. income) or would have required considerably more time to modify for the animal welfare context (e.g. DALYs have no way to represent a net-negative existence, which is a key consideration in the case of factory-farmed animals).

Creating our own system

Finally, we considered creating a cross-applicable system from scratch.

Our own ideas for possible systems

  • SAD (suffering-adjusted life-day)

  • Sentience-adjusted suffering years

  • Net negative lives averted

  • Total world net expected value

  • Numerical criteria for animals’ quality of life, e.g. a −100 to 100 rating

We did end up using some of the ideas drawn from considering this option but, overall, found that taking elements from other systems would both increase quality and reduce the time we would otherwise have spent creating a new system from scratch.

Results: an inclusive index

We ended up putting many of these systems into a spreadsheet and comparing them on the original metric criteria we had derived. Some criteria were narrowed down; for example, we combined various biological markers into a single “biological markers” category. Some were made more numerical and cross-comparable, for example by translating the Five Domains model into number-based scores instead of grades. Other elements were given their own category and weighting based on how well they met the top-line criteria (for example, death rate). Most criteria were ruled out as redundant or not helpful for our purposes.
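As a rough illustration, this comparison step can be sketched in code. The criteria groupings come from the list at the top of this post, but the 0–10 scale, the criterion weights, and the scores for each candidate system below are made-up placeholders, not values from our actual spreadsheet:

```python
# Hypothetical sketch of the spreadsheet comparison step: each candidate
# welfare system is scored against the top-line metric criteria, and the
# criteria are weighted by how much they matter to us. All numbers here
# are illustrative placeholders.

CRITERIA_WEIGHTS = {
    "ethical_value_accuracy": 3,
    "cross_applicability": 2,
    "operationalizability": 2,
}

candidate_scores = {
    "Five Domains model": {"ethical_value_accuracy": 8,
                           "cross_applicability": 7,
                           "operationalizability": 4},
    "Biological markers": {"ethical_value_accuracy": 5,
                           "cross_applicability": 3,
                           "operationalizability": 8},
}

def rank_systems(scores):
    """Return (system, weighted score) pairs sorted best-first."""
    totals = {
        name: sum(CRITERIA_WEIGHTS[c] * s for c, s in crits.items())
        for name, crits in scores.items()
    }
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

In a ranking like this, a broad, ethically accurate system can come out ahead overall even if a narrower one is easier to operationalize, which mirrors why we borrowed heavily from the Five Domains model while keeping biological markers as only one component.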

We ended up with 8 criteria, each with an importance weighting. Combined, they sum to a range of +100 (an ideal life) to −100 (the worst possible life), with 0 representing uncertainty about whether the life is net positive or negative. Each area can receive a positive or negative welfare score and is rated independently, giving a more robust, cluster-style approach to the overall score. The weighting of each factor differs depending on how well it scored on our original metric criteria. For example, death rate gets a relatively higher weighting (20 welfare points) than our index of other biological markers (4 welfare points) because it is easier to work with and relates more clearly to direct animal suffering (e.g. we are more confident that very high and painful death rates correlate with a life not worth living than that the more abstract biological markers do).

Factors we ended up using:

  • Death rate/reason: 20

  • Human preference from behind the veil of ignorance: 20

  • Disease/injury/functional impairment: 17

  • Thirst/hunger/malnutrition: 15

  • Anxiety/fear/pain/distress: 15

  • Environmental challenge: 5

  • Index of biological markers: 4

  • Behavioral/interactive restriction: 4
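The aggregation itself can be sketched as follows. The weights are the ones listed above (they sum to 100), and we assume here that each factor is rated on a −1.0 (worst) to +1.0 (best) scale so the total lands in the ±100 range described earlier; the factor key names and the example ratings for a caged hen are hypothetical, not taken from our spreadsheet:

```python
# Sketch of the weighted animal welfare index ("welfare points").
# Assumption: each factor is rated from -1.0 (worst) to +1.0 (best),
# then multiplied by its weight. The weights below sum to 100, so the
# combined score falls between -100 (worst possible life) and +100
# (ideal life), with 0 meaning uncertainty about net positive/negative.

WEIGHTS = {
    "death_rate_reason": 20,
    "veil_of_ignorance_preference": 20,
    "disease_injury_impairment": 17,
    "thirst_hunger_malnutrition": 15,
    "anxiety_fear_pain_distress": 15,
    "environmental_challenge": 5,
    "biological_markers_index": 4,
    "behavioral_restriction": 4,
}

def welfare_points(ratings):
    """Combine per-factor ratings (-1.0 to +1.0) into one score."""
    missing = set(WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    for name, r in ratings.items():
        if not -1.0 <= r <= 1.0:
            raise ValueError(f"rating for {name} out of range: {r}")
    return sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)

# Illustrative (made-up) ratings for a caged laying hen:
example = {
    "death_rate_reason": -0.4,
    "veil_of_ignorance_preference": -0.8,
    "disease_injury_impairment": -0.5,
    "thirst_hunger_malnutrition": 0.2,
    "anxiety_fear_pain_distress": -0.6,
    "environmental_challenge": 0.3,
    "biological_markers_index": -0.25,
    "behavioral_restriction": -0.9,
}
```

Because each factor is rated independently before weighting, a single badly off judgment in one area shifts the total only by that factor's weight, which is the cluster-style robustness mentioned above.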

Our full spreadsheet with factors, scores, and metric criteria scores gives a deeper sense of why different areas were given the weightings they were, as well as a narrative explanation of what a negative, middling, and positive score would look like in each category.

Overall, we felt this system gave us a good balance between the more subjective metrics that could capture more data and the harder metrics that were more objective. We believe it can be used across a wide range of both animals and interventions, and lead to cross-comparable results.