Explaining the discrepancies in cost effectiveness ratings: A replication and breakdown of RP’s animal welfare cost effectiveness calculations

I’d like to thank Derek Shiller from Rethink Priorities for extensive discussions and for looking over this post.

Introduction

I’ve been following the “animal welfare” debate this week on the EA Forum, and noticed that a key crux for a lot of people was that calculations showed animal welfare (AW) campaigns (specifically the “caged chicken corporate campaign”) to be much more cost-effective than a global health and development (GHD) project like the Against Malaria Foundation (AMF).

But while most estimates agreed that AW was more effective than GHD, I noticed a wide discrepancy in how much more effective it was. Vasco Grilo claimed it was 1500 times better. This report by Laura Duffy of Rethink Priorities claimed it was about 60 times better (when you convert the orders of magnitude in table 1 into real numbers). Meanwhile, if you go to the Cross cause comparison website, also by RP, and compare the default results for chickens against the “GiveWell bar”, AW comes out only 35 times better.

This wouldn’t be so surprising, except that all three estimates seem to use almost the same assumptions and sources. All adopt hedonic utilitarianism. All use the moral weights from RP’s moral weights sequence. All use the estimates from this report by Saulius for the number of chickens affected per dollar. All use this report by the Welfare Footprint Project for the amount of time chickens spend in pain under different housing conditions.

You might think this doesn’t matter, but the difference between 1500 times and 35 times is actually quite important. At 1500 times, you can disagree with a few assumptions a little and still be comfortable in the superiority of AW. At 35 times, this is no longer the case, and as we shall see, these models share some quite controversial assumptions.

Over the last few days, I have attempted to replicate these estimates, identify the key assumptions and cruxes of each calculation, and pin down the sources of disagreement. This was developed in a dialogue with RP members, in particular Derek Shiller; you can view us hashing it out in excruciating detail in the comments of this post.

In this article, I will outline what I found in my replication. I will primarily focus on replicating the work of RP. I think I’ve identified the source of disagreement with Vasco’s work as well, but I haven’t looked into it as closely. Note that I’m not claiming this model is the correct way to do things. This article shows how other people came to their conclusions, so that you, the reader, can decide whether you agree or disagree.

The RP model has a few components, which I will break down in turn in the next few sections. I will use point estimates for clarity, but please note that the actual RP reports use Monte Carlo simulations to deal with uncertainty, which I think is good practice.

I have shared my replication of the model here.

Underlying model:

First we need to be clear on what is being compared. Generally, the comparison is made in disability-adjusted life years (DALYs).

RP describes DALYs as:

the number of “disability-adjusted life years (DALYs) averted.” A DALY is a health measure with two parts: years of human life lost and years of human life lost to disability. The former measures the extent to which a condition shortens a human’s life; the latter measures the health impact of living with a condition in terms of years of life lost. Together, these values represent the overall burden of the condition. So, averting a DALY is averting a loss—namely, the loss of a single year of human life that’s lived at full health.

The use of DALYs is pretty common in the world of public health. I assume plenty of people have discussed its assumptions, but it is well established for public health decisions.

Note that, as far as I can tell, DALYs are based purely on two aspects of morality:

  1. How long a human lives

  2. How much suffering a human experiences.

Other aspects of morality are not included in the results. For example, if you were trying to evaluate saving the Mona Lisa from a fire, the DALYs involved would only show up in the sadness people feel at missing out on seeing it and the economic cost to Louvre employees. Many calculations also don’t factor in knock-on effects: as far as I’m aware, GHD estimates don’t try to account for the pain of a mother losing a child, on the assumption that it is small compared to the lost years of the child’s life.

Building on this, these calculations then adopt the assumptions of the RP moral weights project:

  • Utilitarianism, according to which you ought to maximize (expected) utility.

  • Hedonism, according to which welfare is determined wholly by positively and negatively valenced experiences (roughly, experiences that feel good and bad to the subject).

  • Valence symmetry, according to which positively and negatively valenced experiences of equal intensities have symmetrical impacts on welfare.

  • Unitarianism, according to which equal amounts of welfare count equally, regardless of whose welfare it is.

They discuss these assumptions at length in the sequence and explore what happens if you relax some of them. Unitarianism in particular differs a lot from everyday morality: it says, for example, that if a human and a chicken experience the same amount of pain, the two are equally important when making moral decisions. I recommend you read at least some of the articles in that sequence for their arguments; I will explore the effect this has on the calculations later on.

Another important point is that these are estimates for specifically corporate chicken campaigns, which seem to be the best interventions we have semi-decent numbers for. It’s possible that investing in AW research will uncover better interventions (like shrimp welfare or something).

The Saulius report: chicken-years per dollar

The first thing people need to answer is: how many chickens does donating dollars to a corporate campaign actually affect? That is, if you donate a thousand dollars to a cage free campaign, how many chickens do you expect to move, and for how long?

RP and pretty much everyone else use this 2019 report from Saulius to estimate the effectiveness of corporate chicken campaigns. It’s good work, and incredibly detailed, so I’d recommend looking over it yourself.

To summarise, Saulius thoroughly investigated the corporate campaigns for hen and broiler welfare run between 2005 and 2018, and how successful they were.

First, they estimated how much money all the animal advocacy organisations that participated in such campaigns spent on them from 2005 to 2018. For cage-free hens, this was 57 million dollars.[1]

Next, they estimated the flock affected per year by companies that had made cage-free or better broiler commitments in that time period as a result of the campaigns.[2] For cage free, that was a flock of 310 million birds per year.

They then multiplied this by the estimated probability of companies actually following through on their commitments (64%) and by how many years they expect the commitments to last (15 years), and divided by the cost of the campaigns, to conclude that every dollar would, in expectation, move 42 hens from cages to cage-free environments over the intervention period.

Each hen, once moved, is estimated to live for 1.3 years in the improved conditions (taken from this report here). This means the total effect is 42 × 1.3 ≈ 54 chicken-years per dollar.[3]
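As a rough sketch of the arithmetic above, the snippet below multiplies the median point estimates quoted from Saulius’s report; the constant and function names are my own. Note that the naive product (about 52 hens per dollar) is higher than the reported 42, presumably because the report also subtracts hens that would have been freed anyway (footnote 2) and because the median of a simulated product need not equal the product of the medians.

```python
# Median point estimates quoted from Saulius's 2019 report (cage-free hens).
SPEND_USD = 57e6          # spent on cage-free campaigns, 2005-2018
FLOCK_PER_YEAR = 310e6    # hens per year covered by commitments
P_FOLLOW_THROUGH = 0.64   # probability companies honour their commitments
YEARS_OF_IMPACT = 15      # expected years a commitment lasts
YEARS_PER_HEN = 1.3       # years each hen spends in improved conditions
REPORTED_HENS_PER_DOLLAR = 42  # reported figure, after counterfactual adjustment


def naive_hens_per_dollar() -> float:
    """Hens moved per dollar from the point estimates alone (no counterfactual adjustment)."""
    return FLOCK_PER_YEAR * P_FOLLOW_THROUGH * YEARS_OF_IMPACT / SPEND_USD


def chicken_years_per_dollar(hens_per_dollar: float) -> float:
    """Convert hens per dollar into chicken-years per dollar."""
    return hens_per_dollar * YEARS_PER_HEN


print(f"naive: {naive_hens_per_dollar():.1f} hens per dollar")
print(f"reported: {chicken_years_per_dollar(REPORTED_HENS_PER_DOLLAR):.1f} chicken-years per dollar")
```

Running it prints roughly 52.2 hens per dollar for the naive product and 54.6 chicken-years per dollar from the reported 42.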

Another thing I want to emphasise: this is an estimate of the past performance of the entire animal rights movement. It is not an estimate of the future cost-effectiveness of campaigns funded by EA in particular, and it does not account for the tractability, neglectedness, etc. of future donations.

Saulius gives one indication that cost-effectiveness could increase (some of the infrastructure costs were one-offs that don’t need to be repeated). The rest of their indicators are negative, suggesting that future campaigns will be less cost-effective. Of particular note, Saulius points out that nearly 70% of the US hen flock is already covered by cage-free commitments. If these were the “low-hanging fruit” companies, future campaigns may face diminishing returns, with other countries proving more difficult and costs increasing over time.

In the RP report, they accounted for this probable drop by discounting effectiveness by somewhere between 20% and 60%. This range is not backed up by any source: it appears to be a guesstimate based on the pros and cons listed by Saulius. Hence there is significant room for disagreement here.

If we take Saulius’s estimate of 42 chickens affected per dollar and discount it by the midpoint of 40%, we get a median of 42 × 0.6 = 25.2 chickens affected per dollar.
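The discount step can be sketched the same way. This assumes, as the text does, that 40% (the midpoint of RP’s 20% to 60% range) stands in for the median; the two endpoints show how much this guesstimate alone moves the result. The `discounted` helper is hypothetical.

```python
HENS_PER_DOLLAR = 42.0  # Saulius's reported median for cage-free hens


def discounted(value: float, discount: float) -> float:
    """Apply a fractional discount, e.g. 0.4 for a 40% drop in effectiveness."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return value * (1.0 - discount)


# Low end, assumed median (midpoint), and high end of RP's 20%-60% range.
for d in (0.2, 0.4, 0.6):
    print(f"{d:.0%} discount -> {discounted(HENS_PER_DOLLAR, d):.1f} hens per dollar")
```

The three lines come out at 33.6, 25.2 and 16.8 hens per dollar, a factor of two between the ends of the range.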

Chicken-DALYs per chicken affected

Once you’ve established the number of chicken-years that have been affected by the intervention, you then have to include an estimate for how “good” it is to free each chicken. This involves first estimating how much pain is experienced by hens as a result of being confined to a cage, and then converting those pains to numerical values.

The estimates of time spent in pain are taken from the Welfare Footprint Project. This project tracked the experiences of chickens in conventional cages and in cage-free aviaries, and counted the average number of hours each hen spent in different types of pain over the course of its laying period (which, conveniently, it estimates as about 1 year). For example, it says that caged hens experience 323 hours of “nest deprivation”, classified as “disabling pain”, whereas cage-free hens experience only 16 hours, so if the time periods were the same, the move averts 323 - 16 = 307 hours of nest deprivation over the hen’s lifetime.

How do you convert this to DALYs? The RP report takes its conversion from this report of a Global Burden of Disease study, summarised by RP as:[4]

  • 1 year of annoying pain = 0.01 to 0.02 DALYs

  • 1 year of hurtful pain = 0.1 to 0.25 DALYs

  • 1 year of disabling pain = 2 to 10 DALYs

  • 1 year of excruciating pain = 60 to 150 DALYs

So our 307 hours (307/(365 × 24) years) of nest deprivation averted is worth 307/(365 × 24) × (2 to 10) = 0.07 to 0.35 chicken-DALYs per hen over its laying period. Adding this up across every type and intensity of pain gives an overall conversion factor.
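A minimal sketch of the pain-to-DALY conversion for a single pain category, using the ranges RP quotes above and the nest-deprivation hours from the Welfare Footprint Project. The dictionary and function names are hypothetical; the full conversion factor sums this over every pain category and intensity, which the sketch does not attempt.

```python
HOURS_PER_YEAR = 365 * 24

# DALYs per year spent in each pain intensity, as (low, high), from RP's summary.
DALYS_PER_PAIN_YEAR = {
    "annoying": (0.01, 0.02),
    "hurtful": (0.1, 0.25),
    "disabling": (2.0, 10.0),
    "excruciating": (60.0, 150.0),
}


def chicken_dalys(hours: float, intensity: str) -> tuple[float, float]:
    """Convert hours spent in pain of a given intensity into a (low, high) DALY range."""
    low, high = DALYS_PER_PAIN_YEAR[intensity]
    years = hours / HOURS_PER_YEAR
    return years * low, years * high


# Nest deprivation ("disabling"): 323 hours caged vs 16 hours cage-free.
averted_hours = 323 - 16
low, high = chicken_dalys(averted_hours, "disabling")
print(f"{low:.2f} to {high:.2f} chicken-DALYs per hen")
```

This prints "0.07 to 0.35 chicken-DALYs per hen", matching the hand calculation.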

If I take roughly the midpoint of each DALY conversion range and account for all the pain categories, I get a total conversion factor of about 0.24 chicken-DALYs per hen. The most important type of pain was disabling pain, which accounted for 80% of that total, so where you land in the “2 to 10” range for disabling pain matters hugely for the final result.

This seems to be a key difference from Vasco Grilo’s analysis. I think Vasco’s analysis ends up using DALY conversions roughly ten times higher than RP’s (and roughly 100 times higher for excruciating pain).

This also seems to be where RP might have made an error. In their calculation, which can be found here, they put the laying period at 1.6 years for caged hens and 1.2 years for cage-free hens. This disagrees both with the ACE report figure of 1.3 years for both, and with the website reporting the pain numbers, which estimates about 1 year for both.

This disagreement makes no difference to my analysis, because I just looked at DALYs over the hen’s whole life. But RP normalised everything to a per-year basis, which makes caged hens look better, because their reported pain is supposedly spread over a longer period. They then multiplied by Saulius’s “chicken-years” estimate,[5] which already included the length of the laying period (the 1.3 figure) and assumed it was the same for both systems. This had the effect of reducing the estimated impact of the intervention by around 40%.
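To illustrate the size of the laying-period issue, the sketch below applies RP’s per-year normalisation to the nest-deprivation hours alone. This is an illustration on one pain category under my own simplifying assumptions, not a reconstruction of RP’s spreadsheet, but it lands close to the roughly 40% reduction described above.

```python
# Nest-deprivation hours per hen over its laying period (Welfare Footprint Project).
CAGED_HOURS = 323
CAGE_FREE_HOURS = 16


def per_year_difference(caged_years: float, cage_free_years: float) -> float:
    """Pain hours averted per hen-year after dividing each total by its laying period."""
    return CAGED_HOURS / caged_years - CAGE_FREE_HOURS / cage_free_years


consistent = per_year_difference(1.0, 1.0)  # about 1 year for both systems
mismatched = per_year_difference(1.6, 1.2)  # periods used in RP's calculation
reduction = 1 - mismatched / consistent

print(f"{consistent:.0f} vs {mismatched:.1f} pain hours averted per hen-year "
      f"({reduction:.0%} lower)")
```

This prints "307 vs 188.5 pain hours averted per hen-year (39% lower)". Because the cage-free hours are small, the result is dominated by dividing the caged total by 1.6, which is why the reduction sits near 40%.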

It’s worth pointing out that the RP method shouldn’t actually care about the average wellbeing of a chicken, only the total wellbeing over its lifetime under each housing system. I didn’t look too closely at Vasco’s model, but I think it does matter for the way he does it.

If we combine Saulius’s estimate with this conversion factor, we get 25.2 chickens per dollar times 0.24 (rounded) chicken-DALYs per chicken ≈ 6.014 chicken-DALYs per dollar, or 6014 chicken-DALYs per thousand dollars. Note that these are chicken-DALYs, not human-DALYs; we cover the conversion in the next section.
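Tracking units explicitly helps at this step. The sketch below uses the rounded 0.24 factor from the text, so it gives about 6048 rather than 6014 chicken-DALYs per thousand dollars; I am assuming the gap comes from rounding the conversion factor, and the second line backs out the unrounded factor that the text’s 6.014 implies.

```python
HENS_PER_DOLLAR = 25.2   # Saulius's 42 hens per dollar, discounted by 40%
DALYS_PER_HEN = 0.24     # lifetime chicken-DALYs averted per hen (rounded)
DOLLARS = 1000

chicken_dalys = HENS_PER_DOLLAR * DALYS_PER_HEN * DOLLARS
implied_factor = 6.014 / HENS_PER_DOLLAR  # unrounded factor implied by the text's 6.014

print(f"{chicken_dalys:.0f} chicken-DALYs per ${DOLLARS}")
print(f"implied unrounded factor: {implied_factor:.4f}")
```

The implied factor comes out at about 0.2387, which rounds to the 0.24 quoted above.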

Capacity for welfare

So far in these calculations, “DALYs” has meant chicken-DALYs: how much we are reducing bad experiences in chickens. However, a cross-cause comparison requires comparing this number with human-DALYs.

This means we need to answer the question of how much human suffering we should trade off against animal suffering.

For example, say you are forced to choose between inducing a headache in one human for an hour, versus inducing a headache in X chickens for an hour. How high would X have to be before you chose to hurt the human instead of the chickens?

How about if instead of a regular headache for one hour, it was a painful migraine lasting for an entire year? How many chickens would you hurt to spare a human from this experience?[6]

At one extreme, you can imagine a thoroughgoing speciesist who doesn’t care about the suffering of non-humans at all. They would set X to “unlimited”: they would happily torture a billion chickens to spare one human a headache. Given that factory farming is broadly opposed by the public, this does not seem to be a common view.

At the other extreme, you can imagine a total equalist.[7] They would say that a chicken and a human are morally equivalent: X = 1. They would rather save two chickens from suffering than one human. Given that 86% of the world eats meat, this would also be a rather unpopular opinion.

A common view has been that you should weight the importance of animals by neuron count, either because you think more intelligent creatures matter more, or because you think creatures with fewer neurons experience pain and pleasure “less intensely”. In the post “Why neuron counts shouldn’t be used as proxies for moral weight”, RP critiques this view, arguing that there aren’t many good arguments for it. Nevertheless, it does accord more with intuitions about which animals are more important. On this view, since chickens have approximately 250 million neurons and humans about 86 billion, 1 human is as important as about 344 chickens.
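The neuron-count arithmetic is a single ratio. This sketch only reproduces the naive calculation with the neuron counts quoted above; it says nothing about whether neuron count is a justified proxy.

```python
HUMAN_NEURONS = 86e9
CHICKEN_NEURONS = 250e6

chickens_per_human = HUMAN_NEURONS / CHICKEN_NEURONS  # the X in the thought experiment
neuron_weight = CHICKEN_NEURONS / HUMAN_NEURONS       # chicken welfare as a fraction of human

print(f"X = {chickens_per_human:.0f}, weight = {neuron_weight:.5f}")
```

This prints "X = 344, weight = 0.00291"; the weight is simply the reciprocal of X.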

So what is the number for Rethink priorities? They have outlined their methodology in extensive detail in the “moral weights sequence”. I will try and summarise here, but you should trust what they write over my summary.

Essentially they are trying to gauge the intensity of the range of pleasure/​pain experiences in each animal, compared to a human.

In the sequence they summarise their method:

  1. Make some plausible assumptions about the evolutionary function of valenced experiences

  2. Given those functions, identify a lot of empirical traits that could serve as proxies for variation with respect to those functions

  3. Survey the literature for evidence about those traits

  4. Aggregate the results

In essence, they look for animal traits and behaviours that could indicate a level of sensory experience, and then use those to estimate the intensity of experience. These are summarised in this spreadsheet. An example of a quantitative measure is the change in heart rate in response to painful heat: a human’s heart rate changes about twice as much as a chicken’s for a similar stimulus. Qualitatively, they can look at different behaviours: a chicken will react negatively when its chicks are distressed, whereas a shrimp will not, which gives some indication that a chicken is more morally salient than a shrimp. Overall, there is evidence that chickens feel pain and pleasure, and react to them in ways that are, to a first approximation, similar to how a human would.

They also factor in the probability of chickens being sentient at all, which they put at around 80%. Read their report on this matter for their justifications.

So, with all that analysis, what value do they put on X?

It’s about 3, only slightly higher than the total equalist. The moral weight of chickens is set at 0.332 of a human’s. I take them to be saying that chickens experience pain at about a third of the intensity of humans (with one to two orders of magnitude of uncertainty).

Note that there is uncertainty here. The 95% range for octopuses actually extends above 1, to 1.47, implying they think there’s a decent chance that octopuses feel more hedonic pleasure/pain than humans. It’s also very important to state that you have to take lifespan into account if you are talking about saving lives. They are not saying that saving a child’s life is as important as saving three chickens’ lives. Since humans live about ten times as long, they are saying that saving a child’s life is as important as saving roughly 30 chickens’ lives.

There are plenty of arguments about these figures, so I won’t continue the argument here. In my doc, I examine the effect of switching from RP’s weights to neuron-count weights, which would reduce the estimated cost-effectiveness by about two orders of magnitude.

If we take our value of 6014 chicken-DALYs per thousand dollars and weight it by 0.332, we get a final value of 6014 × 0.332 ≈ 1996 human-equivalent DALYs per thousand dollars.
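The final conversion is one multiplication per weighting scheme. The sketch below applies both RP’s 0.332 weight and the naive neuron-count weight (250 million / 86 billion) to the 6014 chicken-DALYs figure; X is just the reciprocal of the weight. The helper name is hypothetical.

```python
CHICKEN_DALYS_PER_THOUSAND = 6014  # cage-free estimate from the previous section

WEIGHTS = {
    "RP moral weight": 0.332,
    "neuron count": 250e6 / 86e9,  # chicken neurons / human neurons
}


def human_equivalent(chicken_dalys: float, weight: float) -> float:
    """Convert chicken-DALYs into human-equivalent DALYs using a moral weight."""
    return chicken_dalys * weight


for name, weight in WEIGHTS.items():
    dalys = human_equivalent(CHICKEN_DALYS_PER_THOUSAND, weight)
    print(f"{name}: X = {1 / weight:.0f}, {dalys:.1f} human-equivalent DALYs per $1000")
```

This gives about 1996.6 human-equivalent DALYs per thousand dollars with RP’s weight (X ≈ 3) and about 17.5 with neuron weights (X ≈ 344); the text rounds these to 1996 and 17.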

In my replication document, I did a similar estimate for broilers, getting an answer of 246 human-equivalent DALYs per thousand dollars.

Human welfare comparison:

The GHD estimate was fairly similar between sources: they claim that a top-tier global health charity like AMF yields about 20 DALYs per thousand dollars. I think this estimate originates from this report. Note that this is for saving lives: the conversion rate used seems to be saying that saving a child’s life in a poor African country is equivalent to averting about 50 DALYs.

My final answers for cost effectiveness:

So, factoring all this in, correcting what I believe to be errors, and also accounting for the controversial moral weights, my estimates, if I follow the RP methodology and assumptions (which should not be taken as an endorsement of these methods), are as follows.

For the intervention of cage-free campaigns, using RP’s moral weights, the intervention saves 1996 DALYs per thousand dollars, about 100 times as effective as AMF.

For the intervention of cage-free campaigns, using naive neuron-count weights, the intervention saves 17 DALYs per thousand dollars, about as effective as AMF.

For the intervention of better broiler campaigns, using RP’s moral weights, the intervention saves 246 DALYs per thousand dollars, about 12 times as effective as AMF.

For the intervention of better broiler campaigns, using naive neuron-count weights, the intervention saves 2.2 DALYs per thousand dollars, about a tenth as effective as AMF.
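Each of these comparisons is a ratio against AMF’s roughly 20 DALYs per thousand dollars. The sketch below simply takes the four estimates from the text as inputs; it does not recompute them, and the names are my own.

```python
AMF_DALYS_PER_THOUSAND = 20  # GHD benchmark quoted above

# Human-equivalent DALYs per $1000 for each scenario, as given in the text.
ESTIMATES = {
    "cage-free, RP weights": 1996,
    "cage-free, neuron weights": 17,
    "broiler, RP weights": 246,
    "broiler, neuron weights": 2.2,
}


def multiple_of_amf(dalys_per_thousand: float) -> float:
    """How many times as cost-effective as AMF an intervention is."""
    return dalys_per_thousand / AMF_DALYS_PER_THOUSAND


for name, dalys in ESTIMATES.items():
    print(f"{name}: {multiple_of_amf(dalys):.2f}x AMF")
```

The multiples come out at 99.80, 0.85, 12.30 and 0.11 respectively.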

Explaining the various figures I introduced the post with:

The 60 times figure from this report seems to be a result of the laying-period error I discussed earlier. I believe I replicated their answer in the “Laura replication” tab of my spreadsheet.

The 35 times figure on the cross-cause comparison website is inexplicable to me: I think the numbers entered as default values are just wrong. I think it’s a cool tool, though.

Vasco’s figure of 1500 times appears to result from using DALY conversion factors roughly ten times those RP uses, along with not discounting for less effective future campaigns. I’m not saying he’s wrong, just that these are the key differences in assumptions. I haven’t looked too closely, but I was able to get similar figures by increasing the DALY weights in the broiler model.

You may be able to spot further problems with this analysis; hopefully I have made it easier to do so. Remember: don’t take numbers at face value. All numbers rely on assumptions and calculations made by fallible humans who sometimes make mistakes. Many numbers floating around EA are not checked thoroughly, and it is often extremely easy to make mistakes.

I have gained a lot of respect for the enormous amount of work, effort and transparency RP put into their research. However, I believe they still made mistakes that substantially affected their results. You should treat numbers from any organisation that is less thorough with even more skepticism.

  1. ^

    Median values are reported for ease of following.

  2. ^

    A certain number of hens would have been freed without the campaign; an estimate of this is included in the calculation.

  3. ^

    You have to be careful: this is not chickens per year per dollar, but chickens times years per dollar. Doubling either the number of chickens or the length of the intervention would double this final value. This is the source of a lot of confusion, but fortunately, we can cancel this out.

  4. ^

    Note that these convert human pain-years to human DALYs: RP assumes the same conversions apply from chicken pain-years to chicken-DALYs.

  5. ^

    If you look at the “hens affected per dollar CF campaign” field, you get a value of 36.2. This is just Saulius’s figure of 54 discounted by 40%.

  6. ^

    Note that there are also knock-on effects from the human being hurt. Cost-effectiveness evaluations include some, but not all, knock-on effects.

  7. ^

    I guess you could also have someone who thinks chickens are more important than humans, but let’s not go that far.