“Dimensions of Pain” workshop: Summary and updated conclusions

Executive Summary

  1. Background: The workshop’s goal was to leverage expertise in pain to identify strategies for testing whether severity or duration looms larger in the overall badness of negatively valenced experiences. The discussion was focused on how to compare welfare threats to farmed animals.

  2. No gold standard behavioral measures: Although attendees did not express confidence in any single paradigm, several felt that triangulating results across multiple paradigms would increase clarity about whether nonhuman animals are more averse to severe pains or long-lasting pains.

    1. Consistent results across different methodologies only strengthen a conclusion if the methodologies have uncorrelated or opposing biases. Fortunately, while classical conditioning approaches are probably biased towards severity mattering more, operant conditioning approaches are probably biased towards duration mattering more. Unfortunately, the biases might be too large to produce convergent results.

  3. Behavioral experiments may lack external validity: Attendees believed that a realistic experiment would not involve pains of the magnitude that characterize the worst problems farmed animals endure. Thus, instead of prioritizing external validity, we recommend whatever study designs create the largest differences in severity.

    1. Studies of laboratory animals and (especially) humans seem more likely to generate large differences in severity than studies of farmed animals.

  4. No gold standard biomarkers: Biomarkers could sidestep the biases that behavioral and self-report data inevitably introduce. However, attendees argued that there are no currently known biomarkers that could serve as an aggregate measure of pain experience over the course of a lifetime.

  5. Priors should favor prioritizing duration: Attendees had competing ideas about how to prioritize between severity and duration in the absence of compelling empirical evidence. In cases where long-lasting harms are at least thousands of times longer than more severe harms and are of at least moderate severity, we favor a presumption that long-lasting pains cause more disutility overall.

    1. Nevertheless, due to empirical and moral uncertainty, we would recommend putting some credence (~20%) in the most severe harms causing farmed animals at least as much disutility as the longest-lasting harms they experience.

Background

The Dimensions of Pain workshop was held April 27-28, 2023 at the University of British Columbia. Attendees included animal welfare scientists (viz., Dan Weary, Thomas Ede, Leonie Jacobs, Ben Lecorps, Cynthia Schuck, Wladimir Alonso, and Michelle Lavery), pain scientists (Jeff Mogil, Gregory Corder, Fiona Moultrie, Brent Vogt), and philosophers (Bob Fischer, Murat Aydede, Walter Veit). William McAuliffe and Adam Shriver, the authors of this report, guided the discussion.

Funders who want to cost-effectively improve animal welfare have to decide whether attenuating brief, severe pains (e.g., live-shackle slaughter) or chronic, milder pains (e.g., lameness) reduces more suffering overall. Farmers also face similar tradeoffs when deciding between multiple methods for achieving the same goal (e.g., single-stage versus multi-stage stunning). Our original report exploring the considerations that would favor prioritizing one dimension over another, The Relative Importance of the Severity and Duration of Pain, identified barriers to designing experiments that would provide clear-cut empirical evidence. The goal of the workshop was to ascertain whether an interdisciplinary group of experts could overcome these issues.

No gold standard behavioral measures

We spent one portion of the workshop reviewing some of the confounds that we believe would plague behavioral experiments testing whether individuals find severe or long-lasting pains more aversive overall. Afterwards, we had attendees spend 30 minutes brainstorming paradigms on their own that could be implemented in the population they study. Although attendees shared highly creative ideas, none in our judgment represented a silver bullet. Attendees were also generally skeptical of relying on any one paradigm. They were more optimistic about a research program that might provide consistent results across several different paradigms.

The paradigms we discussed could be broadly categorized as relying on classical conditioning or operant conditioning. An exemplar of classical conditioning is the conditioned place aversion paradigm, where the experimenter has subjects repeatedly undergo a presumably aversive experience in one location but not in another (e.g., Dixon et al., 2013). Once subjects have learned the association between the location and the aversive experience, the experimenter observes the degree to which subjects avoid the location where the aversive experience was administered when given a free choice. To study trade-offs, the paradigm can be adapted such that the location with the aversive stimulus also contains a reward that the alternative location lacks. In the present case, one location could be associated with a more severe but also briefer pain than the alternative location. Indifference between the two locations (e.g., time spent in each location not significantly different from 50%) would reveal the point at which a milder pain becomes just as bad as a more severe pain due to its greater duration.

Operant conditioning paradigms instead have subjects learn an association between a voluntary behavior and a desired outcome, allowing the experimenter to observe how much effort subjects are willing to expend to obtain a given amount of the reward (Jensen & Pedersen, 2008). In the present context, the experimenter could vary two pains by severity and compute a severity ratio by observing how hard subjects are willing to work for relief from the two pains. This severity ratio could be multiplied by a duration ratio (computed from how long each of the pains is expected to last in farmed settings) to compute the relative overall aversiveness of the two pains.
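The calculation described above can be sketched as follows. This is a minimal illustration, assuming that willingness to work scales linearly with severity and that overall aversiveness is the simple product of severity and duration. The function name and all numbers are hypothetical and are not data from any study.

```python
def relative_aversiveness(work_severe, work_mild, duration_severe, duration_mild):
    """Return the overall aversiveness of the severe pain relative to the mild pain.

    Assumes (hypothetically) that effort expended for relief scales linearly
    with severity and that overall badness = severity x duration.
    """
    severity_ratio = work_severe / work_mild
    duration_ratio = duration_severe / duration_mild
    return severity_ratio * duration_ratio


# Hypothetical inputs: subjects press a lever 40 times for relief from the
# severe pain and 8 times for relief from the mild pain; in farmed settings
# the severe pain lasts 2 minutes and the mild pain lasts 20,000 minutes.
ratio = relative_aversiveness(40, 8, 2, 20_000)
print(ratio)  # a value below 1 means the mild, long-lasting pain is worse overall
```

With these invented inputs the severe pain is five times as severe but lasts one ten-thousandth as long, so the product falls well below 1.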

Attendees appeared to agree with us that classical conditioning paradigms would likely show that individuals care more about avoiding severe harms than long-lasting harms. This is because severe pain is normally indicative of an impending catastrophic outcome, and so its phenomenology probably evolved such that individuals have limited willpower to do anything other than avoid its continuance. (Of course, if individuals choose to avoid the long-lasting pain even in classical conditioning studies, then we would have particularly strong evidence that long-lasting pains are worse overall.) Although ignoring chronic pain also reduces fitness, temporarily prioritizing competing motives does not necessarily result in death, and may in fact be necessary to address a more immediate threat. For example, workshop attendees mentioned experiments showing that laboratory animals can voluntarily withstand some pain to meet a basic need (e.g., eating, drinking). This makes sense insofar as a failure to meet basic needs threatens survival and reproduction just as the causes of severe pain can. Also, some rewards suppress pain (Foo & Mason, 2005; Hargraves & Hentall, 2005), consistent with the possibility that animals have reduced willpower to ignore severe pain, even if doing so was necessary to satisfy basic needs.

Operant conditioning paradigms face the same willpower issue, at least if they require individuals to work in order to escape pain they are currently experiencing, as they do in escape learning paradigms. Escape learning also risks a different type of capacity issue: physical incapability instead of volitional incapability. At the extreme, the injury causing the more severe pain may make it physically impossible to complete the task that yields pain relief. Even putting gross mutilation aside, if completing the task required for escaping pain requires the individual to, say, use an appendage that was injured to inflict pain, then the cost of escaping pain would effectively be higher in the severe pain condition than in the mild pain condition.

A better operant conditioning paradigm would involve avoidance learning: working to reduce the duration or severity of an impending pain. Avoidance learning may sidestep the willpower issue to some degree because it is less clear that individuals are compelled to work harder to avoid a more severe outcome in the same way that they might have limited volition to proactively choose a more severe harm. Still, in order to adequately test whether brief severe pains are worse than milder pains that in real life last thousands, if not hundreds of thousands, of times longer, animals need to be able to work much, much harder for their preferred outcome. Workshop attendees had trouble thinking of ways within a single test session to provide the ability to work ~100,000 times harder to avoid one outcome over another. As a result, a study that compared two severity levels while keeping duration constant is at risk of vastly underestimating how much worse the more severe pain is. The solution might be making the pain of lesser severity much longer than the more severe pain, so that only small differences in willingness-to-work would be sufficient to show which pain is worse overall.

Gold standard tests are not necessary when measurement methods that have opposing biases yield converging results. Classical and operant conditioning paradigms do appear to have opposing biases (methods that rely on classical conditioning would probably show severe pain is worse than long-lasting pain, while methods that rely on operant conditioning would likely show that longer-lasting pain is worse overall), but the biases seem large enough to us that their results probably would not converge.
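One way to picture this reasoning is as a bracketing exercise. The sketch below assumes, hypothetically, that a classical conditioning study yields an upper bound on the severity ratio (because it is biased towards severity) and that an operant study yields a lower bound (because it is biased towards duration). The function name, labels, and numbers are illustrative only.

```python
def triangulate(upper_severity_ratio, lower_severity_ratio, duration_ratio):
    """Classify which pain is worse given two oppositely biased estimates.

    upper_severity_ratio: estimate from the method biased towards severity.
    lower_severity_ratio: estimate from the method biased towards duration.
    duration_ratio: how many times longer the milder pain lasts.
    Assumes overall badness = severity x duration, so the severe pain is
    worse overall only if its severity ratio exceeds the duration ratio.
    """
    if lower_severity_ratio > duration_ratio:
        return "severe pain worse"
    if upper_severity_ratio < duration_ratio:
        return "long-lasting pain worse"
    return "inconclusive"


print(triangulate(50, 5, 1_000))     # long-lasting pain worse
print(triangulate(5_000, 5, 1_000))  # inconclusive
```

The second call corresponds to the worry raised above: when the two estimates fall on opposite sides of the break-even point, the opposing biases are too large for the results to settle the question.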

Behavioral experiments may lack external validity

Attendees believed it would be impractical to study the durations and the levels of pain severity that afflict farmed animals. For instance, one of the more promising experiment suggestions we heard involved comparing a 5% formalin injection (causing only 60-90 minutes of pain) to complete Freund’s adjuvant, which would induce hypersensitivity for only three weeks or so (Mogil, 2022).

More externally valid ideas quickly ran into feasibility issues. For instance, we talked about testing how much worse sepsis is for broiler breeders versus chronic underfeeding. Not only would it be unlikely to obtain approval from university ethical review boards to purposely give breeders peritonitis, but doing so would make it physically impossible for breeders to complete behavioral paradigms. Even underfed chickens would be difficult to study because their strong urge to forage interferes with learning (Buckley et al., 2011).

The practical difficulties of studying severe pain in a controlled experiment are problematic if there are decision-relevant non-linearities between different severity levels. For example, the pain-track model defines four severity categories: Annoying, Hurtful, Disabling, and Excruciating (Alonso & Schuck-Paim, 2021). For now, these categories have only an ordinal ranking, but suppose that operant conditioning paradigms revealed that animals were more strongly motivated to avoid Disabling pains than Hurtful pains of equal duration, but by less than an order of magnitude. If we regard these results as representative of the differences between all adjacent severity categories, then brief pains would not outweigh pains that last thousands of times longer, even if the former were Excruciating and the latter were merely Annoying. Of course, one could reasonably hypothesize that the difference between Excruciating and Disabling pain is far larger than the difference between lesser categories, but the practical difficulty of studying Excruciating pain renders the hypothesis speculative.
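The arithmetic behind this point can be made explicit. The sketch below assumes, purely for illustration, that each step up the pain-track scale multiplies severity by the same factor. The factor of 9 and the duration ratio of 1,000 are hypothetical values chosen to match "less than an order of magnitude" and "thousands of times longer".

```python
CATEGORIES = ["Annoying", "Hurtful", "Disabling", "Excruciating"]


def severity_ratio(higher, lower, step_factor):
    """Severity of `higher` relative to `lower`, assuming a constant
    multiplicative step between adjacent categories (an assumption made
    here for illustration, not something the pain-track model specifies)."""
    steps = CATEGORIES.index(higher) - CATEGORIES.index(lower)
    return step_factor ** steps


step_factor = 9          # hypothetical: just under one order of magnitude
duration_ratio = 1_000   # hypothetical: the Annoying pain lasts 1,000x longer

ratio = severity_ratio("Excruciating", "Annoying", step_factor)
print(ratio)                   # 729
print(ratio > duration_ratio)  # False: the brief pain does not outweigh the long one
```

Three steps of just under tenfold each compound to less than a thousandfold, which is why the conclusion hinges on whether the top step is much larger than the others.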

One attendee suggested a way around concerns about external validity: measure the welfare of animals throughout the actual production process, instead of inflicting welfare harms in a controlled setting using subjects set aside for research purposes. One formidable challenge would be identifying welfare proxies that are equally appropriate for harms that unfold on different timescales. This is clearest when considering harms like slaughter, where humaneness is measured using proxies like reflexes (e.g., Jacobs et al., 2021); measuring overall well-being afterwards is obviously not possible (at least until researchers identify appropriate biomarkers that remain intact postmortem). In contrast, the effects of living conditions will have to be measured using proxies for overall welfare (e.g., pessimism bias; Lagisz et al., 2020), and ideally a composite of such proxies to account for measurement error. Nevertheless, there may be specific comparisons where the same measures would be equally valid for a brief, severe harm and a milder, longer-lasting harm (e.g., perhaps comparing the effects of transportation to slaughterhouses to living in a cage).

Our original hope was to find an experimental paradigm that was externally valid in the sense that it could be used in farmed animals and involves harms of comparable severity and duration to those that actually occur on industrial farms. The fact that the workshop did not yield any paradigms that possess both high internal and external validity has changed our mind about the value of running experiments that closely mimic conditions on industrial farms. The more realistic goal would be trying to obtain general evidence about the plausibility that the most severe harms can be several orders of magnitude worse than more moderate harms, as this is what it would take for a brief severe harm to morally matter as much as a very long-lasting harm (assuming we reject the idea that some pains have lexical priority over others; see the Reasons to Prioritize Severity section of our original report). This approach would allow for studying species other than those farmed in large numbers because if the severity range is extremely wide for humans or rats, then it could also be as wide for chickens, fishes, and other farmed animals. At least, the burden of proof would seem to shift towards showing that pain experience has evolved differently in farmed animals.

Humans’ ability to report on their own experiences would give us the ability to study extreme scenarios. Gómez-Emilsson and Percy (2022) asked respondents how much worse their worst experience was than their second-worst experience. The average ratio was over 100,000, which was apparently not due to careless responding (the average ratio for the strongest to second-strongest pleasure was only 7.2). Although these results are far from definitive on their own, they slightly update us towards believing that the most severe pains are several orders of magnitude worse than lesser pains. Researchers can also ask humans follow-up clarification questions to address confounds that even controlled experiments have difficulty addressing. For instance, it would be easier to get assurance that humans understand the difference between an event of a very long duration and an event of a short duration than to confirm that nonhuman animals remember the magnitude of the difference.

We also suspect that studying laboratory animals is more feasible than studying farmed animals because laboratory environments are more easily manipulated. For example, one of the more promising ideas at the workshop was using optogenetics to induce severe and long-lasting pain in rodents (Jarrin & Finn, 2019). To our knowledge, inducing pain in farmed animals using optogenetics is not yet feasible.

No gold standard biomarkers

A biomarker is a measurable physical property in an organism that is indicative of a less directly observable phenomenon, such as a disease or mental state. A biomarker for pain could avoid the methodological challenges of approaches that rely on self-report or revealed preferences if it provided objective information that allowed for comparisons of acute and chronic pain on the same metric. In particular, a biomarker would need to provide some quantitative value that reliably tracked the badness of pain in a way that aggregates over time.

To be useful, a given biomarker needs to have the properties of both sensitivity and specificity. That is, it needs to reliably track when pain is present and to what degree, and additionally it needs to not be activated by other, non-pain states. Pain states are correlated with a number of measurable physical properties such as increased heart rate, increased respiration, and galvanic skin response. However, while these features are reliably activated by pain, they are not activated exclusively by pain, and even if they were they are not suitable for measuring the degree of pain aggregated over time.
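For readers less familiar with these terms, the sketch below computes both properties in their standard diagnostic sense from a set of labelled observations. The data are invented for illustration, and the function is a hypothetical helper rather than part of any published method.

```python
def sensitivity_specificity(in_pain, marker_on):
    """Compute sensitivity and specificity of a binary biomarker.

    in_pain: list of booleans, True when the subject is known to be in pain.
    marker_on: list of booleans, True when the biomarker is activated.
    """
    pairs = list(zip(in_pain, marker_on))
    true_pos = sum(p and m for p, m in pairs)
    false_neg = sum(p and not m for p, m in pairs)
    true_neg = sum(not p and not m for p, m in pairs)
    false_pos = sum(not p and m for p, m in pairs)
    sensitivity = true_pos / (true_pos + false_neg)  # share of pain states detected
    specificity = true_neg / (true_neg + false_pos)  # share of non-pain states left unflagged
    return sensitivity, specificity


# Invented example resembling elevated heart rate: the marker fires whenever
# pain is present, but also during some non-pain states such as exertion.
in_pain = [True, True, True, True, False, False, False, False]
marker_on = [True, True, True, True, True, True, False, False]
print(sensitivity_specificity(in_pain, marker_on))  # (1.0, 0.5)
```

A marker like this one has perfect sensitivity but poor specificity, which is the pattern described above for heart rate, respiration, and galvanic skin response.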

The most promising potential biomarkers would seem to be neural or brain-based biomarkers. For example, Wager et al. (2013) discovered a neurologic signature which can predict with 95% accuracy whether a patient is in pain or not based on brain activation patterns, even in novel populations. However, none of these so far reliably track the badness or even intensity of pain across all pain conditions. Of course, even if we did have a reliable neural biomarker for pain that could make predictions across humans, it would need to also have sensitivity and specificity when used across different species. And we would also always be faced with an extremely difficult epistemic challenge of determining whether we could trust that the result was actually indicating the aggregate “badness” of pain, given the lack of a gold standard to validate it against.

There currently is no biomarker of pain that would be suitable for answering our question, but we can still ask whether there is some potential biomarker that could, and whether there is a plausible path to creating such a biomarker. Most workshop participants were skeptical about the prospects of developing a neural biomarker that reliably tracked pain, as current methods for measuring brain activity have not found areas with the requisite sensitivity and specificity, and in general the current dominant view in the cognitive sciences emphasizes the brain’s plasticity (the degree to which mental states can be realized by different brain areas) and distributed nature.

The potential value of biomarkers cannot be entirely ruled out, however. For example, Corder et al. (2019) found neurons in specific brain areas of mice that appear to be activated in conjunction with the unpleasantness of pain experience. These neural circuits are fine-grained enough that they are not detected by noninvasive brain imaging techniques, which could explain why previous brain imaging studies have failed to find a reliable biomarker. However, these explorations are at early stages, so it is clear that they would not be able to answer questions comparing chronic and acute pain in the near future.

Priors should favor prioritizing duration

In the absence of clear-cut evidence, how should stakeholders make triage decisions between brief, severe harms and milder, longer-lasting ones? Here, attendees were split. One attendee felt that we could at least infer that, whatever amount of attention brief, severe harms are currently receiving, it is too much. This is because the severe harms are more attention-grabbing, and thus seem more deserving of attention even if reflection or empirical evidence would show otherwise.

Other attendees pointed to the greater tractability of brief, severe harms as a reason to prioritize them. In particular, one attendee argued that it is easier both to expose brief, severe problems and to find practical solutions to them. Using band castration of cattle as an example, he pointed out that mild, chronic issues and their sequelae are often not readily visible to farmers, and are more cumbersome to solve because they have many manifestations over long spans of time, each of which requires its own solution. In contrast, the acute pain of surgical castration is obvious and easier to mitigate.

The viewpoint we agreed with most was that stakeholders should assume relieving longer-lasting harms is more important until there is evidence that the brief harms farmed animals face really are thousands of times more severe, given that the former last thousands of times longer than the latter. (Note that the only severe harms that are likely to actually be brief in their total effects on experience are those that quickly result in death. Pains that are severe in their acute effects but milder in their chronic effects may be the worst pains of all.) We echo this perspective because the “mild” harms that farmed animals face are often still fairly severe in absolute terms (e.g., Schuck-Paim & Alonso, 2022). Consequently, the fact that we find it somewhat plausible that the worst pains are several orders of magnitude worse than the mildest pains is not decision-relevant. What matters is the plausibility that the worst pains are several orders of magnitude worse than moderate pains. Moderate pain is already sufficiently intense to make its relief a high priority (Alonso & Schuck-Paim, 2021), so it is unclear to us why such a high ceiling on severity would be necessary to guarantee that individuals do not ignore it. It could even be that such high intensities would interfere with the execution of non-reflexive behaviors that may be necessary to escape the pain stimulus.

The absolute severity of farmed animals’ chronic pains also makes prioritization of long-lasting pains somewhat robust to concerns about whether it is valid to aggregate pains of different severities. Even though we assign moderate credence to the idea that the aggregation of very mild pains (either over many individuals or over a long period of time) is too morally unimportant to outweigh the badness of a severe pain that is either brief or affects few individuals, we assign less credence to the claim that any and all pains above a certain threshold (e.g., the point at which pain is experienced as unbearable) are more morally urgent than any and all pains below that threshold. Put another way, we think that many of the chronic welfare harms on industrial farms are above the minimal severity threshold required to legitimize aggregate comparisons.

Nevertheless, we still think it is appropriate to put some non-trivial minority of resources, perhaps ~20%, towards addressing the most severe pains, even if they are brief. After all, we do not have any empirical demonstrations that the most severe pains are not actually thousands of times worse than moderate pains. Similarly, out of epistemic humility we put some credence on the weaker thesis that unbearable pain deserves special moral consideration.

Acknowledgments

This report is a project of Rethink Priorities, a think tank dedicated to informing decisions made by high-impact organizations and funders across various cause areas. It was written by William McAuliffe and Adam Shriver. Thanks to Open Philanthropy for sponsoring the workshop and spurring our interest in this topic. Thanks to Wladimir Alonso, Bob Fischer, and Daniela Waldhorn for helpful feedback.

If you are interested in RP’s work, please visit our research database and subscribe to our newsletter.

References

Alonso, W. J., & Schuck-Paim, C. (2021). Pain-Track: a time-series approach for the description and analysis of the burden of pain. BMC Research Notes, 14(1), 229.

Buckley, L. A., McMillan, L. M., Sandilands, V., Tolkamp, B. J., Hocking, P. M., & D’Eath, R. B. (2011). Too hungry to learn? Hungry broiler breeders fail to learn a Y-maze food quantity discrimination task. Animal Welfare, 20(4), 469-481.

Corder, G., Ahanonu, B., Grewe, B. F., Wang, D., Schnitzer, M. J., & Scherrer, G. (2019). An amygdalar neural ensemble that encodes the unpleasantness of pain. Science, 363(6424), 276-281.

Dixon, L. M., Sandilands, V., Bateson, M., Brocklehurst, S., Tolkamp, B. J., & D’Eath, R. B. (2013). Conditioned place preference or aversion as animal welfare assessment tools: Limitations in their application. Applied Animal Behaviour Science, 148(1-2), 164-176.

Foo, H., & Mason, P. (2005). Sensory suppression during feeding. Proceedings of the National Academy of Sciences, 102(46), 16865-16869.

Gómez-Emilsson, A., & Percy, C. (2022). The Heavy-Tailed Valence Hypothesis: The human capacity for vast variation in pleasure/pain and how to test it. https://psyarxiv.com/krysx/

Hargraves, W. A., & Hentall, I. D. (2005). Analgesic effects of dietary caloric restriction in adult mice. Pain, 114(3), 455-461.

Jacobs, L., Bourassa, D. V., Boyal, R. S., Harris, C. E., Josselson, L. N. B., Campbell, A., … & Buhr, R. J. (2021). Animal welfare assessment of on-farm euthanasia methods for individual, heavy turkeys. Poultry Science, 100(3), 100812.

Jarrin, S., & Finn, D. P. (2019). Optogenetics and its application in pain and anxiety research. Neuroscience & Biobehavioral Reviews, 105, 200-211.

Jensen, M. B., & Pedersen, L. J. (2008). Using motivation tests to assess ethological needs and preferences. Applied Animal Behaviour Science, 113(4), 340-356.

Lagisz, M., Zidar, J., Nakagawa, S., Neville, V., Sorato, E., Paul, E. S., … & Løvlie, H. (2020). Optimism, pessimism and judgement bias in animals: a systematic review and meta-analysis. Neuroscience & Biobehavioral Reviews, 118, 3-17.

McAuliffe, W. H. B., & Shriver, A. (2022). The Relative Importance of the Severity and Duration of Pain. https://osf.io/ezvr2/

Mogil, J. S. (2022). The history of pain measurement in humans and animals. Frontiers in Pain Research, 3, 1031058.

Schuck-Paim, C., & Alonso, W. (2022). Quantifying pain in broiler chickens. https://welfarefootprint.org/broilers/

Wager, T. D., Atlas, L. Y., Lindquist, M. A., Roy, M., Woo, C. W., & Kross, E. (2013). An fMRI-based neurologic signature of physical pain. New England Journal of Medicine, 368(15), 1388-1397.