Hi George (presumably),
I found this report very helpful as someone who mostly thinks about measuring the well-being of humans. I think it lays things out nicely: philosophical foundations, then the types of measurement instruments, ending with a discussion of the “state of the art” in what’s commonly used.
I also appreciate how the report lays out assessments of reliability, validity and interpersonal comparisons of utility for each class of welfare indicators. However, I think a reader like myself would feel more oriented if each section concluded with a summary assessment of the measure.
This is obviously difficult. Ideally we’d have some sort of model like
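$$\text{wellbeing}_{\text{measured}} = \text{accuracy} \times \text{importance} = (\text{reliability} \times \text{cardinality}) \times (\text{validity} \times \text{wellbeing}_{\text{account}})$$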
where we just plug a value between 0 and 1 into each parameter and, presto, get the best measures, but I’m not sure if that’s coherent.
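As a rough illustration of how such a model could be used to rank indicators, here is a minimal Python sketch. It assumes every parameter is a score between 0 and 1 and that the terms simply multiply, as in the formula above. The class name, the indicator names and all the scores are invented placeholders, not values from the report.

```python
from dataclasses import dataclass


@dataclass
class IndicatorScores:
    """Hypothetical 0-1 scores for one welfare indicator (placeholders only)."""
    reliability: float
    cardinality: float
    validity: float
    wellbeing_account: float

    def __post_init__(self):
        # Every parameter is assumed to be a score between 0 and 1.
        for name, value in vars(self).items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")

    @property
    def accuracy(self) -> float:
        return self.reliability * self.cardinality

    @property
    def importance(self) -> float:
        return self.validity * self.wellbeing_account

    @property
    def overall(self) -> float:
        # wellbeing_measured = accuracy * importance
        return self.accuracy * self.importance


# Invented scores, purely to show the mechanics of the ranking.
candidates = {
    "indicator_a": IndicatorScores(0.9, 0.5, 0.4, 0.5),
    "indicator_b": IndicatorScores(0.6, 0.5, 0.8, 0.5),
}

# Sort indicator names from highest to lowest overall score.
ranked = sorted(candidates, key=lambda k: candidates[k].overall, reverse=True)
for name in ranked:
    print(name, round(candidates[name].overall, 3))
```

With these made-up numbers the less reliable but more valid indicator comes out ahead. Because the terms multiply, a low score on any one parameter drags the whole product down, which is one reason to doubt whether a single number like this is coherent.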
I only seemed to get a sense of your overall judgement in the executive summary and, to some extent, in the concluding discussion. For this overall judgement, I would have enjoyed reading more of the reasoning (for example, how much better are physical health and behavioral indicators than physiological indicators?).
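You say:
“I argue that rather than relying on any given assessment the best solution is to use a combination of methods that rely on different techniques. The ideal system would use a combination of qualitative measures, expert opinion based measures, an index of animal-based measures, and standalone measures such as preference testing or qualitative behavioural assessment.”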
But I’m not sure I follow. Would the ideal system use a combination? I think the ideal system would have a single measure that perfectly tracks what matters, no? Could you explain what you’re thinking here?
My last question is: what are y’all’s thoughts on making cross-species comparisons? This is the question that really interests me, and most of the indicators presented seem to be much, much more suitable for within-species assessments of welfare.
Please keep up the great work!
Hello Joel,
I agree that, in hindsight, a summary of each indicator would probably have been useful, giving the reader an overall assessment based on the information I reviewed in the report.
That model is roughly how I was thinking about this assessment: validity and interpersonal comparisons capture how much I would update on a perfectly accurate measure, and reliability gives some sense of how wide the confidence interval would be around a real-world measurement. The trade-offs between these, across groups of indicators and individual indicators, add some nuance: a single physiological measure is reliable but can vary due to numerous other factors, whereas a combination of them allows us to measure the welfare benefit of things that health indicators can’t capture easily. For example, given no context, it may be better to minimise disease rates than blood glucose levels, but disease rates would be unable to assess the importance of different types of environmental enrichment.
If more people comment to express interest in an overview of each section, I am happy to invest the time to go back through the report to add in these sections.
“I think the ideal system would have a single measure that perfectly tracks what matters, no?”
I definitely agree, which is partially why I put an example of self-reports in humans (which are, in my opinion, as close to ideal as we can get) alongside the measures we have available in other animals. The combination is what I currently view as the best available (‘ideal’) system given the weaker methods available.
“My last question is: what are y’all’s thoughts on making across species comparisons? This is the question that really interests me, and most of these indicators presented seem to be much, much more suitable to within species assessments of welfare.”
In this context, many of these indicators struggle with cross-species comparisons. Take cortisol, for example: different species have different cortisol levels, making it difficult to compare levels, or even percentage changes, across species. We can gain some sense of the relative importance of different improvements or events for an individual from the degree of change in an indicator. An example of this within an operant test could be a human showing a mild preference for social contact over food, compared to a fox who shows the opposite relationship. Yet this still only gives us information within the range of their utility functions and doesn’t tell us how their ranges compare. It’s a challenging question, and because of this we have mostly been deferring to Rethink Priorities’ work on moral weights and Open Philanthropy Project’s report on consciousness. At the moment, we approach this by using an assessment of an animal’s quality of life to gauge how important an improvement is within an individual’s utility function, and then adjusting this based on these considerations. However, I would be cautious about concluding that an ask is more promising if the deciding factors are based on cross-species comparisons, given the range of plausible views on the topic.
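To make the two-step approach above concrete, here is a minimal Python sketch. It assumes an improvement can be expressed as a fraction of an individual’s own indicator range and then multiplied by a cross-species weight. The function name, the species labels, the readings and the weights are all invented for illustration; none of them come from the report or from the moral weight work mentioned above.

```python
def within_range_gain(before, after, worst, best):
    """Share of the individual's own indicator range covered by a change."""
    if best <= worst:
        raise ValueError("best must be greater than worst")
    return (after - before) / (best - worst)


# Invented readings on species-specific scales (not comparable as raw numbers).
gain_a = within_range_gain(before=2.0, after=4.0, worst=0.0, best=10.0)
gain_b = within_range_gain(before=30.0, after=45.0, worst=0.0, best=50.0)

# Hypothetical cross-species weights under two different views.
weight_scenarios = {
    "equal_weights": {"species_a": 1.0, "species_b": 1.0},
    "b_discounted": {"species_a": 1.0, "species_b": 0.5},
}

favoured = {}
for name, weights in weight_scenarios.items():
    adjusted = {
        "species_a": gain_a * weights["species_a"],
        "species_b": gain_b * weights["species_b"],
    }
    # The improvement with the larger weighted gain is favoured in this scenario.
    favoured[name] = max(adjusted, key=adjusted.get)
    print(name, adjusted, "->", favoured[name])
```

In this toy example the within-range gains are fixed, yet which improvement looks larger flips depending on the cross-species weight chosen. That is the sense in which the indicators alone cannot settle a comparison between species.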
Thanks for the feedback.
George,
You’re welcome. I’m excited to see what comes next!