AI writing seems to be pretty poor so far, and I keep trying to use it for things. But it recently did play a role in helping me turn my jumbled bullet points into better writing.
In principle I am up for some sort of cheap bet. However, I have mostly stopped working on this now and handed it back to Vicky for review and implementation, so I have very limited time and headspace for more work, defining a bet, reviewing data collection, etc. I am actually trying to think about this as little as possible for the next 2 months, so if there was a bet it would be me saying: sure, I bet $100, I trust you to work out a fair answer without needing me, and let me know in 2 months.
If you did want to work with Rethink to test this:
The aim should be to test the 50:1 ratio, not the specific length of time (12 min). If this research was being done well there could be a case for asking about different lengths of time and seeing how that varies responses (FWIW, based on Welfare Footprint, the period of time most animals spend in excruciating pain is 10-15 seconds during slaughter, so that would be the most useful anchor if a time is needed).
I expect how you ask the question makes all the difference. I think phrased one way I would easily win and phrased another I would easily lose. Similarly, words like "torture" have more weight than words like "9.5 out of 10 on the pain scale". I read one paper where they took an iterative approach with face-to-face interviews to get at what people think rather than trusting immediate survey responses. It showed that in the face-to-face interviews people were more pain averse than in a quick survey, but more so at all levels of pain (if anything the ratio between mild and severe was less steep). Here is a fun exercise I wrote for myself.
I think practically everyone would prefer 10 h of hurtful pain over 12 min of excruciating pain under WFI's definitions. Do you disagree?
I disagree.
It looks like on average people would be indifferent between 10 h of hurtful pain and 12 min of excruciating pain. People are diverse and there would be very high variation and very strong views in both directions, but some people (such as a noticeable minority of women in the cited study) would prefer a short, sharp, very painful fix over ongoing pain.
(One possible source of error here is that I might have systematically miscalibrated the Welfare Footprint pain scale. I connected "hurtful" to 4.8 and "excruciating" to 10 on a 0 to 10 scale. It could be good to get estimates on this from others.)
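As a quick check of the arithmetic behind the question above (a sketch, assuming the 50:1 ratio being tested is the duration ratio between hurtful and excruciating pain, as the 10 h versus 12 min comparison implies):

```latex
\[
\frac{10\ \text{h}}{12\ \text{min}} = \frac{600\ \text{min}}{12\ \text{min}} = 50
\]
```

On the same assumption, and further assuming the ratio still holds at much shorter durations (which is itself something a survey would need to test), the 10-15 second slaughter anchor would correspond to roughly 500-750 seconds, or about 8 to 12.5 minutes, of hurtful pain.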
With current technology probably not an x-risk. With future technology I don't think we can rule out the possibility of bio-sciences reaching the point where extinction is possible. It is a very rapidly evolving field with huge potential.
Hi Nick, Thank you very much for the comment. These are all good points.
I fully agree with you and Larks that where a specific intervention will have reduced impact due to long run health effects this should be included in our models and I will check this is happening.
I apologise for the defensiveness and have made a few minor edits to the post, trying to keep the content the same.
That's not a reason not to continuously be improving models.
To be clear, we are always improving our CEA models. This is an ongoing iterative process, and my hope is they get better year upon year. However, I guess I don't have confidence right now that a -10% change to this number is actually improving the model or affecting our decision making.
If we dive into these numbers just a bit, I immediately notice that the discount rate in the GBD data is higher than ours, which suggests that, if we are adjusting these numbers, we probably want a significant increase, not a decrease. But that then raises the question of what discount rate we are using and why, which has a huge effect on some of the models. This is something there are currently internal debates about in the team, and we are looking at changing it. That in turn raises a question about how to represent the uncertainty about these numbers in our models and ensure that decision makers and readers are more aware of the inherent estimations that can have a big effect on CEA outputs. Improving this is probably towards the top of my list.
Thank you Vasco
AGREE ON THERE BEING SOME VALUE FOR MORE RESEARCH
I agree the AIM 2025 SADs were below ideal robustness, and as such I have spent much of the last few weeks doing additional research to improve the pain scaling estimates. If you have time and want to review this then let me know.
I would be interested in Rethink Priorities or others doing additional work on this topic.
AGREE ON THE LIMITS OF CONDENSING TO A SINGLE NUMBER
I have adapted the 2026 SAD model to give outputs at the four different pain levels, as well as a single aggregated number. This should help users of the model make their own informed decisions and not just focus on the one number.
I DISAGREE ON NOT USING THE RESEARCH WE HAVE
Where I disagree is where you say we basically have no idea how to compare different levels of pain, and your suggestion that we should not be doing so.
Every time we make a decision, e.g. to focus on an issue of stocking densities rather than slaughter methods, we are ultimately deciding to focus on less extreme but longer lasting forms of pain. Being as explicit as we can about our numbers and our thinking helps us make those decisions better (as long as we don't overly rely on a single number).
We do have some data and we should use it to inform our decisions and our numbers. This includes academic studies of people in pain, including those with severe conditions, and self-reports from people who experienced extreme pain.
DISAGREE ON NOT BELIEVING PEOPLE
Less important but: I also disagree with your suggestion not to trust the standard academic approach of asking about "worst pain imaginable", or people's responses to it. Maybe sometimes people overestimate how bad that is and sometimes underestimate it. You seem to be claiming these women (or the whole public) are en masse systematically underestimating. That is a strong claim and not one I would put much weight on without good evidence.
Yes, there are (often known) systematic over- and underestimation effects. These can be addressed by asking similar questions in different ways that aim to elicit different biases, or by having a back and forth between questioner and respondent to seek consistency.
If research does not match our intuitions we need to be objective in judging the value of that research and not claim systematic bias without evidence.
Hi Vasco. Firstly, it should be noted that the overall ratio used for the 2025 SADs was 1000x not 7x. The updated 2026 ratio based on more extensive research is 50x.
Secondly, on "I do not see how one would be indifferent between these". You might be surprised if it does not match your personal experience, but many people are indifferent between relatively extreme levels of pain, including people who have been through quite extreme pain. Just as an example, in this study of 37 women who had just gone through labour, roughly one third would prefer a 9/10 pain for 2 hours over a 1/10 pain for 18 hours!
Finally, I defend putting at least some weight on counter-intuitive results of academic research. I especially defend this in the case where you are analysing and pooling the results of many papers and expect some results to be biased upwards and some to be biased downwards. The new SAD spreadsheet links to 15 different studies / pieces of evidence on this topic. Of those 15, some show counterintuitively low and some counterintuitively high relative preferences for different levels of pain. I think it is better to put weight on all of them based on the quality of the evidence they present, and not be (overly) guided by an intuitive sense of the results we want to find.
Thank you done.
UK political animal welfare work
I believe it is a relatively common beekeeping practice to clip a wing of the queen bee to prevent the colony leaving
I think people working on animal welfare have more incentives to post during debate week than people working on global health.
The animal space feels (when you are in it) very funding constrained, especially compared to working in the global health and development space (and I expect gets a higher % of funding from EA / EA-adjacent sources). So along comes debate week and all the animal folk are very motivated to post and make their case and hopefully shift a few $. This could somewhat bias the balance of the debate. (Of course the fact that one side of the debate feels it needs funding so much more is in itself relevant to the debate.)
Hi there, I was wondering what you mean by "real estate speculation": what the issue is and in what ways it is tractable? Thank you for any insights you can give, hoping to do some research into housing issues in LMICs :-)
No this seems more than just semantic. It does seem like I've underestimated the ability to influence B2B companies. I stand corrected. Thank you.
Thank you for considering my comments
To be clear, I would consider the target of the campaign in those cases to be the hospital or the university, and those to be B2C organizations in some meaningful way.
Additionally if you want to show that you can credibly engage policymakers (which I think you might need to do in order to put pressure on these companies) I would expect transparency of people and funding sources to help a lot.
What are the key leverage points to get these companies to listen to campaigners such as yourself? How does this differ from the animal rights space and how will this affect your strategy? What do you have in terms of strategy documents or theory of change?
Some thoughts on my mind are:
To the best of my understanding the animal rights corporate campaigning space is unable to exert much or any influence on B2B (business to business) companies. Animal campaigns only appear to have influenced B2C (business to consumer) companies. An autonomous coding agent feels more B2B and by analogy having any influence here could be extremely difficult. That said I don't think this should be a huge problem as...
The leverage points for influencing companies in the AI space are very different to those in the animal space. In particular AI companies are probably much more concerned than food companies about losing employees to other companies. I expect they are also likely concerned about regulation that could restrict their actions. I expect they are much less concerned about public image. As such...
This does suggest a somewhat different approach to corporate campaigning: potentially targeting employees more (although probably not picking on individuals) and a greater focus on presenting the targeted company negatively to regulators/policymakers or to investors, more than to the public.
These are just quick thoughts and I might be wrong about much of this. I just wanted to flag it, as your post seemed to suggest that this work would be similar to work in the animal space. In many ways it is, but I think there's a risk of not seeing the differences. I wish you all the best of luck with your campaigning.
Hi, Thank you. All good points. Fully agree with ongoing iterative improvement to our CEAs and hopefully you will see such improvements happening over the various research rounds (see also my reply to Nick). I also agree with picking up on specific cases where this might be a bigger issue (see my reply to Larks). I don't think it is fair to say that we treat those two numbers as zero, but it is fair to say we are currently using a fairly crude approximation to get at what those numbers are getting at in our lives saved calculations.
For a source on discounting see here: https://rethinkpriorities.org/publications/a-review-of-givewells-discount-rate#we-recommend-that-givewell-continue-discounting-health-at-a-lower-rate-than-consumption-but-we-are-uncertain-about-the-precise-discount-rate
"Discounting consumption vs. health benefits | Discount health benefits using only the temporal uncertainty component"
Thank you Larks. This is a very good point and I fully agree.
In any cases where this happens it should be incorporated into our current model. That said I will check this for our current research and make sure that in any such cases (such as say pulmonary rehabilitation for COPD where patients are expected to have a lower quality of life if they survive) this is accounted for.
Hi there. I am Research Director at CE/AIM.
Note that Charity Entrepreneurship (CE) has now rebranded to AIM to reflect our widening scope of programs
[Edited for tone]
Thank you so much for engaging with our work in this level of detail. It is great to get critical feedback and analysis like this. I have made a note of this point on my long list of things to improve about how we do our CEAs, although for the reasons I explain below it is fairly low down on that list.
Ultimately what we are using now is a very crude approximation. That said, it is one I am extremely loath to start fiddling with without putting in the effort to do this well.
You are right that the numbers used for comparing deaths and disability are a fairly crude approximation. A reasonable change in moral weights can lead to a large change in the comparison between YLDs and YLLs. Consider that when GiveWell last reviewed their moral weights (between 2019 and 2020) they increased the value of an under-5 life saved compared to YLDs by +68% (from 100/3.3 to 116.9/2.3). Another very valid criticism is that (as you point out) the current numbers we are using are calculated with a 3% discount rate, yet we are now using a 1.5% discount rate for health effects, so perhaps to ensure consistency we should increase the numbers by +42%ish. Or taking the HLI work on the value of death seriously could suggest a huge decrease of -50% or more. The change you suggest would be nice but I think getting this right really needs a lot of work.
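For anyone wanting to check the +68% figure, here is a minimal sketch of the arithmetic. It uses only the two ratios quoted above; the variable names are my own and purely illustrative.

```python
# Ratio of the value of an under-5 life saved to a YLD, before and after
# the moral weights review (figures as quoted in the comment above).
old_ratio = 100 / 3.3
new_ratio = 116.9 / 2.3

# Relative change in that ratio.
relative_change = new_ratio / old_ratio - 1

print(f"old ratio: {old_ratio:.1f}, new ratio: {new_ratio:.1f}")
print(f"relative change: {relative_change:+.0%}")
```

The same pattern could be reused to see how sensitive a CEA output is to other plausible adjustments, such as the discount rate change or the HLI-style decrease.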
Right now I am uncertain how best to update these numbers. A -10% change is reasonable but so are many other changes. I would very much like AIM to have our own calculated moral weightings that account for various factors, including life expectancy, a range of ethical views, quality of life, beneficiary preferences, etc. However getting this correct is a complicated and lengthy process. This is on the to-do list but has not happened yet unfortunately.
So what do we do in the meantime:
We use numbers that seem justifiable, close to what I understand as standard and reasonably acceptable within Global Health and Development (from here, table 5.1, which I believe have been used by GW, DCP2, GBD, etc.). These numbers are also close to (but somewhat below) a very crude staff survey we did on the moral weight of saving a life. That said I admit I would be interested in updates on what organisations are currently using.
We are aware of the limits of our CEAs and use them cautiously in our decision making process, and would encourage others to be cautious about over-relying on them. We have written about this here: https://www.charityentrepreneurship.com/cea. We are well aware in making decisions that some of the numbers used to compare different kinds of interventions rest on a lot of shaky assumptions.
We tend to try to pick a range of interventions across reasonable moral weights and moral views. We will try to pick some interventions that save lives, some that improve health, some that improve lives in other ways. That said I expect that maybe we have over (or under) valued lives saved.
Ultimately I believe that this is sufficient for the level of decision making we need to make.
I hope that someday soon we have the time to work this out in detail.
ACTIONS.
• [Edited: I won't change anything straight away as a bunch of modelling in this research round has already been done, and for now I would rather use numbers I can back up with a source than numbers that are tweaked for one reason but not another reason.]
• I have added a note about the point you raise to our internal list of ways to improve our CEAs. [Edit: I really would like to make some changes here going forward. I expect that if I put a few hours into this the number is more likely to go up than down given the discount rate difference (and the staff survey).]
• I might also do some extra sensitivity analysis on our CEAs to highlight the uncertainty around this factor and ensure it is flagged to decision makers.
So thank you for raising this.
Antony, if you are looking for early stage funding and support for your charity or project, you could consider applying to the Charity Entrepreneurship program when applications re-open in a few months. There is an option to apply with your own idea.
See https://www.charityentrepreneurship.com/
(Disclaimer: commenting in a personal capacity)