Thanks for doing this!
Some questions and comments:
1. How did you decide what to set for “moderate” levels of (difference-making) risk aversion? Would you consider setting this based on surveys?
2. Is there a way to increase the sample size? It’s 150,000 by default, and you say it takes billions of samples to see the dominance of x-risk work.
3. Only going 1,000 years into the future seems like an extremely short default for x-risk interventions if we’re seriously entertaining expectational total utilitarianism and longtermism. It also seems several times too long for the “common-sense” case for x-risk reduction.
4. I’m surprised the chicken welfare interventions beat the other animal welfare interventions on risk-neutral EV maximization, and do so significantly. Is this a result you’d endorse? This seems to be the case even if I assign moral weights of 1 to black soldier flies, conditional on their sentience (without touching sentience probabilities). If I do the same for the shrimp ammonia and chicken welfare interventions, the two end up with similar cost-effectiveness, but chicken welfare still beats the other animal interventions several times over, including all of the animal welfare research projects (with default parameters). Unless the marginal returns to additional chicken welfare work are much lower (and maybe that’s the issue?), this suggests we shouldn’t even bother with the others unless we can find much higher-leverage interventions. Maybe certifier outreach? Or working on farmed insect welfare standards at the EU or US level? Also, the number of targeted individuals born every year for the BSF intervention seems pretty low to me; there are individual farms that will farm more insects per year than that.
5. It seems the AI Misalignment Megaproject is more likely to fail (with the same probability of backfire conditional on failing) than the Small-scale AI Misalignment Project. Why is that? I would expect a lower chance of doing nothing, but a higher chance of success and a higher chance of backfire.
Hi Michael! Some answers:
2. There will be! We hope to release an update in the coming days that adds the ability to change the sample size and allows billions of samples. This was tricky because it required some optimizations on our end.
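The sampling issue can be made concrete with a toy Monte Carlo sketch (all numbers are illustrative, not the CCM's actual distributions): when the expected value is driven by a very rare, very large outcome, sample counts far below 1/p almost never draw that outcome, so the estimate misses it entirely.

```python
import random

def estimate_ev(n_samples, p_rare=1e-6, rare_value=1e9, common_value=1.0, seed=0):
    """Monte Carlo estimate of E[X], where X equals common_value almost always
    but rare_value with tiny probability p_rare (a stand-in for the small chance
    that x-risk work averts an astronomically valuable outcome)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += rare_value if rng.random() < p_rare else common_value
    return total / n_samples

# The true EV is ~1001, dominated by the rare branch.
true_ev = 1e-6 * 1e9 + (1 - 1e-6) * 1.0

# With 150,000 samples the rare branch fires ~0.15 times in expectation, so a
# typical run estimates ~1.0 and badly underestimates the true EV; the sample
# count must be many multiples of 1/p_rare (here, well into the millions or
# billions) before the estimate stabilizes near true_ev.
small_run = estimate_ev(150_000)
```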
3. We were divided on selecting a reasonable default here, and I agree that a shorter default might be more reasonable for the latter case. This was a compromise solution, but I think we could pick either perspective and stick with it for the defaults.
That said, I want to emphasize that all default assumptions in CCM should be taken lightly, as we were focused on making a general tool, instead of refining (or agreeing upon) our own particular assumptions.
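For intuition on why this particular default matters so much, here is a toy back-of-the-envelope sketch (every parameter below is made up for illustration, not a CCM default): the value of averting extinction scales roughly linearly with the horizon over which future people are counted, so the horizon setting multiplies the bottom line directly.

```python
def xrisk_value(horizon_years, people_per_year=8e9,
                welfare_per_person_year=1.0, extinction_risk_averted=1e-4):
    """Toy value of an x-risk intervention: welfare-years saved per year of
    survival, times years counted, times the absolute risk reduction.
    Hypothetical parameters, chosen only to show the linear horizon dependence."""
    return (horizon_years * people_per_year
            * welfare_per_person_year * extinction_risk_averted)

short = xrisk_value(1_000)       # the 1,000-year default horizon
long_ = xrisk_value(1_000_000)   # a longtermist horizon: 1,000x the value
```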
5. As with (3), I agree with your reasoning, and we’ll probably be updating some of these template projects soon. In the meantime, I’d encourage you to tweak these assumptions to match your own.
Hi Michael, here are some additional answers to your questions:
1. I roughly calibrated the reasonable risk aversion levels based on my own intuition and a Twitter poll I ran a few months ago: https://x.com/Laura_k_Duffy/status/1696180330997141710?s=20. A significant share of respondents (about a third of those who are risk averse) would only take the bet to save 1,000 lives over saving 10 for certain if the chance of saving the 1,000 was over 5%. I judged this a reasonable cutoff for the moderate risk aversion level.
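For readers who want the arithmetic behind that cutoff: a risk-neutral agent already takes the bet at p = 1% (since 0.01 × 1000 = 10), so requiring p > 5% means discounting long shots. The sketch below uses one toy probability-weighting form, w(p) = p^k (a hypothetical choice for illustration, not necessarily the CCM's actual risk-weighting function), to back out the implied degree of risk aversion.

```python
import math

# Bet: save 1,000 lives with probability p, versus saving 10 lives for certain.
# Risk-neutral indifference: p * 1000 = 10, i.e. p = 1%.
RISK_NEUTRAL_THRESHOLD = 10 / 1000

def threshold(k):
    """Smallest winning probability an agent with weighting w(p) = p**k accepts:
    solve (p ** k) * 1000 = 10 for p. k = 1 recovers risk neutrality."""
    return 0.01 ** (1 / k)

# The k implied by the poll's "moderate" cutoff of p = 5%:
k_moderate = math.log(0.01) / math.log(0.05)  # roughly 1.54
```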
4. The reason the hen welfare interventions come out much better than the shrimp stunning intervention is that shrimp harvest and slaughter don’t last very long. The chronic welfare threats that high ammonia concentrations and battery cages impose on shrimp and hens, respectively, outweigh the shorter-duration welfare threats of harvest and slaughter.
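The duration point can be made concrete with a deliberately simple sketch (purely hypothetical numbers, not the CCM's welfare inputs): treating total burden as duration times intensity, a months-long chronic stressor dominates a minutes-long acute one even at a much lower intensity.

```python
def welfare_burden(hours_affected, intensity):
    """Toy total burden = duration x intensity, a deliberate simplification
    of how chronic and acute harms can be compared on one scale."""
    return hours_affected * intensity

# Hypothetical chronic harm: a year of moderate discomfort in a battery cage.
chronic = welfare_burden(hours_affected=24 * 365, intensity=0.2)  # ~1752
# Hypothetical acute harm: 15 minutes of intense suffering at slaughter.
acute = welfare_burden(hours_affected=0.25, intensity=10.0)       # 2.5

# chronic >> acute: duration dominates despite the far lower intensity.
```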
The number of animals for black soldier flies is low, I agree. We are currently using estimates of current populations, which are probably much lower than future population sizes. We’re only somewhat confident in the shrimp and hen estimates, and pretty uncertain about the others, so I think one should feel very much at liberty to plug in different population sizes for animals like black soldier flies.
More broadly, I think this result likely reflects a limitation of models based on total population size, versus models based on the number of animals affected per campaign. Ideally, as we gather more information about these types of interventions, we could assess cost-effectiveness using better estimates of the number of animals affected per campaign.
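That distinction can be sketched as two toy model shapes (hypothetical parameters throughout): one capped by the estimated total population today, the other driven by what campaigns actually win as the industry scales.

```python
def impact_population_based(total_population, reachable_fraction, welfare_gain_per_animal):
    """Toy population-based model: impact is capped by the estimated total
    farmed population, however far individual campaigns might reach."""
    return total_population * reachable_fraction * welfare_gain_per_animal

def impact_campaign_based(n_campaigns, animals_per_campaign, welfare_gain_per_animal):
    """Toy campaign-based model: impact tracks animals actually covered by
    commitments won, which can exceed today's population estimate as farms grow."""
    return n_campaigns * animals_per_campaign * welfare_gain_per_animal

# With a small current BSF population estimate, the population-based model caps
# the intervention's impact low, even where per-campaign reach would be large.
```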
Thanks for the thorough questions!