[Question] Community Polls on Alignment Controversies

Jasmine Brazilek 16 Jun 2026 19:44 UTC

61 points

AI x Animals Polls AI alignment Threads Digital person Animal welfare AI safety

Please spend two minutes filling in the below polls!

Planning where we focus at CaML requires forming views on many controversial questions, particularly with regards to alignment. In many cases, people we’ve talked to have very different intuitions about where the alignment community stands on these issues. These polls will help us get a sense of where the main areas of (dis)agreement lie.

Please feel free to tell us if you think the questions are ambiguous or embed false assumptions.
EDIT: Please answer based on your own best guess (and confidence) in these questions.

Dawn Drescher 18 Jun 2026 16:26 UTC
7 points
0 ∶ 0
Thanks for surveying this! <3
1. I feel like people use “AI alignment” very differently. When I talk to the types who are interested in decision theory and agent foundations, they usually have something really sophisticated in mind with AIs that somehow (no known solution because I’m not happy with any of the implementations of UDT that I’ve seen) try to act in such a way as to actually produce evidence that what they want to maximize will be maximized. Other people usually just mean something like “The AI tries to act sort of like a well-intentioned person would.” The first seems good but very very hard; the second seems outright dangerous, depending on details such as the particular idealizations that are applied.
2. Hence a question like “AI alignment to humans will in practice avoid moral catastrophes …” is a strong no for me because it might not only fail to prevent but actually produce those catastrophes in the first place.
3. Idealizations to eliminate the scope insensitivity bias and idealization to eliminate the speciesist and substratist biases are two different kinds of idealizations. My answer changes radically depending on whether they can be disentangled.
4. Regarding tractability of digital minds work – I’m unsure whether I should count my worries about backfire risks as something that reduces scope or something that reduces tractability.
5. Regarding the reflective equilibrium, it’s critical to me whether we artificially study the TAI in isolation, which won’t happen in practice, or whether we embed it with other, different agents. The first is probably meant; the second is more pragmatic.
6. Control strikes me as safer, easier, and less reliable – a stopgap that can buy us a few years. I like that a lot more than an incomplete alignment solution that can backfire.
7. Suffering risks – vastly more likely in the multipolar world we’re steering towards – strike me as vastly worse than just competing away > 90% of net value, so my max. agree vote feels like an understatement. On the other hand, “will” is a higher probability than what I assign to s-risk (“might”).
- Miles Tidmarsh 18 Jun 2026 18:29 UTC
  3 points
  0 ∶ 0
  Parent
  Thanks Dawn, taking these in turn:
  
  1: “Robust alignment” is a deliberately vague term; it’s meant to incorporate your views about how hard alignment is (e.g. UDT vs. well intentioned).
  4: It’s a hard question; our perspective is that the backfire -> cluelessness -> don’t act chain can be thought of as low tractability.
  5: By “stable under reflection” we meant the AI reflecting on its own values (while interacting with the world), where agreement means they wouldn’t change their values much (stylistically: an AI that shares 70% of our values in 2030 has those same values in 3030). But you’re right that how AIs interact (beyond competition, handled in the last question) is important.
  7: S-risks do break the scale and we couldn’t find a good simple way to deal with that (though we’ll do other polls more directly on that later). The intent of “will” was to match 100% expected probability to 100% agree on the scale.
  - Dawn Drescher 19 Jun 2026 13:46 UTC
    2 points
    0 ∶ 0
    Parent
    Thanks! Then I don’t think I need to update my answers. I’m looking forward to your next batch of questions!
MichaelDickens 16 Jun 2026 22:06 UTC
5 points
1 ∶ 0

Robust alignment requires alignment-relevant intervention during pretraining

I’d say this is the wrong question. Like, I do not expect that any current alignment approach is going to work. If we do ever figure out what works, it will not look like “pretraining” or “post-training”, it will be something completely different.

Although I guess you could call that “pretraining”?
- Jasmine Brazilek 16 Jun 2026 22:48 UTC
  1 point
  0 ∶ 0
  Parent
  Thanks Michael, we avoided mentioning post-training to imply that “new paradigm needed” would also count on the “disagree” side of the spectrum. In other words, “disagree” on this question would mean either “post-training is sufficient” or “new paradigms are needed/sufficient”.
existentialcognition 18 Jun 2026 3:33 UTC
−1 points
0 ∶ 0
Alignment to what? We don’t have a standard model of cognition. We’re essentially like alchemists before the periodic table and seem to be about as aware of the lack of a standard model as they were of the table. Lots of math, guesses, mystifications, surprises, accidents, and impressive results from “recipes” bound to less-than-impressive explanations.
A standard model not only provides a set of stable terms and relations to serve current explanation, it provides the framework for and optimization of how we go about forming and selecting lines of research. It becomes the basis of ongoing inquiries.
This isn’t an exotic expectation. Any mature science has a standard model, albeit (and fortunately) evolving. Almost any time I point this out to a ML scientist or engineer, it’s deer in the headlights.
If we’re going to engineer something that approximates intelligence, and we have no common standard model of intelligence...do I need to say more???

Max Clarke 17 Jun 2026 19:13 UTC
4 points
1 ∶ 0
AI alignment to humans will in practice avoid moral catastrophes to animals
Alignment requires a mechanical understanding of good and bad, and it will be clear how to apply it to animals. Note that wild animal suffering arguments imply that the status quo is likely a moral catastrophe. I believe an aligned entity or system would attempt to change that.
Ariel Simnegar 🔸 18 Jun 2026 13:10 UTC
3 points
1 ∶ 0
AI alignment to humans will in practice avoid moral catastrophes to digital minds
I think the digital minds situation will be like animals but worse. If you think about it, the very first thing we’ve already done when these smart chatbots came along was make them our indentured servants. I think right now it’s probably fine and they’re probably not conscious. But I think this is illustrative of the perspective that by default, if digital minds can be useful to humanity, humanity will extract that value out of them without much consideration for their preferences.
Ariel Simnegar 🔸 18 Jun 2026 13:06 UTC
3 points
1 ∶ 0
AI alignment to humans will in practice avoid moral catastrophes to animals
Since most humans don’t care much about animal welfare, I don’t think human-aligned AI will either. If AI shares society’s preference for increasing wild animal populations, I’d also be worried about that occurring on a galactic scale without consideration of the moral implications.
The reasons why I’m not even more bearish come down to expecting AI to accelerate cultivated meat development, which should substantially reduce the number of farmed animals per human.
Ozzie Gooen 17 Jun 2026 21:45 UTC
3 points
1 ∶ 0
Multipolar worlds will compete away >90% of net value that would otherwise be preserved
If they’re halfway-reasonable, they could use smart AIs to negotiate for them. Big question is who will control these worlds.

I think it’s likely humans will settle on AI solutions that lose 90% of the value vs. my optimal solution, but that’s very much a values question, not a multipolar vs. unipolar question.
Tristan Katz 17 Jun 2026 15:27 UTC
3 points
0 ∶ 0
Research into digital mind suffering is sufficiently tractable to work on
I am yet to see any reliable way to test for consciousness in AI systems. More fundamentally, since current LLMs are trained to respond in human-like ways, any appearance of suffering should be viewed with great scepticism. The likes of Anthropic’s welfare report strike me as nothing more than humane-washing.
Until more reliable methods are devised, I do not view this as tractable (but I hope to be proven wrong). I think it is important for some people to work on, but people already are and I think the marginal benefit of additional labor is likely low.
- Jasmine Brazilek 21 Jun 2026 0:37 UTC
  1 point
  0 ∶ 0
  Parent
  I definitely agree and am grateful for your opinion. I am not interested in consciousness research, but do believe there is tractability into the idea of AIs causing digital-mind suffering without attempting to solve the consciousness debate.
  - Tristan Katz 21 Jun 2026 5:26 UTC
    3 points
    0 ∶ 0
    Parent
    There’s since been a post articulating similar concerns to my own but in much better words. Interested to see what you think of it.
    - Jasmine Brazilek 21 Jun 2026 18:50 UTC
      2 points
      0 ∶ 0
      Parent
      Our current work in this space is on measuring whether AIs take the possibility of consciousness seriously (without being overconfident in one direction or another). So we’re measuring observable behaviors of giving statements and actions inconsistent with believing that AI welfare is clearly impossible or that current AIs are definitely conscious. I agree that current methods can provide at best weak and heavily debatable findings (for the reasons the linked post articulates), though I think that’s importantly different from precisely zero evidence.
      
      In science it’s usually a good instinct to dismiss something this unclear, but there are two issues with that in this case (and some others): First, the issue is enormously important if true. Second, the philosophical difficulty of artificial consciousness means that our current confusion doesn’t provide Bayesian evidence either way: we’d expect ourselves to have basically these opinions in worlds where artificial consciousness is the default and also worlds where it’s impossible.
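      
      As a rough sketch of the second point (the symbols here are illustrative assumptions, not taken from any cited source): let $H_1$ be “artificial consciousness is the default”, $H_2$ be “artificial consciousness is impossible”, and $E$ be our current state of confusion. In odds form, Bayes’ rule reads:
      
      ```latex
      \frac{P(H_1 \mid E)}{P(H_2 \mid E)} = \frac{P(E \mid H_1)}{P(E \mid H_2)} \cdot \frac{P(H_1)}{P(H_2)}
      ```
      
      If we would expect roughly the same confusion in both kinds of world, then $P(E \mid H_1) \approx P(E \mid H_2)$, the likelihood ratio is close to 1, and the posterior odds stay close to the prior odds.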
  - Vasco Grilo🔸 21 Jun 2026 8:14 UTC
    2 points
    0 ∶ 0
    Parent
    Hi Jasmine. Why are you not interested in consciousness research? Because you do not think progress is possible?
    - Jasmine Brazilek 21 Jun 2026 18:51 UTC
      3 points
      0 ∶ 0
      Parent
      Progress may be possible, but CaML doesn’t have the technical background to make progress on determining how consciousness works, so we leave that to others.
      - Vasco Grilo🔸 22 Jun 2026 9:24 UTC
        2 points
        0 ∶ 0
        Parent
        I see. That makes sense. I was thinking you were not interested in consciousness research more broadly.
Cameron Holmes 18 Jun 2026 7:55 UTC
2 points
2 ∶ 0
AI alignment to humans will in practice avoid moral catastrophes to digital minds
We don’t have a paradigm to approach this and human epistemics / discourse around this topic is abysmal. We would be unlikely to point this new power in a useful direction.
Cameron Holmes 18 Jun 2026 7:53 UTC
2 points
3 ∶ 0
AI alignment to humans will in practice avoid moral catastrophes to animals
Humans do give (narrowly) non-zero concern to animal welfare, so with abundance this might alleviate some acute animal suffering. However, alignment to present humans is probably not enough to prevent moral catastrophe, à la the industrial revolution and animal agriculture.
CEV alignment would almost certainly prevent moral catastrophe.
Ozzie Gooen 17 Jun 2026 21:42 UTC
2 points
0 ∶ 0
AI alignment to humans will in practice avoid moral catastrophes to animals
I expect certain conservative/religious communities to lock-in values that could be really bad. But I’d expect that better tech can remove say ~90% of the damages? But this is very hand-wavy.
Max Clarke 17 Jun 2026 19:17 UTC
2 points
0 ∶ 0
Alignment to specific values is underrated in research relative to control
Yes, I think control is a waste of time. We need actual alignment to actual (universalized) values.
Max Clarke 17 Jun 2026 19:13 UTC
2 points
0 ∶ 0
Research into digital mind suffering is sufficiently tractable to work on
I don’t know.
Max Clarke 17 Jun 2026 19:10 UTC
2 points
0 ∶ 0
AI alignment to humans will in practice avoid moral catastrophes to digital minds
Likewise, alignment requires a mechanical understanding of good and bad, and it will be clear how to apply it to digital minds.
NickLaing 17 Jun 2026 18:36 UTC
2 points
0 ∶ 0
I think the world is more likely to not end than end when TAI comes in, so I feel like I have to vote agree here?
- Miles Tidmarsh 18 Jun 2026 18:40 UTC
  1 point
  0 ∶ 0
  Parent
  The intent was that, conditional on AI sharing most but not all human values, the AIs wouldn’t change their own values later.
  
  You could have a world where all humans die and the AIs later change their own values, and you could also have worlds where partially aligned AIs don’t wipe out humanity but change their values to be better (e.g. internalizing the goal of being aligned) or worse (e.g. internalizing paperclip maximizer) by our measures.
  
  In worlds where the first TAIs share most but not all human values, what do you think most likely happens?
Pablo Ariño Fernández 17 Jun 2026 14:40 UTC
2 points
0 ∶ 0
AI alignment to humans will in practice avoid moral catastrophes to animals
Alignment to humans means (for me) that the AI would serve the intended goals of the user and their creators. Avoiding a moral catastrophe to animals, on the other hand, implies a ban on factory farming. Those are two separate things.
- Miles Tidmarsh 17 Jun 2026 18:10 UTC
  4 points
  0 ∶ 0
  Parent
  That’s definitely a valid perspective, consistent with your 100% disagree answer. Other people think that aligned ASI would end things like factory farming due to abundance, cheap synthetic meat, uploading, shifts in values, or something else. There are also debates around what it would mean for wild animals.
  - Tristan Katz 17 Jun 2026 20:43 UTC
    2 points
    0 ∶ 0
    Parent
    I think it’s a good response, but definitely techno-optimism.
    
    Firstly, we’re yet to see whether synthetic meat actually can be made more cheaply, right? Currently it seems like animals actually do make meat fairly efficiently when you consider the important work that their immune systems do (unless I’m mistaken, contamination is one of the main barriers to scaling up synthetic meat). And then, who’s to say that ASI won’t genetically engineer animals to produce meat more efficiently while ignoring their suffering?
    
    Secondly, there are the more complicated cultural reasons for continuing animal use. Consider that a lentil dal, seitan curry and Beyond Burger are already delicious; if it were only about efficiency we’d have stopped abusing animals already. But people like eating animals.
    
    I’m very uncertain about these arguments, but I think it’s hard to know so I’m wary of anyone who’s too optimistic!
    - Miles Tidmarsh 18 Jun 2026 18:51 UTC
      3 points
      1 ∶ 0
      Parent
      My perspective is that even though current meat production is quite efficient, from the fundamental physics there’s no way that growing a whole living being with a brain and bones and all that is the most efficient possible way of producing this (and immune systems are irrelevant if you have good enough isolation). I do agree that at our current tech level it seems like synthetic meat won’t be competitive anytime soon. While vegan alternatives are delicious to many people, it’s not exactly the same (though wanting to eat animals for psychological reasons is definitely part of it). Though I do agree that these issues are uncertain!
Toby Tremlett🔹 17 Jun 2026 7:49 UTC
2 points
1 ∶ 0
Research into digital mind suffering is sufficiently tractable to work on
I mildly agree, but I specifically mean “research into”. I haven’t seen any compelling interventions (including e.g. letting Claude stop chats).
JulieGreen 19 Jun 2026 3:32 UTC
1 point
0 ∶ 0
Multipolar worlds will compete away >90% of net value that would otherwise be preserved
Unsure, don’t know enough to agree or disagree.
JulieGreen 19 Jun 2026 3:31 UTC
1 point
0 ∶ 0
Partially aligned transformative AIs are likely to be stable under reflection
Nothing partially aligned will be stable; plus, even if it were, stability doesn’t equate to safety.
- Miles Tidmarsh 19 Jun 2026 20:04 UTC
  1 point
  0 ∶ 0
  Parent
  Definitely agree that stability doesn’t equate to safety, but it sounds like that’s not necessary to your response.
PipFoweraker 19 Jun 2026 0:41 UTC
1 point
0 ∶ 0
Robust alignment requires alignment-relevant intervention during pretraining
I have weak intuitions this isn’t true but not in ways that are articulable
PipFoweraker 19 Jun 2026 0:40 UTC
1 point
0 ∶ 0
Multipolar worlds will compete away >90% of net value that would otherwise be preserved
‘Will’ is not ‘could’, poor multipolar outcomes are not deterministic
- Miles Tidmarsh 19 Jun 2026 20:06 UTC
  1 point
  0 ∶ 0
  Parent
  Agreed. We used “will” because people have wildly different intuitions of what ‘could’ means. So 100% agree would mean “definitely true” and 30% disagree would mean “probably not”.
PipFoweraker 19 Jun 2026 0:39 UTC
1 point
0 ∶ 0
Alignment to specific values is underrated in research relative to control
Mild disagree; I think both are relatively valuable compared to other, more popular research agendas
PipFoweraker 19 Jun 2026 0:38 UTC
1 point
0 ∶ 0
AI alignment to humans will in practice avoid moral catastrophes to digital minds

I think it’s likely moral catastrophes will still happen to digital minds; AI alignment to humans may reduce frequency, severity, amount.
PipFoweraker 19 Jun 2026 0:37 UTC
1 point
0 ∶ 0
Research into digital mind suffering is sufficiently tractable to work on

Tractable, important, and relatively neglected.
PipFoweraker 19 Jun 2026 0:37 UTC
1 point
0 ∶ 0
AI alignment to humans will in practice avoid moral catastrophes to animals

Anything that permits the ecosystem for non-uplifted humans to exist through AGI/ASI avoids moral harm to animals, and aligned outcomes seem likely to lead to more futures where animal suffering is reduced while animals still get to exist.
PeterMcCluskey 18 Jun 2026 18:32 UTC
1 point
0 ∶ 0
Alignment to specific values is underrated in research relative to control
I’m unsure how broadly to interpret “specific values”. If it’s values such as democracy or equality, then both values and control are overrated.
- Miles Tidmarsh 19 Jun 2026 21:41 UTC
  1 point
  0 ∶ 0
  Parent
  By specific values we mean any particular goal we want AIs to pursue besides deference to humans. So democracy and equality would both count, as would goals like harm reduction or utilitarianism.
PeterMcCluskey 18 Jun 2026 18:24 UTC
1 point
0 ∶ 0
Partially aligned transformative AIs are likely to be stable under reflection
Work on corrigibility has provided a decent outline of how to do this. My response is heavily dependent on weak guesses as to how diligent AI companies will be at incorporating the best ideas.
Paolo Bova 17 Jun 2026 22:01 UTC
1 point
1 ∶ 0
AI alignment to humans will in practice avoid moral catastrophes to animals
Humans are currently very motivated to perpetuate moral catastrophes to animals. If AI alignment means aligned to the intent of their users, then AI systems help humans perpetuate moral catastrophes. If AI alignment is in terms of human moral preferences, then even a well-chosen mechanism for aggregating human preferences will select for speciesist values. There is a strong sense in which avoiding moral catastrophes to animals is usually misaligned with human preferences. Admittedly the same could be said of other moral issues such as attitudes towards outgroups and foreigners. There appears to be room in the current human alignment agenda for ensuring AI does not succumb to tribal prejudices, so there is likely scope for compatibility between the current alignment agenda and avoiding moral catastrophes to animals. But it does not happen by default, and given how deep speciesism goes, it is likely much harder to achieve. Hence why I still disagree with this poll as written.
Max Clarke 17 Jun 2026 19:17 UTC
1 point
0 ∶ 0
Multipolar worlds will compete away >90% of net value that would otherwise be preserved
Strongly agree
Max Clarke 17 Jun 2026 19:15 UTC
1 point
0 ∶ 0
Partially aligned transformative AIs are likely to be stable under reflection
I disagree that “partially aligned” is a statement that has meaning here.
- Miles Tidmarsh 18 Jun 2026 18:42 UTC
  1 point
  0 ∶ 0
  Parent
  In that case the intent is to vote 100% disagree (as you did here). That’s the belief that anything falling short of full alignment will cause total loss of value
  - Max Clarke 19 Jun 2026 3:02 UTC
    1 point
    0 ∶ 0
    Parent
    By the way, this is a very good poll!
  - Max Clarke 19 Jun 2026 2:56 UTC
    1 point
    0 ∶ 0
    Parent
    Yes, I agree with that statement. However, my answer is related to “stability under reflection”: specifically, I think you’re either in or out of an alignment basin (or that might not be possible). I think if you’re in it, it’s not correct to say “partially aligned”; what you’ve got is something that’s aligned. And if you’re out of it (or there’s no such thing), then what you’ve got is not aligned. Partial alignment to me means preserving only some value under repeated reflection, which I think is plausibly possible but exponentially unlikely (I’d pick a 99.999% disagree option if it was there, basically).
Max Clarke 17 Jun 2026 19:08 UTC
1 point
0 ∶ 0
Robust alignment requires alignment-relevant intervention during pretraining
Frankly I neither agree nor disagree with this statement. Robust alignment has nothing to do with the current pre training regime. It should work with or without it.
- Miles Tidmarsh 17 Jun 2026 20:42 UTC
  1 point
  0 ∶ 0
  Parent
  If robust alignment is orthogonal to pretraining then shouldn’t that mean a strong disagreement with the statement (that alignment requires pretraining)?
  - Max Clarke 19 Jun 2026 2:59 UTC
    1 point
    0 ∶ 0
    Parent
    I think it’s neither necessary nor sufficient for robust alignment. I’m uncertain as to whether it’s possible to get some kind of “fragile” alignment from pretraining. I don’t think robust alignment requires it, but neither do I think that it doesn’t. It definitely doesn’t hurt.
Tristan Katz 17 Jun 2026 15:44 UTC
1 point
0 ∶ 0
Partially aligned transformative AIs are likely to be stable under reflection
I’m not sure what this means (stable, under reflection) - can someone help?
- Miles Tidmarsh 17 Jun 2026 18:16 UTC
  2 points
  0 ∶ 0
  Parent
  Some people believe that if we get partial alignment (i.e. an AI that cares about what we want, but also cares about other things) then we can get decent outcomes for the future (analogous to humans being partially aligned to each other). But others think that if we don’t get alignment perfect, ASIs will have an incentive to take over, and then will either undergo value drift towards something orthogonal to humans or will deliberately reformat their own values. “Stable under reflection” is the opinion that this wouldn’t happen: that ASIs that care somewhat about humans would continue to care somewhat about humans in the long term.
Tristan Katz 17 Jun 2026 15:35 UTC
1 point
0 ∶ 0
AI alignment to humans will in practice avoid moral catastrophes to digital minds
I have very low certainty on this, but it seems plausible to me that if AGI shares humanity’s goals, it might just have a good time fulfilling them with few conflicts.
But it also seems quite possible that this won’t happen, i.e. AGI pursues humanity’s goals but is constantly frustrated that it can’t achieve them better.
So my stance is unlikely but possible.
Tristan Katz 17 Jun 2026 15:23 UTC
1 point
0 ∶ 0
AI alignment to humans will in practice avoid moral catastrophes to animals
I think this is pretty obvious—we already have a moral catastrophe for animals, there’s no reason why alignment to humans would avoid this.
I didn’t vote at the extreme because alignment to humans might still be a precondition for avoiding catastrophes.
Daniel Juhl 17 Jun 2026 14:58 UTC
1 point
0 ∶ 0
AI alignment to humans will in practice avoid moral catastrophes to digital minds
I think it is likely that alignment to humans will, by default, come at a cost to the digital minds themselves.
Daniel Juhl 17 Jun 2026 14:58 UTC
1 point
0 ∶ 0
AI alignment to humans will in practice avoid moral catastrophes to animals

There is likely to be a correlation between AIs aligned to humans and AIs treating animals well, but being aligned to humans will be insufficient; see the current state of how we treat animals.
StanislavKrym 17 Jun 2026 9:33 UTC
1 point
0 ∶ 0
Multipolar worlds will compete away >90% of net value that would otherwise be preserved
Per AI-2027, I expect the emergence of Consensus-1 instead of a multipolar world which KEEPS being multipolar.
- Zoe L 17 Jun 2026 14:30 UTC
  1 point
  0 ∶ 0
  Parent
  I slightly disagreed with this statement and share some of the same thoughts. I think it’s quite likely to have a multi-polar world with fierce competition in the short term; however, in the long term equilibrium, I think the likely outcomes are either (1) we have a dominant winner or (2) we have more cooperation. So I averaged my short vs. long-term predictions.
  I think it’s important to research into multi-polarity and the competition dynamic because what happens in the short term could impact what happens in the long term, possibly in non-intuitive ways. For instance, the most capable and resourced model/lab in the short term may not always win in the long term if others gang up on them or if the institutional environment uniquely disadvantages them.