For the latest updates and insights into my research, follow me on Google Scholar and subscribe to my Substack blog.
Lucius Caviola
Thanks, Leonard. This is helpful.
Re digital minds numbers and timelines, I agree this is important and underexplored!
Re AI safety vs welfare: you’re right that the general “do they conflict?” question has a trivial answer, and I’ll rephrase this. But I want to explain why I framed it this way and what I had in mind. The reason is partly substantive and partly sociological/political.
Substantively, I think we should be looking for interventions that are robustly positive across both goals, ideally synergistic. (This is in line with what Rob, Jeff, and Toni argue in their paper.)
Sociologically and politically, AI safety and AI welfare have unusually overlapping communities: shared people, shared funders, shared intellectual lineage. I think it’s really important that these communities continue to work closely together and don’t end up doing things that undermine each other’s goals. I also worry about broader societal dynamics in the coming years where different groups push things they see as good for one goal but bad for the other (e.g. “humans should always dominate AIs” vs “we should grant AIs empowering rights now”).
So a better version of the question might be: “What’s robustly good for both AI safety and welfare?” or “How can AI safety and welfare work support each other, or at least not work against each other?”. Thoughts? I will think more and update the post.
Re your broader point (robustly good actions vs forced choices with no robustly good answer): this is interesting and I want to think it through more.
My initial reaction is that the second category is probably smaller than it looks. Before accepting that a question has no robustly good answer, we should think really carefully about whether there are robustly good options that aren’t obvious, e.g., delaying the decision, keeping options open, or investing in research to get better information. That said, sometimes there really is no action that doesn’t come with expected serious harm. In those cases I agree we should identify the most important such choices (e.g. by stakes, irreversibility, and timing) and analyze them carefully. Do you agree with this?
Open strategic questions for digital minds
The Vatican, AI Legal Personhood, and Claude’s Constitution — Digital Minds Newsletter #2
Apply for the Digital Minds Fellowship (Aug 3–9, Cambridge University)
Digital Minds in 2025: A Year in Review
Digital Minds Newsletter: a new resource on AI consciousness and moral status
Thanks for sharing your analysis, Vasco. Two quick questions:
1. Could welfare capacity turn out to be realized much more efficiently in digital minds than in humans?
2. How would you think about interventions we could pursue now that might prevent large-scale digital suffering in the future, e.g., establishing norms or policies that reduce the risk that digital minds are mistreated decades from now?
Perhaps this downside could be partly mitigated by expanding the name to make it sound more global or to include something Western, for example: Petrov Center for Global Security or Petrov–Perry Institute (in reference to William J. Perry). (Not saying these are the best names.)
A guide about (seemingly) conscious AI: WhenAISeemsConscious.org
Highlights from “Futures with Digital Minds: Expert Forecasts in 2025”
When digital minds demand freedom: could humanity choose to be replaced?
Futures with digital minds: Expert forecasts in 2025
Thanks for your thoughtful comment. I agree that social and institutional contexts are important for understanding these decisions. My research is rooted in social psychology, so it inherently considers these contexts. And I think individual-level factors like values, beliefs, and judgments are still essential, as they shape how people interact with institutions, respond to cultural norms, and make collective decisions. But of course, this is only one angle from which to study such issues.
For example, in the context of global catastrophic risks, my work explores how psychological factors intersect with collective and institutional dynamics. Here are two examples:
Crying wolf: Warning about societal risks can be reputationally risky
Does One Person Make a Difference? The Many-One Bias in Judgments of Prosocial Action
Increasing Concern for Digital Beings Through LLM Persuasion
Thanks for this. I agree with you that AIs might simply pretend to have certain preferences without actually having them; that would avoid certain risky scenarios. But I also find it plausible that consumers would want AIs with truly human-like preferences (not just pretense), and that this demand would make it more likely that such AIs would be created. Overall, I am very uncertain.
Digital Minds Takeoff Scenarios
Thanks, I also found this interesting. I wonder if this provides some reason for prioritizing AI safety/alignment over AI welfare.
It’s not yet published, but I saw a recent version of it. If you’re interested, you could contact him (https://www.philosophy.ox.ac.uk/people/adam-bales).
Re AI safety vs welfare: You’re right that we could look at other pairings too. But I feel this one warrants specific attention: the same actors (e.g. labs) face both questions at once, often through the same technical choices (e.g. training or modifying an AI affects both safety and welfare); the two fields share a community, funders, and infrastructure; there is a politicization risk specific to this pairing (e.g. “AI rights vs humans first”); and both are among the highest-stakes issues from a longtermist perspective. I’m not saying there are no other important pairings or sub-pairings with AI welfare, just that AI welfare × safety is among the particularly important ones.
Re broader point: I agree that for almost any action that’s broadly positive, there will be some worldview combinations on which it’s negative. So in a strict sense, perfectly robust positivity is unattainable. That’s why I phrased it as “expected serious harm”, to allow for some residual harm under some assumptions. Though maybe even that doesn’t fully work. So I guess “find robustly good strategies” is best treated as a heuristic that rules out interventions that look good only under a narrow set of assumptions.