Thanks, Leonard. This is helpful.
Re digital minds numbers and timelines, I agree this is important and underexplored!
Re AI safety vs welfare: you’re right that the general “do they conflict?” question has a trivial answer, and I’ll rephrase this. But I want to explain why I framed it this way / what I had in mind. The reason is partly substantive and partly sociological/political.
Substantively, I think we should be looking for interventions that are robustly positive across both goals, ideally synergistic. (This is in line with what Rob, Jeff, and Toni argue in their paper.)
Sociologically and politically, AI safety and AI welfare have an unusually overlapping community: shared people, shared funders, shared intellectual lineage. I think it’s really important that these communities continue to work closely together and don’t end up doing things that undermine each other’s goals. I also worry about broader societal dynamics in the coming years where different groups push things they see as good for one goal but bad for the other (e.g. “humans should always dominate AIs” vs “we should grant AIs empowering rights now”).
So a better version of the question might be: “What’s robustly good for both AI safety and welfare?” or “How can AI safety and welfare work support each other, or at least not work against each other?” Thoughts? I will think more and update the post.
Re your broader point (robustly good actions vs forced choices with no robustly good answer): this is interesting and I want to think it through more.
My initial reaction is that the second category is probably smaller than it looks. Before accepting that a question has no robustly good answer, we should think really carefully about whether there might be robustly good options that aren’t obvious, e.g., delaying the decision, keeping options open, or investing in research to get better information. That said, sometimes there really is no action that doesn’t come with expected serious harm. In those cases I agree we should identify the most important ones (e.g. by stakes, irreversibility, timing) and analyze them carefully. Do you agree with this?
Thanks for sharing your analysis, Vasco. Two quick questions:
1. Could welfare capacity turn out to be realized much more efficiently in digital minds than in humans?
2. How would you think about interventions we could pursue now that might prevent large-scale digital suffering in the future, e.g., establishing norms or policies that reduce the risk of mistreated digital minds decades from now?
Perhaps this downside could be partly mitigated by expanding the name to make it sound more global or include something Western, for example: Petrov Center for Global Security or Petrov–Perry Institute (in reference to William J. Perry). (Not saying these are the best names.)
Thanks for your thoughtful comment—I agree that social and institutional contexts are important for understanding these decisions. My research is rooted in social psychology, so it inherently considers these contexts. And I think individual-level factors like values, beliefs, and judgments are still essential, as they shape how people interact with institutions, respond to cultural norms, and make collective decisions. But of course, this is only one angle to study such issues.
For example, in the context of global catastrophic risks, my work explores how psychological factors intersect with the collective and institutions. Here are two examples:
Crying wolf: Warning about societal risks can be reputationally risky
Does One Person Make a Difference? The Many-One Bias in Judgments of Prosocial Action
Thanks for this. I agree with you that AIs might simply pretend to have certain preferences without actually having them. That would avoid certain risky scenarios. But I also find it plausible that consumers would want to have AIs with truly human-like preferences (not just pretense) and that this would make it more likely that such AIs (with true human-like desires) would be created. Overall, I am very uncertain.
Thanks, I also found this interesting. I wonder if this provides some reason for prioritizing AI safety/alignment over AI welfare.
It’s not yet published, but I saw a recent version of it. If you’re interested, you could contact him (https://www.philosophy.ox.ac.uk/people/adam-bales).
Thanks, Siebe. I agree that things get tricky if AI minds get copied and merged, etc. How do you think this would impact my argument about the relationship between AI safety and AI welfare?
I wonder what you think about this argument by Schwitzgebel: https://schwitzsplinters.blogspot.com/2021/12/against-value-alignment-of-future.html
Thanks, Adrià. Is your argument similar to (or a more generic version of) what I say in the ‘Optimizing for AI safety might harm AI welfare’ section above?
I’d love to read your paper. I will reach out.
The Global Risk Behavioral Lab is looking for a full-time Junior Research Scientist (Research Assistant) and a Research Fellow for one year (with the possibility of renewal).
The researchers will work primarily with Prof Joshua Lewis (NYU), Dr Lucius Caviola (University of Oxford), researchers at Polaris Ventures, and the Effective Altruism Psychology Research Group. Our research studies psychological aspects of relevance to global catastrophic risk and effective altruism. A research agenda is here.
Location: New York University or Remote
Research topics include:
Judgments and decisions about global catastrophic risk from artificial intelligence, pandemics, etc.
The psychology of dangerous actors that could cause large-scale harm, such as malevolent individuals or fanatical and extremist ideological groups
Biases that prevent choosing the most effective options for improving societal well-being, including obstacles to an expanded moral circle
Suggested skills: Applicants for the Junior Research Scientist position ideally have some experience in psychological/behavioral/social science research. Applicants for the Research Fellow position can also come from other fields relevant to studying large-scale harm from dangerous actors.
Thanks, Ben!
13.6% (3 people) of the 22 students who clicked on a link to sign up to a newsletter about EA already knew what EA was.
And 6.9% of the 115 students who clicked on at least one link (e.g. EA website, link to subscribe to newsletter, 80k website) already knew what EA was.
Another potentially useful measure (to get at people’s motivation to act) could be this one:
“Some people in the Effective Altruism community have changed their career paths in order to have a career that will do the most good possible in line with the principles of Effective Altruism. Could you imagine doing the same now or in the future? Yes / No”
Of the total sample, 42.9% said yes to it. And of those people, only 10.4% already knew what EA was.
And if we look only at those who are very EA-sympathetic (scoring high on EA agreement, effectiveness-focus, expansive altruism, and interest in learning more about EA), the number is 21.8%. In other words: of the most EA-sympathetic students who said they could imagine changing their career to do the most good, 21.8% (12 people) already knew what EA was.
(66.3% of the very EA-sympathetic students said they could imagine changing their career path to do the most good.)
A caveat is that some of these percentages are inferred from relatively small sample sizes — so they could be off.
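To make this caveat concrete, here is a minimal sketch (Python, using statsmodels) of how one could put 95% Wilson confidence intervals around these proportions. The counts are taken from the comment (3/22) or back-calculated from the stated percentages (8/115 ≈ 6.9%; 12 people at 21.8% implies n ≈ 55), so treat them as approximate rather than the exact study data.

```python
# Sketch: 95% Wilson intervals for small-sample proportions.
# Counts are approximate, back-calculated from the percentages above.
from statsmodels.stats.proportion import proportion_confint

samples = {
    "knew EA, of 22 newsletter clickers": (3, 22),                # 13.6%
    "knew EA, of 115 any-link clickers": (8, 115),                # ~6.9%
    "knew EA, of very EA-sympathetic career-changers": (12, 55),  # 21.8%
}

for label, (count, nobs) in samples.items():
    low, high = proportion_confint(count, nobs, alpha=0.05, method="wilson")
    print(f"{label}: {count}/{nobs} = {count / nobs:.1%}, 95% CI [{low:.1%}, {high:.1%}]")
```

For example, the 3/22 figure comes with an interval of roughly 5% to 33%, which gives a sense of how far off these point estimates could be.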
We’ve asked them about a few ‘schools of thought’: effective altruism, utilitarianism, existential risk mitigation, longtermism, evidence-based medicine, poststructuralism (see footnote 4 for results). But very good idea to ask about a fake one too!
(Note that we also asked participants who said they have heard of EA to explain what it is. And we then manually coded whether their definition was sufficiently accurate. That’s how we derived the 7.4% estimate.)
We considered this too. But the significant correlations with education level and income held even after controlling for age. (We mention this below one of the tables.)
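In case it’s useful, here is a minimal sketch of what such a check can look like, with hypothetical column and file names (ea_scale, education, age, survey.csv); this illustrates the general approach, not our actual analysis code.

```python
# Sketch: does the education effect survive after controlling for age?
# Including age as a covariate holds it fixed when estimating the
# education coefficient. Column names and data file are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("survey.csv")  # hypothetical data file

model = smf.ols("ea_scale ~ education + age", data=df).fit()
print(model.params["education"], model.pvalues["education"])
```

An equivalent check is a partial correlation between the scale and education with age as the covariate (e.g. pingouin.partial_corr).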
I can see why it might seem surprising at first glance that education doesn’t correlate positively with our two scales. (Like David, I am not sure if the negative correlation will hold up.) It seems surprising because we know that most existing highly engaged EAs are highly educated (and likely have high cognitive abilities). But what this lack of positive correlation shows is simply that high education (and probably also high cognitive ability) is not required to intuitively share the core moral values of EA.
As we point out in the article, there are likely several additional factors that predict whether someone will become a highly engaged EA. And it’s possible that education (and likely high cognitive abilities) is such an additional, and psychologically separate, factor.
Just to add to what David said: It’s difficult to say whether our NYU business sample or our MTurk sample is more representative of our primary target audience. The best way to find out is to do a large representative survey, e.g., amongst students at a top uni (of all study subjects—not just business).
Yes, it was initially quite surprising that so many donors are willing to support the matching system. We found similar results when we tested it with MTurk participants (who were given a small bonus which they could give or keep; see Study 7). One possibility is that it’s a kind of intergenerational reciprocity tendency, where people who benefited from the generosity of previous donors want to pay it forward to the next ones.
Thanks!
Perhaps, but we are uncertain. It depends on whether we can find a scalable strategy for reaching donors who are amenable to EA but not yet engaged with effective altruism. Such a strategy might come from paid advertising, further earned media coverage (our strategy so far), or the formation of institutional relationships (e.g. with businesses, universities, or wealth managers that offer guidance or incentives for charitable giving).
Yes, we’ve recently introduced our donors to GWWC. (Results of that campaign are not in yet.)
Thanks, Linch.
First, you’re right that several EA psychology researchers are studying how people donate to charity. But most of them (including myself) are also studying other EA-related topics, such as the psychology of x-risk and longtermism, moral attitudes towards animals, etc. My hunch is that only a minority of currently ongoing EA psychological research projects have charitable giving as their primary topic of interest.
Second, as David pointed out, donation choices are a useful behavioral outcome measure when studying the public’s beliefs, attitudes, and preferences about EA-related issues more generally. In many cases, the goal of the research is not necessarily to understand how people donate to charity specifically but to understand the fundamental psychological drivers of and obstacles to EA-aligned attitudes and behavior more generally (example). Studying these in the context of charitable giving is an obvious and often straightforward first step — in the hope that these insights can be generalized.
For example, the fact that people are willing to split their donation, as described in the post, tells us something more fundamental about people’s preference structure (the fact that most people value effectiveness, but only as a secondary preference), the potential market size of EA in the general public, and possible routes for reaching wider adoption of EA ideas. Another example is the study of individual differences: who are the people who immediately find EA ideas appealing, where can we find them, and how should we target them? It’s natural to test this, in part, by observing people’s donation choices.
My view on prioritization is that psychological research can be useful when it yields such fundamental insights. But there can also be valuable applied work, such as marketing or psychometric research that is practically useful for recruitment.
Re AI safety vs welfare: You’re right that we could look at other pairings too. But I feel this one warrants specific attention: the same actors (e.g. labs) face both questions at once, often through the same technical choices (e.g. training or modifying an AI affects both safety and welfare); the two fields share a community, funders, and infrastructure; there is a politicization risk specific to this pairing (e.g. “AI rights vs humans first”); and both are among the highest-stakes issues from a longtermist perspective. I’m not saying there are no other important pairings or sub-pairings with AI welfare, just that AI welfare x safety is among the particularly important ones.
Re broader point: I agree that for almost any action that’s broadly positive, there will be some worldview combinations on which it’s negative. So in a strict sense, perfectly robust positivity is unattainable. That’s why I phrased it as “expected serious harm”, to allow for some residual harm under some assumptions. Though maybe even that doesn’t fully work. So I guess “find robustly good strategies” is best treated as a heuristic that rules out interventions that look good only on a narrow set of assumptions.