AI safety governance/strategy research & field-building.
Formerly a PhD student in clinical psychology @ UPenn, college student at Harvard, and summer research fellow at the Happier Lives Institute.
Very excited to see this section (from the condensed report). Are you able to say more about the kind of work you would find useful in this space, or about the organizations/individuals you think are doing exemplary work here?
We recommend interventions which plan for worst-case scenarios—that is, interventions which are effective when preventative measures fail to prevent AI threats emerging. For concreteness, we outline some potential interventions which boost resilience against AI risks.
Developing contingency plans: Ensure there are clear plans and protocols in the event that an AI system poses an unacceptably high level of risk.[13] Such planning could be analogous to planning in other fields, such as pandemic preparedness or nuclear wargaming.
Robust shutdown mechanisms: Invest in infrastructure and planning to make it easier to close down AI systems in scenarios where they pose unacceptably high levels of risk.
Also, very minor, but I think there’s a formatting issue with footnote 23.
To me, it sounds like you’re saying, ‘Bob is developing a more healthy relationship with EA’.
Oh just a quick clarification– I wasn’t trying to say anything about Bob or Bob’s relationship with EA here.
I just wanted to chime in with my own experience (which is not the same as Bob’s but shares one similarity in that they’re both in the “rethinking one’s relationship with the EA community/movement” umbrella).
More generally, I suspect many forum readers are grappling with this question of “what do I want my relationship with the EA community/movement to be”. Given this, it might be useful for more people to share how they’ve processed these questions (whether they’re related to the recent Manifold events or related to other things that have caused people to question their affiliation with EA).
Thanks for sharing your experience here. I’m glad you see a path forward that involves continuing to work on issues you care about despite distancing yourself from the community.
In general, I think people should be more willing to accept that you can endorse EA ideas or pursue EA-inspired careers without necessarily embracing the EA community. I sometimes hear people struggling with the fact that they like a lot of the values/beliefs in EA (e.g., the desire to use evidence and reason to find cost-effective and time-effective ways of improving the world) while having a lot of concerns about the modern EA movement/community.
The main thing I tell these folks is that you can live by certain EA principles while distancing yourself from the community. I’ve known several people who have distanced themselves from the community (for various reasons, not just the ones listed here) but remained engaged in AI safety or other areas they care about.
Personally, I feel like I’ve benefitted quite a bit from being less centrally involved in the EA space (and correspondingly being more involved in other professional/social spaces). I think this comment by Habryka describes a lot of the psychological/intellectual effects that I experienced.
Relatedly, as I specialized more in AI safety, I found it useful to ask questions like “what spaces should I go to where I can meet people who could help with my AI safety goals”. This sometimes overlapped with “go to EA event” but often overlapped with “go meet people outside the EA community who are doing relevant work or have relevant experience”, and I think this has been a very valuable part of my professional growth over the last 1-2 years.
oops yup— was conflating and my comment makes less sense once the conflation goes away. good catch!
To clarify, I did see the invitations to other funders. However, my perception was that those are invitations to find people to hand things off to, rather than to be a continuing partner like with GV.
This was also my impression. To the extent that OP doesn’t want to fund something because of PR risks & energy/time/attention costs, it’s a bit surprising that OP would partner with another group to fund it.
Perhaps the idea here is that the PR/energy/time/attention costs would be split between orgs? And that this would outweigh the costs of coordinating with another group?
Or it’s just that OP feels better if OP doesn’t have to spend its own money on something? Perhaps because of funding constraints?
I’m also a bit confused about scenarios where OP wouldn’t fund X for PR reasons but would want some other EA group to fund X. It seems to me like the PR attacks against the EA movement would be just as strong– perhaps OP as an institution could distance itself, but from an altruistic standpoint that wouldn’t matter much. (I do see how OP would want to not fund something for energy/capacity reasons but then be OK with some other funder focusing on that space.)
In general, I feel like communication from OP could have been clearer in a lot of the comments. Or OP could’ve done a “meta thing” just making it explicit that they don’t currently want to share more details.
In the post above we (twice!) invited outreach from other funders
But phrasing like this, for example, makes me wonder if OP believes it’s communicating clearly and is genuinely baffled when commenters have (what I see as quite reasonable) misunderstandings or confusions.
I read the part after “or” as extending the frame beyond reputation risks, and I was pleased to see that and chose to engage with it.
Ah, gotcha. This makes sense– thanks for the clarification.
If you look at my comments here and in my post, I’ve elaborated on other issues quite a few times and people keep ignoring those comments and projecting “PR risk” on to everything
I’ve looked over the comments here a few times, and I suspect you might think you’re coming off more clearly than you actually are. It’s plausible to me that since you have all the context of your decision-making, you don’t see when you’re saying things that would genuinely confuse others.
For example, even in the statement you affirmed, I can see how someone paying attention to the “or” could read you as technically only/primarily endorsing the non-PR part of the phrase.
But in general, I think it’s pretty reasonable and expected that people ended up focusing on the PR part.
More broadly, I think some of your statements have been fairly short and open to many interpretations. E.g., I don’t get a clear sense of what you mean by this:
It’s not just “lower risk” but more shared responsibility and energy to engage with decision making, persuading, defending, etc.
I think it’s reasonable for you to stop engaging here. Communication is hard and costly, misinterpretations are common and drain energy, etc. Just noting that– from my POV– this is less of a case of “people were interpreting you uncharitably” and more of a case of “it was/is genuinely kind of hard to tell what you believe, and I suspect people are mostly engaging in good faith here.”
The attitude in EA communities is “give an inch, fight a mile”. So I’ll choose to be less legible instead.
As a datapoint (which you can completely ignore), I feel like in the circles I travel in, I’ve heard a lot more criticisms of OP that look like “shady non-transparent group that makes huge decisions/mistakes without consulting anyone except a few Trusted People who all share the same opinions.”
There are certainly some cases in which the attack surface is increased when you’re fully open/transparent about reasoning.
But I do think it can be easy to underestimate the amount of reputational damage that OP (and you, by extension) take from being less legible/transparent. I think there’s a serious risk that many subgroups in EA will continue to feel more critical of OP as it becomes more clear that OP is not interested in explaining its reasoning to the broader community, becomes more insular, etc. I also suspect this will have a meaningful effect on how OP is perceived in non-EA circles. I don’t mean e/accs being like “OP are evil doomers who want to give our future to China”– I mean neutral third-parties who dispassionately try to form an impression of OP. When they encounter arguments like “well OP is just another shady billionaire-funded thing that is beholden to a very small group of people who end up deciding things in non-transparent and illegible ways, and those decisions sometimes produce pretty large-scale failures”, I expect that they will find these concerns pretty credible.
Caveating that not all of these concerns would go away with more transparency, and that I do generally buy that more transparency will (in some cases) lead to a net increase in the attack surface. The tradeoffs here seem quite difficult.
But my own opinion is that OP has shifted too far in the “worry a lot about PR in the conventional sense” direction in ways that have not only led to less funding for important projects but also led to a corresponding reduction in reputation/status/prestige, both within and outside of EA circles.
@Dustin Moskovitz I think some of the confusion is resulting from this:
Your second statement is basically right, though my personal view is they impose costs on the movement/EA brand and not just us personally. Digital minds work, for example, primes the idea that our AI safety concerns are focused on consciousness-driven catalysts (“Terminator scenarios”), when in reality that is just one of a wide variety of ways AI can result in catastrophe.
In my reading of the thread, you first said “yeah, basically I think a lot of these funding changes are based on reputational risk to me and to the broader EA movement.”
Then, people started challenging things like “how much should reputational risk to the EA movement matter and what really are the second-order effects of things like digital minds research.”
Then, I was expecting you to just say something like “yeah, we probably disagree on the importance of reputation and second-order effects.”
But instead, it feels (to me) like you kind of backtracked and said “no actually, it’s not really about reputation. It’s more about limited capacity– we have finite energy, attention, stress, etc. Also shared responsibility.”
It’s plausible that I’m misunderstanding something, but it felt (at least to me) like your earlier message made it seem like PR/reputation was the central factor and your later messages made it seem like it’s more about limited capacity/energy. These feel like two pretty different rationales, so it might be helpful for you to clarify which one is more influential (or present a clearer synthesis of the two rationales).
(Also, I don’t think you necessarily owe the EAF an explanation– it’s your money etc etc.)
The field is not ready, and it’s not going to suddenly become ready tomorrow. We need urgent and decisive action, but to indefinitely globally halt progress toward this technology that threatens our lives and our children’s lives, not to accelerate ourselves straight off a cliff.
I think most advocacy around international coordination (that I’ve seen, at least) has this sort of vibe to it. The claim is “unless we can make this work, everyone will die.”
I think this is an important point to be raising– and in particular I think that efforts to raise awareness about misalignment + loss-of-control failure modes would be very useful. Many policymakers have only or primarily heard about misuse risks and CBRN threats, and the “policymaker prior” is usually to think “if there is a dangerous tech, the most important thing to do is to make sure the US gets it first.”
But in addition to this, I’d like to see more “international coordination advocates” come up with concrete proposals for what international coordination would actually look like. If the USG “wakes up”, I think we will very quickly see that a lot of policymakers + natsec folks will be willing to entertain ambitious proposals.
By default, I expect a lot of people will agree that international coordination would in principle be safer, but they will fear that in practice it is not going to work. As a rough analogy, I don’t think most serious natsec people were like “yes, of course the thing we should do is enter into an arms race with the Soviet Union. This is the safest thing for humanity.”
Rather, I think it was much more a vibe of “it would be ideal if we could all avoid an arms race, but there’s no way we can trust the Soviets to follow through on this.” (In addition to stuff that’s more vibesy and less rational than this, but insofar as logic and explicit reasoning were influential, I think this was likely one of the core cruxes.)
In my opinion, one of the most important products for “international coordination advocates” to produce is some sort of concrete plan for The International Project. And importantly, it would need to somehow find institutional designs and governance mechanisms that would appeal to both the US and China. Answering questions like “how do the international institutions work”, “who runs them”, “how are they financed”, and “what happens if the US and China disagree” will be essential here.
The Baruch Plan and the Acheson-Lilienthal Report (see full report here) might be useful sources of inspiration.
P.S. I might personally spend some time on this and find others who might be interested. Feel free to reach out if you’re interested and feel like you have the skillset for this kind of thing.
Potentially Pavel Izmailov– not sure if he is connected to the EA community, and not sure of the exact details of why he was fired.
https://www.maginative.com/article/openai-fires-two-researchers-for-alleged-leaking/
Thanks! Familiar with the post— another way of framing my question is “has Holden changed his mind about anything in the last several months? Now that we’ve had more time to see how governments and labs are responding, what are his updated views/priorities?”
(The post, while helpful, is 6 months old, and I feel like the last several months has given us a lot more info about the world than we had back when RSPs were initially being formed/released.)
Congratulations on the new role– I agree that engaging with people outside of existing AI risk networks has a lot of potential for impact.
Besides RSPs, can you give any additional examples of approaches that you’re excited about from the perspective of building a bigger tent & appealing beyond AI risk communities? This balancing act of “find ideas that resonate with broader audiences” and “find ideas that actually reduce risk and don’t merely serve as applause lights or safety washing” seems quite important. I’d be interested in hearing if you have any concrete ideas that you think strike a good balance of this, as well as any high-level advice for how to navigate this.
Additionally, how are you feeling about voluntary commitments from labs (RSPs included) relative to alternatives like mandatory regulation by governments (you can’t do X or you can’t do X unless Y), preparedness from governments (you can keep doing X but if we see Y then we’re going to do Z), or other governance mechanisms?
(I’ll note I ask these partially as someone who has been pretty disappointed in the ultimate output from RSPs, though there’s no need to rehash that debate here– I am quite curious for how you’re reasoning through these questions despite some likely differences in how we think about the success of previous efforts like RSPs.)
What are some of your favorite examples of their effectiveness?
Congrats to Zach! I feel like this is mostly supposed to be a “quick update/celebratory post”, but I feel like there’s a missing mood that I want to convey in this comment. Note that my thoughts mostly come from an AI Safety perspective, so these thoughts may be less relevant for folks who focus on other cause areas.
My impression is that EA is currently facing an unprecedented amount of PR backlash, as well as some solid internal criticisms from core EAs who are now distancing themselves from EA. I suspect this will likely continue into 2024. Some examples:
EA has acquired several external enemies as a result of the OpenAI coup. I suspect that investors/accelerationists will be looking for ways to (further) damage EA’s reputation.
EA is acquiring external enemies as a result of its political engagements. There have been a few news articles recently criticizing EA-affiliated or EA-influenced fellowship programs and think-tanks.
EA is acquiring an increasing number of internal critics. Informally, I feel like many people I know (myself included) have become increasingly dissatisfied with the “modern EA movement” and “mainstream EA institutions”. Examples of common criticisms include “low integrity/low openness”, “low willingness to critique powerful EA institutions”, “low willingness to take actions in the world that advocate directly/openly for beliefs”, “cozyness with AI labs”, “general slowness/inaction bias”, and “lack of willingness to support groups pushing for concrete policies to curb the AI race.” (I’ll acknowledge that some of these are more controversial than others and could reflect genuine worldview differences, though even so, my impression is that they’re meaningfully contributing to a schism in ways that go beyond typical worldview differences).
I’d be curious to know how CEA is reacting to this. The answer might be “well, we don’t really focus much on AI safety, so we don’t really see this as our thing to respond to.” The answer might be “we think these criticisms are unfair/low-quality, so we’re going to ignore them.” Or the answer might be “we take X criticism super seriously and are planning to do Y about it.”
Regardless, I suspect that this is an especially important and challenging time to be the CEO of CEA. I hope Zach (and others at CEA) are able to navigate the increasing public scrutiny & internal scrutiny of EA that I suspect will continue into 2024.
Do you know anything about the strategic vision that Zach has for CEA? Or is this just meant to be a positive endorsement of Zach’s character/judgment?
(Both are useful; just want to make sure that the distinction between them is clear).
I appreciate the comment, though I think there’s a lack of specificity that makes it hard to figure out where we agree/disagree (or more generally what you believe).
If you want to engage further, here are some things I’d be excited to hear from you:
What are a few specific comms/advocacy opportunities you’re excited about//have funded?
What are a few specific comms/advocacy opportunities you view as net negative//have actively decided not to fund?
What are a few examples of hypothetical comms/advocacy opportunities you’ve been excited about?
What do you think about, e.g., Max Tegmark/FLI, Andrea Miotti/Control AI, The Future Society, the Center for AI Policy, Holly Elmore, PauseAI, and other specific individuals or groups that are engaging in AI comms or advocacy?
I think if you (and others at OP) are interested in receiving more critiques or overall feedback on your approach, one thing that would be helpful is writing up your current models/reasoning on comms/advocacy topics.
In the absence of this, people simply notice that OP doesn’t seem to be funding some of the main existing examples of comms/advocacy efforts, but they don’t really know why, and they don’t really know what kinds of comms/advocacy efforts you’d be excited about.
(Oops, fixed!)
I expect that your search for a “unified resource” will be unsatisfying. I think people disagree enough on their threat models/expectations that there is no real “EA perspective”.
Some things you could consider doing:
Have a dialogue with 1-2 key people you disagree with.
Pick one perspective (e.g., Paul’s worldview, Eliezer’s worldview) and write about the areas where you disagree with it.
Write up a “Matthew’s worldview” doc that focuses more on explaining what you expect to happen and isn’t necessarily meant as a “counterargument” piece.
Among the questions you list, I’m most interested in these:
How bad human disempowerment would likely be from a utilitarian perspective
Whether there will be a treacherous turn event, during which AIs violently take over the world after previously having been behaviorally aligned with humans
How likely AIs are to kill every single human if they are unaligned with humans
How society is likely to respond to AI risks, and whether they’ll sleepwalk into a catastrophe
Thanks for this overview, Trevor. I expect it’ll be helpful– I also agree with your recommendations for people to consider working at standard-setting organizations and other relevant EU offices.
One perspective that I see missing from this post is what I’ll call the advocacy/comms/politics perspective. Some examples of this with the EU AI Act:
Foundation models were going to be included in the EU AI Act, until France and Germany (with lobbying pressure from Mistral and Aleph Alpha) changed their position.
This initiated a political/comms battle between those who wanted to exclude foundation models (led by France and Germany) and those who wanted to keep it in (led by Spain).
This political fight rallied lots of notable figures, including folks like Gary Marcus and Max Tegmark, to publicly and privately fight to keep foundation models in the act.
There were open letters, op-eds, and certainly many private attempts at advocacy.
There were attempts to influence public opinion, pieces that accused key lobbyists of lying, and a lot of discourse on Twitter.
It’s difficult to know the impact of any given public comms campaign, but it seems quite plausible to me that many readers would have more marginal impact by focusing on advocacy/comms than by focusing on research/policy development.
More broadly, I worry that many segments of the AI governance/policy community might be neglecting to think seriously about what ambitious comms/advocacy could look like in the space.
I’ll note that I might be particularly primed to bring this up now that you work for Open Philanthropy. I think many folks (rightfully) critique Open Phil for being too wary of advocacy, campaigns, lobbying, and other policymaker-focused activities. I’m guessing that Open Phil has played an important role in shaping both the financial and cultural incentives that (in my view) leads to an overinvestment into research and an underinvestment into policy/advocacy/comms.
(I’ll acknowledge these critiques are pretty high-level and I don’t claim that this comment provides compelling evidence for them. Also, you only recently joined Open Phil, so I’m of course not trying to suggest that you created this culture, though I guess now that you work there you might have some opportunities to change it).
I’ll now briefly try to do a Very Hard thing, which is like “put myself in Trevor’s shoes and ask what I actually want him to do.” One concrete recommendation I have is something like “try to spend at least 5 minutes thinking about ways in which you or others around you might be embedded in a culture that has blind spots to some of the comms/advocacy stuff.” Another is “make a list of the people you read actively or talked to when writing this post. Then ask if there were any other people/orgs you could’ve reached out to, particularly those that focus more on comms+advocacy.” (Also, to be clear, you might do both of these things and conclude “yeah, actually I think my approach was very solid and I just had Good Reasons for writing the post the way I did.”)
I’ll stop here since this comment is getting long, but I’d be happy to chat further about this stuff. Thanks again for writing the post and kudos to OP for any of the work they supported/will support that ends up increasing P(good EU AI Act goes through & gets implemented).
This quick take seems relevant: https://forum.effectivealtruism.org/posts/auAYMTcwLQxh2jB6Z/zach-stein-perlman-s-quick-takes?commentId=HiZ8GDQBNogbHo8X8