Retrospective on the AI Safety Field Building Hub

Overview

In late July I posted Announcing the AI Safety Field Building Hub, a new effort to provide AISFB projects, mentorship, and funding. This new organization, funded by FTXFF, set out to work on AI safety outreach projects aimed at AI researchers, with funding to mentor new people and the flexibility to take on projects suggested by the community. As we’re now nearing the end of the FTXFF funding, the AISFB Hub is finishing up all existing projects and closing down. It’s been an exciting six months! Here’s a retrospective of all the AISFB Hub has been up to, as well as potential future work we’d get up to if we secure new funding.

What AISFB Hub has done

There are three major categories that the work has fallen into: working with my interview series, completing the pilot of an outreach survey, and everything else.

Interview Series

In February-March 2022, I conducted 97 interviews with AI researchers (who had papers accepted to NeurIPS / ICML 2021). Interviewees were asked about their perceptions of artificial intelligence now and in the future, with special focus on risks from advanced AI systems. I presented the alignment problem and the idea of instrumental incentives, and asked them questions like whether they thought we’d ever achieve AGI, and if they’d be interested in working on AI alignment. I released 11 transcripts of those interviews at the time, posted a talk on the preliminary results, and promised future analysis. Now, for what the AISFB Hub did!

Created a website to display all of the below: https://ai-risk-discussions.org/interviews (EAF post)
Anonymized and released 72 more transcripts (i.e. everyone who gave permission), which brings us up to ⁸³⁄₉₇ transcripts available.
Completed a quantitative analysis of these interviews, especially focused on how people responded to the core AI safety questions (EAF post), with a separate writeup on predicting researchers’ interest in alignment (EAF post).
Created an interactive walkthrough of common perspectives in the interviews, as well as counterarguments to some of the common objections to AI safety arguments (EAF post).

Outreach Survey

In this project, we sent AI researchers (with a paper accepted at NeurIPS / ICML / ICLR 2021) a 5-hour survey in which they read AI safety materials, engaged critically with them, and answered questions about them. 28 researchers completed the pilot survey, and a partial writeup of those pilot results is available on EAF / LW as “What AI Safety Materials Do ML Researchers Find Compelling?” We’re interested in continuing this project with significant modifications (drawn from lessons in the pilot study) if we receive further funding (as a scalable project, it benefits the most from additional funding).

A few writeups were commissioned for use in this project, some of which were used, some not:

Summary writeup of AGI timeline/risk projections as of Oct 2022 - by Kelsey Theriault
List of AI alignment / safety organizations (long version, short version) + EAF post - by Austin Witte
Counterarguments to AI safety and links to refutations - by Kelsey Theriault (related: Arguments against advanced AI safety)

We also had some additional work that was set up for future surveys, which is described later.

Miscellaneous

Write-ups not listed above:

Analysis of AI Safety surveys for field-building insights - by Ash Jafari
The website's Resources and What can I do? pages were updated based on the Resources post

There were also a lot of logistics, featuring: setting up under a fiscal sponsor, hiring / working with lots of people on individual projects, and never-ending reimbursements!

I also talked to various members of the community about AI safety field-building, and acquired a healthy amount of confusion about AI field-building strategy, and about our familiar friend clawbacks.

People involved in AISFB Hub

So many people worked with the AISFB Hub, or helped out with various projects! Thanks so much to all of them.

ai-risk-discussions.org: Lukas Trötzmüller (interactive walkthrough, website), Maheen Shermohammed (quantitative analysis), Michael Keenan (website)
Outreach survey: Collin Burns
Data collection, data organizing, text cleaning, copyediting, and ops (alphabetical order): Rauno Arike, Angelica Belo, Tom Hutton, Ash Jafari, Aashish Khmiasia, Harvey LeNar, Kitt Morjanova, Jonathan Ng, Nicole Nohemi, Cleyton Pires, David Spearman, Kelsey Theriault, Stephen Thomas, Lukas Trötzmüller, Austin Witte
Writing: check out the linked EA Forum posts above for names
Interviews: Zi Cheng (Sam) Huang (tagging), Mary Collier Wilks (advising), Andrew Critch (idea suggestion), Tobi Gerstenberg (support)
Broader community: Many people not listed here provided helpful advice or feedback, did a short trial with me, or otherwise contributed. I have AW, Michael Chen, and Vaniver listed in my notes, but let me know if I lost track and you should be listed here!
(If you’re wondering what my role in this org is: I do a lot of the direct work – writing / ops / data analysis etc. – and also manage / hire people to work on projects.)

Funding: FTX Future Fund, Stanford University, two anonymous donors, and LTFF

What AISFB Hub did not do

Of the Specific Projects

There were some projects listed on the original AISFB Hub post that I decided not to pursue further.

Helping with internal OpenAI / DeepMind field-building efforts → My guess after talking to people is that they mostly need internal people rather than external people, and while there’s a possibility of involvement, it’s pretty niche.
AI safety-oriented film → I talked to a number of people attempting to make films, but didn’t think any were pursuing the fully-fledged feature-length film I hoped for. (However, one organization that I didn’t end up talking to is perhaps doing this!) It’s a very difficult vision, though. I’m stepping out of this now since I don’t see a clear path to contribution.
Projects developed by Center for AI Safety → I referred some people to CAIS, but didn’t end up taking on any of their projects myself.

Of the Stated Aims

More broadly, I also failed to complete two of the major stated aims of the AI Safety Field Building Hub.

First: I basically failed to take on community-suggested field-building projects. I found that the community was unlikely to suggest things I wanted to do more than what I already wanted to do (perhaps unsurprising in retrospect). I was also quite busy with existing projects, and felt bottlenecked on people I felt happy delegating high-level projects to. I was able to help match 1-2 field-building suggestions with people who were ready to execute, but it was rare.
Second: I also failed to mentor people new to AI safety field-building. As it turns out, I find the [hiring / evaluating / training] circuit stressful, and prefer to work with a couple of people whose world-models and skills are quite close to mine. This meant I was mostly working in relatively peer relationships, or in evaluative rather than mentoring relationships.

Given the above, if I secure additional funding, I plan to substantially restructure this org. I’ll rebrand (probably to “Arkose”, rather than “AI Safety Field Building Hub”) to allow other groups to take the more ambitious, far-reaching name. I’ll have a narrower organizational focus, and as such won’t take public suggestions for field-building projects. I’ll also not take on mentorship roles outside of my usual capacity as an EA (though people are welcome to contact me in that capacity, especially to talk about field building). I still aim to work on AI safety field-building projects aimed at ML researchers, with a smaller team of people on individual projects!

Of note, a lot of the anticipated changes above are due to personal fit. I find that field-building can be quite draining, especially when orienting on a sense of impact, and even before the FTXFF situation changed the funding environment and cut off my growth aims, I had been trying lots of pivots to make the experience feel sustainable. To my surprise and pleasure, at this point I’ve settled into a workflow I feel actively happy with. I got to try lots of different things during this AISFB Hub experiment! I really appreciate having had the opportunity to get feedback (from reality and others) about the quality of various ideas, and what kind of tasks, environments, and mindsets feel engaging to me.

Impact Assessment

Finally, one last thing the AISFB Hub did not do: an accursed impact assessment. They’re so correct and good, and I think I’m basically too lazy to do one or figure out how. (As an aside, “making up scrappy processes with no external review” is such a refrain of this new org.) Regardless, some notes are below.

My overall goal is to encourage ML researchers to be more interested in AI alignment, given that I think this is an important problem that needs more attention. I’m interested in changing the overall perception of the field to be more pro-safety, in addition to encouraging specific people to work on it if they’re interested. The final output I’m looking for is something like “how many counterfactual people became interested enough to do AI alignment research”. One way to measure this is on the level of individuals – who became more involved, who became less involved, etc. The other measure is more nebulous, and I think it needs to incorporate the fact that many of the AISFB Hub’s outputs seem to be “non-peer-reviewed research output”-shaped. And I think it’s worth noting that a lot of my work feels pretty preliminary: it’s vaguely promising small-scale stuff that could set up large-scale outreach if it goes well (which is definitely still in question).

Here’s some data on individuals, and we’ll end there.

Interview series

On 7/29/22 (interviews took place in Feb-early March 2022, so about 5-6 months after), ⁸⁶⁄₉₇ participants were emailed. ⁸²⁄₈₆ participants responded to the email or the reminder email. They were asked:

“Did the interview have a lasting effect on your beliefs (Y/N)?”
- ⁴²⁄₈₂ (51%) responded Y.
“Did the interview cause you to take any new actions in your work (Y/N)?”
- ¹²⁄₈₂ (15%) responded Y.

Note, however, that the interviews took place before the AISFB Hub was established.
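For anyone who wants to re-derive the follow-up percentages, here is a minimal Python sketch. The counts are the ones reported in this section; the variable names, the `share` helper, and the extra rows that use all 97 interviewees as the denominator (which treats everyone who wasn't emailed or didn't reply as an N) are illustrative assumptions, not part of the original analysis.

```python
# Counts copied from the follow-up email results reported in this section.
interviewed = 97
emailed = 86
responded = 82
lasting_belief_change = 42
new_actions = 12


def share(part: int, whole: int) -> str:
    """Format part/whole as a fraction plus a whole-number percentage."""
    return f"{part}/{whole} ({part / whole:.0%})"


rows = {
    "emailed, of all interviewees": share(emailed, interviewed),
    "responded, of those emailed": share(responded, emailed),
    "lasting belief change, of respondents": share(lasting_belief_change, responded),
    "new actions, of respondents": share(new_actions, responded),
    # Assumption: a more conservative view that counts every
    # non-emailed or non-responding interviewee as an N.
    "lasting belief change, of all interviewees": share(lasting_belief_change, interviewed),
    "new actions, of all interviewees": share(new_actions, interviewed),
}

for label, value in rows.items():
    print(f"{label}: {value}")
```

The last two rows are only a lower-bound way of reading the same data; the figures reported in this post use respondents as the denominator.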

Outreach survey (pilot)

²⁄₂₈ (7%) seemed actively aggravated with AI alignment post-survey (maybe ³⁄₃₀ (10%))
²⁄₂₈ (7%) seemed to have a high degree of interest in AI alignment post-survey (maybe ²⁄₃₀ (7%))
⁸⁄₂₈ (29%) seemed to have a high degree of interest (n=2) or to be pretty interested (n=6) in AI alignment post-survey

This ratio is not great, and we would want better numbers before progressing.
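One way to read “this ratio” is as the number of respondents who came away highly interested versus the number who came away actively aggravated. That reading is an assumption, as are the names in this short Python sketch; the counts are the ones listed above, with the “maybe” row using the alternative figures given in parentheses.

```python
from fractions import Fraction

# Counts copied from the pilot results above. "maybe" holds the
# alternative figures given in parentheses.
pilot = {
    "stated": {"n": 28, "aggravated": 2, "high_interest": 2},
    "maybe": {"n": 30, "aggravated": 3, "high_interest": 2},
}


def interest_to_aggravation(counts: dict) -> Fraction:
    """Ratio of highly interested respondents to actively aggravated ones."""
    return Fraction(counts["high_interest"], counts["aggravated"])


for label, counts in pilot.items():
    ratio = interest_to_aggravation(counts)
    print(
        f"{label} (n={counts['n']}): "
        f"{ratio.numerator}:{ratio.denominator} highly interested per aggravated"
    )
```

Under either set of counts, the pilot produced no more highly interested researchers than aggravated ones, which is one concrete sense in which better numbers would be needed before scaling.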

What’s next?

I’m interested in continuing to work on AI safety field-building, aimed at ML researchers. AISFB Hub is closing down, but if I secure additional funding, then I’d want to start a new rebranded org (probably “Arkose”) with a more focused mode of operation. I’d likely focus on a “survey + interview(?) field-building org aimed at ML researchers”, where I’d hire a couple of people to help work on individual projects and still do a bunch of direct work myself. While these ideas are going to need to be more fleshed out before applying for funding, the directions I’m most excited about are:

Surveys

I like surveys because they’re scalable, and because a lot of people haven’t heard of AI alignment (only 41% of my interviewees had heard of AI alignment in any capacity), so it’s a nice way to introduce people to the ideas. (I also personally enjoy the tasks involved in surveys.) We completed a pilot outreach survey, which went all right, but we have ideas for how to restructure it so that it goes better.

Once we have that better survey, I’m also interested in modifying it so that it can be run in China. I continue to be interested in AI safety in China, and have since talked to a lot of the involved parties there, who seem tentatively interested in me running such a survey.

We also did a fair amount of work to prepare for larger-scale deployment of surveys – if our pilot had gone better and funding had remained, we would have likely started scaling. A lot of the field-building insights from previous posts will be useful for focusing on specific populations, and we’ve done some work with respect to scalable survey logistics.

Interviews

One-on-one conversations with AI researchers offer an unparalleled degree of personalization. During the AISFB Hub period, I was most interested in training other people to conduct these interviews, since I find conducting interviews to be pretty tiring as an introvert. There were a couple of potentially good candidates, but ultimately I don’t think anything was firmly set in motion.

However, people continue to be excited about interviews, and there are a couple of different ways I could see this progressing:

More structured interviews compared to my previous interview series (also more technical-focused). This might make it more sustainable for me personally to conduct interviews. (And might increase trainability, but the technical requirements would be higher…)
I continue to be interested in doing a pairing program between AI safety researchers and AI researchers for one-on-ones. I haven’t had time to invest in this, but have some preliminary interest and plans.

Pipeline

I also remain firmly interested in “building the ML researcher pipeline” activities, despite not having concrete plans here. These wouldn’t be focuses of the org, but will be something I’m often thinking about when developing surveys and interviews.

After ML researchers are introduced to AI alignment, where do interested researchers go next? Is there something they can do before the AGISF AI alignment curriculum?
We probably need new materials aimed specifically at AI researchers.
- We haven’t tested all the existing introductory material, but the intended audience does seem to matter a lot for reception, and I’d definitely take more options.
- One next-step need: If an ML researcher comes from [x subfield] and is interested in working on an alignment research project, what are the existing current papers they should read?
Prizes for working on AI alignment projects are something I haven’t looked into, but they still seem potentially quite good – how can those be integrated better into the pipeline? (CAIS’s work)
Having AI safety workshops at the major conferences also seems good – is that covered adequately? (CAIS and others)
Thoughts I’ve been hearing around: peer review for AI alignment papers, prediction markets / forecasting training

Interested in funding me?

The above thus constitutes my rough plans for a “survey + interview(?) field-building org aimed at ML researchers”! I’m going to put together a more detailed plan in the next month and apply to Open Phil and probably SFF. (I’ve talked with Open Phil some already; they’re interested in seeing a full grant proposal from me.)

However, I’m also interested in talking to individual donors who want more of this stuff done, or have specific projects they’d be excited about funding. This is not a strong bid at all – I’ll get it sorted one way or another, and I’m personally financially stable. But if you happened to be interested in this work, I have a 501(c)(3) setup such that donations are tax deductible, and I’d like to chat about what you think field-building priorities are and how that may intersect with my plans.

Timeline: I anticipate taking a pause from this work for ~2-5 months pretty soon. This is mostly to explore some different field-building / ops experiences while I wait for funding evaluations. After that, I’ll be making a choice about whether to continue AISFB Hub-like activities, or work for a different org where I can do AI safety field-building. While my future plans are quite uncertain and subject to revision, my current best guess is that I’ll want to return to running this org, and ease and continuity of funding is likely to be a major crux.

Conclusion

And that’s all, folks. It’s been a really cool ride for me – thanks so much to everyone who contributed to the AISFB Hub! Looking forward to whatever comes next, and very happy to discuss.